PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system with recognition for 80+ languages, data annotation and synthesis tools, and support for training and deployment on server, mobile, embedded, and IoT devices)
Apache License 2.0

max_seq_len when fine-tuning the SER model #12002

Closed: bhhsieh closed this issue 21 hours ago

bhhsieh commented 3 weeks ago

When fine-tuning the SER model, max_seq_len defaults to 512: https://github.com/PaddlePaddle/PaddleOCR/blob/main/configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh.yml

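Token sequences in our dataset are longer than 512, so we want to raise this value to 1024, but after the change we get the following error: [screenshot]. How can we fix this? Thanks.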

cuicheng01 commented 3 weeks ago

Wherever the code uses the expand operator, it needs to be changed accordingly.
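To see why an expand call would break, here is a framework-free sketch. It assumes (this is not confirmed in the thread) that the model keeps a precomputed position-id buffer whose length is the pretrained `max_position_embeddings`, slices it to the current sequence length, and then expands the slice to `(batch_size, seq_len)`. `expand_shape` and `position_id_shape` are hypothetical helpers that only reproduce the shape rules: dimensions are aligned from the right, and each source dimension must be 1 or equal to the target.

```python
def expand_shape(src_shape, target_shape):
    """Return target_shape if src_shape can be expanded to it, else raise ValueError.

    Broadcast-style rule: align dimensions from the right; every source
    dimension must be 1 or equal to the matching target dimension.
    """
    if len(src_shape) > len(target_shape):
        raise ValueError("source has more dimensions than target")
    padded = (1,) * (len(target_shape) - len(src_shape)) + tuple(src_shape)
    for src, tgt in zip(padded, target_shape):
        if src != 1 and src != tgt:
            raise ValueError(f"cannot expand dimension of size {src} to {tgt}")
    return tuple(target_shape)


def position_id_shape(buffer_len, seq_len):
    """Shape of buffer[:, :seq_len]; slicing silently clamps to buffer_len."""
    return (1, min(buffer_len, seq_len))


BUFFER_LEN = 514  # assumed max_position_embeddings of the pretrained model
BATCH_SIZE = 2

for seq_len in (512, 1024):
    src = position_id_shape(BUFFER_LEN, seq_len)
    try:
        print(seq_len, "->", expand_shape(src, (BATCH_SIZE, seq_len)))
    except ValueError as err:
        print(seq_len, "-> error:", err)
```

With `seq_len=512` the expand succeeds. With `seq_len=1024` the slice stops at 514 entries, so expanding it to length 1024 fails, which would be consistent with the `514` in the reported error.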

bhhsieh commented 3 weeks ago

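Which places exactly need to change? [screenshot] I eventually tracked it down to this function, but couldn't find where the value (514) shown in the error comes from. Could you explain a bit more? Thanks!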

My versions: paddlepaddle-gpu==2.3.1, paddlenlp==2.5.2
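On the 514 question: it is likely the size of the pretrained position-embedding table (`max_position_embeddings` in the LayoutXLM config, a value inherited from XLM-RoBERTa) rather than anything derived from `max_seq_len`. If so, editing the expand calls alone is probably not enough, because positions past the last row have no learned embedding. One common workaround, used by Longformer when extending RoBERTa's positions, is to build a larger table and fill the new rows by copying the pretrained ones. The sketch below shows only that copy step on plain nested lists; `extend_position_table` is hypothetical, and loading the result into the model's embedding weight and updating `max_position_embeddings` is left to your own code. The `reserved=2` and `1026` values assume the two extra slots in the 514-row table are special positions, which should be checked against the model code.

```python
def extend_position_table(table, new_rows, reserved=0):
    """Grow a position-embedding table (a list of row vectors) to new_rows rows.

    The first `reserved` rows (e.g. special/padding slots) are kept once; the
    remaining pretrained rows are tiled repeatedly to fill the larger table.
    Rows are copied, so the result does not alias the input.
    """
    if new_rows <= len(table):
        return [row[:] for row in table[:new_rows]]
    out = [row[:] for row in table[:reserved]]
    body = table[reserved:]
    if not body:
        raise ValueError("no pretrained rows to copy")
    i = 0
    while len(out) < new_rows:
        out.append(body[i % len(body)][:])
        i += 1
    return out


# Toy 514 x 4 table standing in for the pretrained position embeddings.
pretrained = [[float(i)] * 4 for i in range(514)]
bigger = extend_position_table(pretrained, 1026, reserved=2)
print(len(bigger), bigger[514] == pretrained[2])  # 1026 True
```

Copying keeps the new positions close to what the model has already learned, which usually fine-tunes more smoothly than random initialisation.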

tran601 commented 1 week ago

[three screenshots]

I split the input up and feed it in batches.
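tran601's workaround sidesteps the limit: keep `max_seq_len` at 512 and feed a long document in several pieces. The thread doesn't show how the pieces are made, so the following is only one possible sketch. `split_into_chunks` is hypothetical; it keeps tokens and their bounding boxes aligned and leaves `num_special` slots per chunk free for the tokenizer's start/end tokens (an assumption; adjust it for your tokenizer).

```python
def split_into_chunks(tokens, bboxes, max_seq_len=512, num_special=2):
    """Split one long document into (tokens, bboxes) pieces that fit max_seq_len.

    Each chunk holds at most max_seq_len - num_special tokens, leaving room
    for special tokens added later. Chunks preserve the original order.
    """
    if len(tokens) != len(bboxes):
        raise ValueError("every token needs exactly one bounding box")
    step = max_seq_len - num_special
    if step <= 0:
        raise ValueError("max_seq_len must exceed num_special")
    return [
        (tokens[i:i + step], bboxes[i:i + step])
        for i in range(0, len(tokens), step)
    ]


# Toy document of 1200 tokens, each with a dummy bounding box.
doc_tokens = [f"t{i}" for i in range(1200)]
doc_bboxes = [(0, 0, 1, 1)] * len(doc_tokens)
for chunk_tokens, chunk_bboxes in split_into_chunks(doc_tokens, doc_bboxes):
    print(len(chunk_tokens), len(chunk_bboxes))  # 510 510, then 510 510, then 180 180
```

Hard, non-overlapping cuts can split an entity across two chunks; an overlapping window (a stride smaller than the chunk length) is a common refinement if that matters for your labels.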

cuicheng01 commented 1 week ago

Could you try upgrading your Paddle version?