Ucas-HaoranWei / GOT-OCR2.0

Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Training results on a new language #117

Open leo-ztjht opened 2 hours ago

leo-ztjht commented 2 hours ago

{ "image": "00000001.png", "question": "...\nOCR: ", "label": "ﺏﺎﻟﺮﻐﻣ ﻢﻧ ﺰﻳﺍﺮﺘﻳ ﻞﻫﺫﺍ ﺎﻟﻭﺍﺪﻳ ﻉﺩ ﻡﺭﺎﺗ ﻻ ﻥ ﺮﺤﻠﺘﻳ ﺎﻠﺨﻳﺭ ﻞﻴﻫ ﻞﻫﺍ ﻂﻌﻣ ﻢﺨﺘﻠﻓ ﻞﻗﺩ ﻮﺼﻠﺗ ﻝ ﻭﺍﺪﻳ ﺎﻠﻌﻴﻧ ﻊﺑﺭ ﺎﻠﻃﺮﻴﻗ ﺎﻠﺗﺭﺎﺒﻳ ﺎﻠﺟﺪﻳﺩ ﺎﻟﺬﻳ ﻱﺮﺒﻃ ﻭﺍﺪﻳ ﺎﻠﻌﻴﻧ ﺏﻭﻼﻳ ﻊﺑﺮﻳ ﺏﻭﺍﺪﻳ ﻍﻮﻟ ﺏﻭﻼﻳ ﺎﻠﺤﻣﺭﺍ ﻡﺭﻭﺭﺍ ﺏﺎﻠﻌﻘﺑ ﺎﻠﺳﻭﺩﺍ ﻮﻤﻨﻫﺍ ﻲﻨﺣﺩﺭ ﺎﻠﺷﺍﺮﻋ ﻝ ﻕﺮﻳ ﺲﻨﺗ ﻮﻗﺮﻳ ﺺﻨﺗ ﺎﻠﺠﺒﻠﻳ ﺎﻠﻠﺘﻴﻧ ﺖﺘﺒﻋﺎﻧ ﻭﻼﻳ ﺐﻫﻻ ﻮﻴﺤﺘﻀﻨﻬﻣﺍ ﺞﺒﻟ ﻙﻭﺭ ﻊﻟ ﺲﻓﻮﺤﻫ ﺎﻠﺷﺮﻘﻳ ﻲﺒﻠﻏ ﻁﻮﻟ ﻩﺫﺍ ﺎﻠﺷﺍﺮﻋ ﻢﺳﺎﻓ ﻚﻣ ﺎﻟﺮﺤﻟ ﻝ ﻭﺍﺪﻳ ﺎﻠﻌﻴﻧ ﺏﻭﻼﻳ ﻊﺑﺮﻳ ﻊﺑﺭ ﻩﺫﺍ ﺎﻠﺷﺍﺮﻋ ﺖﺘﻴﺣ ﺎﻠﻓﺮﺻ ﻞﻤﺷﺎﻫﺩ ﺞﺒﻟ ﻙﻭﺭ ﻢﻧ ﺎﻠﺠﻫ ﺎﻠﺷﺮﻘﻳ ﻭﺍﻼﺴﺘﻤﺗﺎﻋ ﺏﺍﻼﻨﺣﺩﺍﺭﺎﺗ ﺎﻠﺠﻤﻴﻟ ﻞﺠﺒﻟ ﻢﺸﻃ ﻩﺫﺍ ﺎﻠﺠﺒﻟ ﺎﻟﺬﻳ ﻲﺘﻐﻳﺭ ﺶﻜﻠﻫ ﻚﻠﻣﺍ ﺎﺨﺘﻠﻔﺗ ﺯﺍﻮﻳ ﻢﺷﺎﻫﺪﺘﻫ ﻮﻜﻟ ﺯﺍﻮﻳ ﻢﻨﻫ ﻲﻈﻫﺭ ﻒﻴﻫﺍ ﺐﻟﻮﺣ ﺞﻤﻴﻟ ﻢﺨﺘﻠﻓ ﻊﻧ ﺎﻠﺧﺭ ﻡﺭ ﻲﺷﺎﻫﺩ ﺐﻃﺭﺎﻔﻫ ﺎﻠﻤﺴﻨﻧ ﻚﻨﻫ ﺎﻠﻤﺸﻃ ﻝﺬﻠﻛ ﺲﻤﻳ ﺐﻫﺫﺍ ﺍﻼﺴﻣ ﻮﻣﺭ ﺥﺭ ﻲﺑﺩﻭ ﻚﻨﻫ ﻦﺑﺎﺗ ﺎﻠﻔﻃﺭ ﺥﺎﺻ ﻊﻧﺩ ﻢﺷﺎﻫﺪﺘﻫ ﻢﻧ ﺎﻠﺠﻫ ﺎﻠﺷﺮﻘﻳ ﻮﺑﺩﺍ ﻞﻳ ﻡﺭ ﻢﻧ ﻡﺮﺘﻔﻋﺎﺗ ﺞﺒﻟ ﺶﻤﺳ ﻚﻨﻫ ﻩﺮﻣ ﺾﺨﻣ ﻊﻟ ﺎﻨﺣﺩﺍﺭﺎﺗ ﺞﺒﻟ ﺶﻤﺳ ﺎﻠﻏﺮﺒﻳ", "answer": "ﺏﺎﻟﺮﻐﻣ ﻢﻧ ﺰﻳﺍﺮﺘﻳ ﻞﻫﺫﺍ ﺎﻟﻭﺍﺪﻳ ﻉﺩ ﻡﺭﺎﺗ ﻻ ﻥ ﺮﺤﻠﺘﻳ ﺎﻠﺨﻳﺭ ﻞﻴﻫ ﻞﻫﺍ ﻂﻌﻣ ﻢﺨﺘﻠﻓ ﻞﻗﺩ ﻮﺼﻠﺗ ﻝ ﻭﺍﺪﻳ ﺎﻠﻌﻴﻧ ﻊﺑﺭ ﺎﻠﻃﺮﻴﻗ ﺎﻟﺯﺭﺎﺒﻳ ﺎﻠﺟﺪﻳﺩ ﺎﻟﺬﻳ ﻱﺮﺒﻃ ﻭﺍﺪﻳ ﺎﻠﻌﻴﻧ ﺏﻭﻼﻳ ﻊﺑﺮﻳ ﺏﻭﺍﺪﻳ ﻍﻮﻟ ﺏﻭﻼﻳ ﺎﻠﺤﻣﺭﺍ ﻡﺭﻭﺭﺍ ﺏﺎﻠﻌﻘﺑ ﺎﻠﺳﻭﺩﺍ ﻮﻤﻨﻫﺍ ﻲﻨﺣﺩﺭ ﺎﻠﺷﺍﺮﻋ ﻝ ﻕﺮﻳ ﺲﻨﺗ ﻮﻗﺭﻯ ﺺﻨﺗ ﺎﻠﺤﺒﻟ ﺎﻠﻠﺘﻴﻧ ﺖﺘﺒﻋﺎﻧ ﻭﻼﻳ ﺐﻫﻻ ﻮﻴﺤﺘﻀﻨﻬﻣﺍ ﺞﺒﻟ ﻙﻭﺭ ﻊﻟ ﺲﻓﻮﺤﻫ ﺎﻠﺷﺮﻘﻳ ﻲﺒﻠﻏ ﻁﻮﻟ ﻩﺫﺍ ﺎﻠﺷﺍﺮﻋ ﻢﺳﺎﻓ ﻚﻣ ﺎﻟﺮﺠﻟ ﻝ ﻭﺍﺪﻳ ﺎﻠﻌﻴﻧ ﺏﻭﻼﻳ ﻊﺑﺮﻳ ﻊﺑﺭ ﻩﺫﺍ ﺎﻠﺷﺍﺮﻋ ﺖﺘﻴﺣ ﺎﻠﻓﺮﺻ ﻞﻤﺷﺎﻫﺩ ﺞﺒﻟ ﻙﻭﺭ ﻢﻧ ﺎﻠﺠﻫ ﺎﻠﺷﺮﻘﻳ ﻭﺍﻼﺴﺘﻤﺗﺎﻋ ﺏﺍﻼﻨﺣﺩﺍﺭﺎﺗ ﺎﻠﺠﻤﻴﻟ ﻞﺠﻴﻟ ﻢﺸﻃ ﻩﺫﺍ ﺎﻠﺠﺒﻟ ﺎﻟﺬﻳ ﻲﺘﻐﻳﺭ ﺶﻜﻠﻫ ﻚﻠﻣﺍ ﺎﺨﺘﻠﻔﺗ ﺯﺍﻮﻳ ﻢﺷﺎﻫﺪﺘﻫ ﻮﻜﻟ ﺯﺍﻮﻳ ﻢﻨﻫ ﻲﻈﻫﺭ ﻒﻴﻫﺍ ﺐﻟﻮﺟ ﺞﻤﻴﻟ ﻢﺨﺘﻠﻓ ﻊﻧ ﺎﻠﺧﺭ ﻡﺭ ﻲﺷﺎﻫﺩ ﺐﻃﺭﺎﻔﻫ ﺎﻠﻤﺴﻨﻧ ﻚﻨﻫ ﺎﻠﻤﺸﻃ ﻝﺬﻠﻛ ﺲﻤﻳ ﺐﻫﺫﺍ ﺍﻼﺴﻣ ﻮﻣﺭ ﺥﺭ ﻲﺑﺩﻭ ﻚﻨﻫ ﻦﺑﺎﺗ ﺎﻠﻔﻃﺭ ﺥﺎﺻ ﻊﻧﺩ ﻢﺷﺎﻫﺪﺘﻫ ﻢﻧ ﺎﻠﺠﻫ ﺎﻠﺷﺮﻘﻳ ﻮﺑﺩﺍ ﻞﻳ ﻡﺭ ﻢﻧ ﻡﺮﺘﻔﻋﺎﺗ ﺞﺒﻟ ﺶﻤﺳ ﻚﻨﻫ ﻩﺮﻣ ﺾﺨﻣ ﻊﻟ ﺎﻨﺣﺩﺍﺭﺎﺗ ﺞﺒﻟ ﺶﻤﺳ ﺎﻠﻏﺮﺒﻳ" }

The differing words are listed below; they are all visually very similar. Is there a good way to improve results in this case?

ﺐﻟﻮﺟ vs ﺐﻟﻮﺣ
ﻞﺠﻴﻟ vs ﻞﺠﺒﻟ
ﺎﻟﺮﺠﻟ vs ﺎﻟﺮﺤﻟ
ﻮﻗﺭﻯ vs ﻮﻗﺮﻳ
ﺎﻠﺤﺒﻟ vs ﺎﻠﺠﺒﻠﻳ
ﺎﻟﺯﺭﺎﺒﻳ vs ﺎﻠﺗﺭﺎﺒﻳ
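For anyone who wants to pull out mismatches like the ones listed above automatically, here is a minimal sketch using Python's standard `difflib`. It assumes each sample is a record with `label` and `answer` strings, as in the JSON above, and that words are separated by whitespace. The function name `differing_words` and the ASCII sample strings are hypothetical, chosen only for illustration; this is not part of the GOT-OCR2.0 code.

```python
from __future__ import annotations

import difflib


def differing_words(label: str, answer: str) -> list[tuple[str, str]]:
    """Return (label_part, answer_part) pairs where the two texts disagree.

    Both strings are split on whitespace and aligned word by word, so an
    inserted or dropped word does not shift every later comparison.
    """
    label_words = label.split()
    answer_words = answer.split()
    matcher = difflib.SequenceMatcher(a=label_words, b=answer_words, autojunk=False)

    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # For "replace", "delete", and "insert", keep both sides of the gap;
        # one side is an empty string when a word is missing entirely.
        pairs.append((" ".join(label_words[i1:i2]), " ".join(answer_words[j1:j2])))
    return pairs


if __name__ == "__main__":
    # Hypothetical ASCII stand-ins for a label/answer pair.
    sample = {"label": "the cat sat on mat", "answer": "the cot sat on mat"}
    for expected, predicted in differing_words(sample["label"], sample["answer"]):
        print(f"{predicted} vs {expected}")
```

Running this over a whole evaluation set and counting the pairs may show which character confusions dominate, which is useful to check before and after changing the training setup.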

Ucas-HaoranWei commented 2 hours ago

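I strongly suspect that the current encoder cannot separate these similar-looking characters, so the decoder has no way to decode them correctly. The encoder needs to be unfrozen during training. Did you fine-tune with the encoder frozen?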

leo-ztjht commented 2 hours ago

I trained with the train script from the README and did not set freeze_vision_tower. Isn't the encoder unfrozen by default?

Ucas-HaoranWei commented 2 hours ago

No, it is frozen by default; I hard-coded that. Change https://github.com/Ucas-HaoranWei/GOT-OCR2.0/blob/main/GOT-OCR-2.0-master/GOT/model/GOT_ocr_2_0.py#L132 and line 146 to True.