Ucas-HaoranWei / GOT-OCR2.0

Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

About More General OCR Results #105

Status: Open. cem-sirin opened this issue 1 day ago.

cem-sirin commented 1 day ago

Hi GOT Team,

About the general OCR capabilities (molecular formulas, sheet music, charts, etc.): from what I understand, the current model does not support recognizing these formats inline within a document. For example, given a page from a geometry textbook, the model can't perform OCR on the text and the geometric shapes simultaneously.

I am planning to implement this, but you may have already tried it. Have you? If so, did you run into any technical issues (perhaps the current lightweight model is not suitable)? Or does it simply take time to build a dataset that can reliably train this functionality?
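To make the proposal concrete, here is a minimal sketch of how one interleaved training target might be serialized, assuming each page is annotated as a reading-ordered list of blocks. The `Block` type, the `build_interleaved_target` helper, and the `<kind>...</kind>` tag scheme are all hypothetical and are not GOT's actual output format; the TikZ snippet is only an illustrative choice for representing a geometric figure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    kind: str     # hypothetical block type, e.g. "text", "geometry", "smiles"
    content: str  # transcription of the block (plain text, TikZ, SMILES, ...)


def build_interleaved_target(blocks: List[Block]) -> str:
    """Serialize reading-ordered page blocks into one target string.

    Plain text is emitted as-is; every non-text block is wrapped in
    hypothetical <kind>...</kind> tags so a single decoder output can
    interleave ordinary OCR text with general-OCR content.
    """
    parts = []
    for block in blocks:
        if block.kind == "text":
            parts.append(block.content)
        else:
            parts.append(f"<{block.kind}>{block.content}</{block.kind}>")
    return "\n".join(parts)


page = [
    Block("text", "A right triangle has legs of length 3 and 4."),
    Block("geometry", r"\draw (0,0) -- (4,0) -- (0,3) -- cycle;"),
    Block("text", "Find the length of the hypotenuse."),
]
print(build_interleaved_target(page))
```

Wrapping non-text blocks in explicit tags keeps the target a single sequence while still letting a parser split it back into text and structured content; the hard part, as noted in this thread, is collecting pages annotated this way.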

Thanks for reading!

Ucas-HaoranWei commented 1 day ago

Hello, good idea! We have not tried whole-page OCR that interleaves text with general OCR content, because we do not have the corresponding data. Looking forward to your work!