About More General OCR Results

Hi GOT Team,

Regarding the general OCR capabilities (molecular formulas, sheet music, charts, etc.): from what I understand, the current model does not support inline conversion of these formats within a document. For example, if I have a geometry textbook, the model can't perform OCR on the text and the geometric shapes simultaneously.

I am planning to implement this, but I thought you might have already tried it. Have you? If so, did you run into any technical issues (for example, the current lightweight model may not be suitable)? Or does it mainly just take time to build a dataset that can reliably train such functionality?

Thanks for reading!

Ucas-HaoranWei / GOT-OCR2.0

About More General OCR Results #105