VikParuchuri / surya

OCR, layout analysis, reading order, line detection in 90+ languages
https://www.datalab.to
GNU General Public License v3.0

Fine-tune a smaller model on computer screenshots? #113

Open apirrone opened 2 months ago

apirrone commented 2 months ago

Hi!

First, great work! From what I've tested, it seems to work really well. Congrats!

I have a use case where I need to perform OCR, layout analysis, etc. on computer screenshots. surya already works really well on such images, but I wonder how a smaller model trained only on screenshots would perform. In my use case, each screenshot needs to be fully processed quickly (ideally in under 2 seconds) without using too much memory or CPU/GPU.
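To check a candidate model (surya or a smaller fine-tuned one) against a per-screenshot budget like the 2 seconds mentioned above, a simple timing harness helps. This is a minimal sketch, not surya's API: the thread doesn't show how surya is called, so `process` is a hypothetical stand-in for whatever OCR/layout pipeline you're evaluating, and `benchmark_latency` and `fake_pipeline` are illustrative names. It measures wall-clock latency only; memory and GPU usage would need separate tooling.

```python
import statistics
import time
from typing import Any, Callable, Sequence


def benchmark_latency(
    process: Callable[[Any], Any],
    screenshots: Sequence[Any],
    budget_s: float = 2.0,
    warmup: int = 1,
) -> dict:
    """Time `process` on each screenshot and count how many meet a per-image budget.

    `process` is a hypothetical callable wrapping your full OCR + layout pipeline.
    """
    # Warm-up calls are not timed: first calls often pay one-off costs
    # (model loading, kernel compilation, caches).
    for img in screenshots[:warmup]:
        process(img)

    timings = []
    for img in screenshots:
        start = time.perf_counter()
        process(img)
        timings.append(time.perf_counter() - start)

    return {
        "mean_s": statistics.mean(timings),
        "median_s": statistics.median(timings),
        "max_s": max(timings),
        "within_budget": sum(t <= budget_s for t in timings),
        "total": len(timings),
    }


if __name__ == "__main__":
    # Hypothetical stand-in for a real pipeline; replace with your own model call.
    def fake_pipeline(image):
        time.sleep(0.01)
        return []

    print(benchmark_latency(fake_pipeline, ["shot1.png", "shot2.png", "shot3.png"]))
```

Looking at `max_s` and `within_budget`, not just the mean, shows whether occasional slow screenshots (e.g. dense, text-heavy windows) break the 2-second target.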

Maybe I'm wrong, but this problem seems simpler than training a general model that works on any kind of document, as surya does. Do you think a small model could do the job?

Thanks!

metatrot commented 1 month ago

I'm also looking at a screenshot use case. Most OCR tools seem geared toward photos, handwriting, or PDFs, and they don't do well on ordinary GUI text.