VikParuchuri / surya

OCR, layout analysis, reading order, line detection in 90+ languages
https://www.datalab.to
GNU General Public License v3.0
9.28k stars 587 forks

about training data set #71

Open wonders7796 opened 3 months ago

wonders7796 commented 3 months ago

Thank you very much for the open source project. I tried it, and it works very well. Could you please share some details about your training dataset?

sralvins commented 3 months ago

Looks like the DocLayNet dataset for text line and layout detection. (Not sure about OCR, but DocLayNet contains machine-generated OCR annotations.)

vbonnivardprobayes commented 3 months ago

How about the ordering model?