Rid7 / Table-OCR

Recognize tables in images and restore them into Word.
GNU General Public License v3.0

about train data #3

Closed cqray1990 closed 4 years ago

cqray1990 commented 4 years ago

Which training data did you use? Did you label it yourself, or did you use public data? If public, which dataset? Thank you.

Rid7 commented 4 years ago

I built my own data from PDF files and digital scan files. Public datasets such as SciTSR, TableBank, and PubTabNet would be helpful, but they are quite clean, so I think blending them with noise would make sense.
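To illustrate what "blending with noise" could look like, here is a minimal sketch using NumPy. It is an assumption for illustration only, not the pipeline used in this repo: the function name `add_scan_noise` and its parameters `sigma` and `speckle_fraction` are hypothetical, and the noise model (Gaussian grain plus salt-and-pepper speckles) is just one simple choice.

```python
import numpy as np


def add_scan_noise(image, sigma=10.0, speckle_fraction=0.01, seed=0):
    """Return a noisy copy of a grayscale uint8 image of shape (H, W).

    Hypothetical helper: the noise model here is an assumption,
    not the augmentation used by the Table-OCR author.
    """
    rng = np.random.default_rng(seed)
    noisy = image.astype(np.float32)

    # Gaussian noise roughly imitates sensor grain in a scan.
    noisy += rng.normal(0.0, sigma, size=image.shape)

    # Salt-and-pepper speckles: set a small fraction of pixels to black or white.
    mask = rng.random(image.shape) < speckle_fraction
    noisy[mask] = rng.choice([0.0, 255.0], size=int(mask.sum()))

    # Clip back to the valid range and restore the original dtype.
    return np.clip(noisy, 0, 255).astype(np.uint8)


# A synthetic "clean" page: white background with one horizontal
# and one vertical ruling line, standing in for a rendered table.
clean = np.full((64, 64), 255, dtype=np.uint8)
clean[32, :] = 0
clean[:, 32] = 0

noisy = add_scan_noise(clean)
```

The same function could be applied to rendered pages from a clean public dataset before training, so the model sees inputs closer to real scans.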

cqray1990 commented 4 years ago

How did you label your data, and what does the label format look like? Thank you so much.

cqray1990 commented 4 years ago

So the project cannot be trained? It is not complete.

Rid7 commented 4 years ago

> the project can not be trained? it is not complete

Actually, I do not think releasing my training code is necessary, since I can't release my datasets and it is still far from good right now. The more significant part of this repo is how to process the table cells after you get the lines. If you are interested in training, you can find references in road detection examples; they are quite similar.
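As a rough idea of what "processing the table cells after you get the lines" can mean, here is a minimal sketch. It is not the algorithm in this repo: it assumes a fully ruled grid with no merged cells, and the function name `grid_cells` and the sample coordinates are made up for illustration.

```python
def grid_cells(h_lines, v_lines):
    """Return (row, col, x0, y0, x1, y1) for every cell of a ruled grid.

    Hypothetical helper. Assumes h_lines holds the y-coordinates of the
    horizontal lines, v_lines holds the x-coordinates of the vertical
    lines, and every line spans the whole table (no merged cells).
    """
    ys = sorted(h_lines)
    xs = sorted(v_lines)
    cells = []
    # Each pair of neighbouring horizontal lines bounds one row,
    # each pair of neighbouring vertical lines bounds one column.
    for row, (y0, y1) in enumerate(zip(ys, ys[1:])):
        for col, (x0, x1) in enumerate(zip(xs, xs[1:])):
            cells.append((row, col, x0, y0, x1, y1))
    return cells


# Three horizontal and three vertical lines give a 2 x 2 table.
cells = grid_cells(h_lines=[0, 40, 80], v_lines=[0, 100, 250])
```

Each bounding box can then be cropped and passed to an OCR step, with `row` and `col` telling you where the text belongs in the output table. Real tables with merged cells or broken lines need more than this.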