dhlab-epfl / dhSegment

Generic framework for historical document processing
https://dhlab-epfl.github.com/dhSegment
GNU General Public License v3.0

How to get baselines or text boxes from your model? #9

Closed zzdang closed 6 years ago

zzdang commented 6 years ago

Hello, I want to use your code to detect text boxes in scanned PDFs, but I don't know whether your trained model is for page extraction only, or for both page extraction and baseline detection. Thank you very much!

solivr commented 6 years ago

Hi, the trained model we provide in the demo is for page extraction only. If you want to detect baselines, you'll have to train your own model (you can use the READ-BAD dataset).