dhlab-epfl / dhSegment

Generic framework for historical document processing
https://dhlab-epfl.github.com/dhSegment
GNU General Public License v3.0
370 stars, 116 forks

Baselines to Textlines #58

Closed (ghost closed this issue 3 years ago)

ghost commented 4 years ago

@solivr @SeguinBe @raphaelBarman

Once I have detected the baseline masks, how can I convert them into text line boxes or polygons?

CrazyCrud commented 4 years ago

dhSegment does not return the bounding boxes of text lines, so some additional computation is needed. I tried a very basic approach: use the horizontal projection (the amount of ink in each pixel row) to estimate the height of each text line, then construct a text box from that height (see here). The paper Influence of Text Line Segmentation is a very interesting read that goes into more depth on this topic; maybe you can get some ideas from it.
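To make the projection idea concrete, here is a minimal NumPy sketch. It is not the code linked above and not part of the dhSegment API; `baseline_span` and `box_from_baseline` are hypothetical helper names. It assumes that the page has already been binarized into a boolean `ink` array, that each baseline is roughly horizontal, and that each baseline mask passed in contains exactly one baseline (for example, one connected component of the predicted mask).

```python
import numpy as np


def baseline_span(mask):
    """Reduce a mask holding ONE baseline to (x0, x1, y_base).

    Assumption: the baseline is roughly horizontal, so a single
    row index (the median row of the mask pixels) describes it.
    x1 is exclusive.
    """
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(xs.max()) + 1, int(np.median(ys))


def box_from_baseline(ink, x0, x1, y_base, min_ink=1, max_height=200):
    """Return (x0, top, x1, bottom) for the text line on a baseline.

    ink        : 2D boolean array, True where the page has text ink
    x0, x1     : column span of the baseline (x1 exclusive)
    y_base     : row index of the baseline
    min_ink    : minimum ink pixels for a row to count as text
    max_height : safety limit on how far to walk from the baseline
    top is inclusive, bottom is exclusive.
    """
    # Horizontal projection restricted to the baseline's columns:
    # the number of ink pixels in every row.
    profile = ink[:, x0:x1].sum(axis=1)

    # Walk upward until a row with too little ink is reached.
    top = y_base
    while top > 0 and y_base - top < max_height and profile[top - 1] >= min_ink:
        top -= 1

    # Walk downward so that descenders below the baseline are included.
    bottom = y_base + 1
    while (bottom < ink.shape[0]
           and bottom - y_base <= max_height
           and profile[bottom] >= min_ink):
        bottom += 1

    return x0, top, x1, bottom
```

A call would look like `box_from_baseline(ink, *baseline_span(single_baseline_mask))`. The sketch stops at the first row with fewer than `min_ink` ink pixels, so lines that touch each other vertically will be merged; raising `min_ink` or lowering `max_height` are the obvious knobs, and skewed or curved baselines would need a polygon-based approach instead.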