OCR4all / LAREX

A semi-automatic open-source tool for Layout Analysis and Region EXtraction on early printed books.
MIT License

combine segments does not merge its lines #307

Open bertsky opened 2 years ago

bertsky commented 2 years ago

Currently, when I combine segments, all lower-level information (the lines and their text) is lost. That usually means a lot of follow-up work (esp. considering #306).

I can see that there might be other use-cases where this behaviour is convenient, but often one just wants to fix oversegmentation while keeping all the lines (including their text content).

So I suggest always merging the lines (i.e. concatenating the existing lines in the reading order of the selected segments). One can still quickly remove all lines afterwards if necessary, whereas lines that were dropped during the merge cannot be brought back as easily.
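To make the proposal concrete, here is a rough sketch of the merge I have in mind. It is not LAREX code: `Segment`, `TextLine`, and `combine_segments` are hypothetical names used only for illustration, and the sketch assumes the reading order is available as a list of segment ids. Merging the region outlines themselves is left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TextLine:
    # Hypothetical stand-in for a line with its recognised text.
    id: str
    text: str = ""


@dataclass
class Segment:
    # Hypothetical stand-in for a region that owns a list of lines.
    id: str
    type: str
    lines: List[TextLine] = field(default_factory=list)


def combine_segments(selected: List[Segment],
                     reading_order: List[str],
                     merged_id: str = "merged") -> Segment:
    """Combine segments and keep their lines, concatenated in reading order."""
    if not selected:
        raise ValueError("nothing selected")
    # Position of each segment id in the reading order.
    rank = {seg_id: i for i, seg_id in enumerate(reading_order)}
    # Segments missing from the reading order sort last; sorted() is stable,
    # so they keep their selection order among themselves.
    ordered = sorted(selected, key=lambda s: rank.get(s.id, len(rank)))
    # Assumption: the merged segment inherits the type of the first segment.
    merged = Segment(id=merged_id, type=ordered[0].type)
    for seg in ordered:
        merged.lines.extend(seg.lines)
    return merged
```

With this, combining two paragraphs selected in any order would yield one segment whose lines follow the reading order, and deleting the lines afterwards remains a separate, optional step.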

bertsky commented 2 years ago

@chaddy314 can I do anything to assist? If there is some code snippet you can point me to, I'd be happy to try making a PR.