OCR4all / LAREX

A semi-automatic open-source tool for Layout Analysis and Region EXtraction on early printed books.
MIT License

Help - Segments #243

Closed: IlaCode closed this issue 3 years ago

IlaCode commented 3 years ago

[Attached image: larexexample]

Hi, I'm trying to segment the pages of a book. Why does LAREX see this attached page as an image? Am I wrong to use LAREX? Thanks a lot!

maxnth commented 3 years ago

Why does LAREX see this attached page as an image?

Most likely because of the "noise" surrounding the actual page content (e.g. the book cover or book fold), which "confuses" the underlying algorithm.

Am I wrong to use LAREX?

I wouldn't say so; there are both internal (within LAREX) and external (preprocessing) ways to improve the results.

Within LAREX itself, setting a Region of Interest (RoI in the toolbar) or using ignore regions to mark, e.g., the book fold or book cover should vastly improve the results of the auto segmentation. Experimenting with the parameter values is also often useful.
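In LAREX the RoI and ignore regions are set interactively in the GUI. The same idea can also be applied as a preprocessing step outside LAREX. Below is a minimal sketch, assuming a grayscale page stored as rows of pixel values from 0 (black) to 255 (white); the function `apply_roi_and_ignore` and its rectangle convention are hypothetical and not part of LAREX. It crops the page to a region of interest and paints ignore rectangles white, so a segmenter only sees background there.

```python
from typing import List, Tuple

# Hypothetical representation: a grayscale page as rows of 0..255 values.
Image = List[List[int]]
# Hypothetical rectangle convention: (left, top, right, bottom), right/bottom exclusive.
Rect = Tuple[int, int, int, int]

WHITE = 255


def apply_roi_and_ignore(img: Image, roi: Rect, ignore: List[Rect]) -> Image:
    """Crop `img` to `roi`, then paint every ignore rectangle white.

    Ignore rectangles are given in the coordinates of the original image.
    This only emulates the idea of an RoI/ignore region; it is not LAREX code.
    """
    left, top, right, bottom = roi
    # Slicing copies each row, so the input image is left untouched.
    out = [row[left:right] for row in img[top:bottom]]
    for il, it, ir, ib in ignore:
        # Translate the ignore rectangle into RoI coordinates and clip it.
        for y in range(max(it - top, 0), min(ib - top, len(out))):
            row = out[y]
            for x in range(max(il - left, 0), min(ir - left, len(row))):
                row[x] = WHITE
    return out


# Usage: an all-black 6x4 page, RoI without the outer frame,
# and an ignore region covering the two right-most columns (e.g. a book fold).
page = [[0] * 6 for _ in range(4)]
result = apply_roi_and_ignore(page, (1, 1, 5, 4), [(4, 0, 6, 4)])
```

Painting ignored areas white instead of cutting them out keeps the remaining content at the same position within the cropped page.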

When dealing with "raw" scans, though, it's probably advisable to preprocess the images before using them with stand-alone LAREX. Using a tool like ScanTailor or ScanTailor Advanced (or any similar tool) to (semi-)automatically remove borders (or other noise) and to dewarp or deskew the scans should lead to vastly improved results.
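Tools like ScanTailor handle border removal far more robustly, along with deskewing and dewarping. To illustrate only the basic idea of border removal, here is a minimal sketch of a hypothetical `trim_dark_borders` helper, assuming a grayscale page stored as rows of 0 to 255 values. It peels off rows and columns from the outside in as long as more than `max_dark` of their pixels are darker than `dark_level`. This is not ScanTailor's algorithm, and both threshold values are illustrative assumptions.

```python
from typing import List

# Hypothetical representation: a grayscale page as rows of 0..255 values.
Image = List[List[int]]


def _is_border(pixels: List[int], dark_level: int, max_dark: float) -> bool:
    """Treat a row/column as border if too large a share of its pixels is dark."""
    dark = sum(1 for p in pixels if p < dark_level)
    return dark / len(pixels) > max_dark


def trim_dark_borders(img: Image, dark_level: int = 128, max_dark: float = 0.5) -> Image:
    """Strip mostly-dark rows and columns from each edge until a lighter line is hit."""
    top, bottom = 0, len(img)
    while top < bottom and _is_border(img[top], dark_level, max_dark):
        top += 1
    while bottom > top and _is_border(img[bottom - 1], dark_level, max_dark):
        bottom -= 1
    rows = img[top:bottom]
    if not rows:
        return []  # the whole scan looked like border

    def column(x: int) -> List[int]:
        return [r[x] for r in rows]

    left, right = 0, len(rows[0])
    while left < right and _is_border(column(left), dark_level, max_dark):
        left += 1
    while right > left and _is_border(column(right - 1), dark_level, max_dark):
        right -= 1
    return [r[left:right] for r in rows]


# Usage: a 5x5 "scan" with a black frame and one dark text pixel in the middle.
B, W = 0, 255
scan = [
    [B, B, B, B, B],
    [B, W, W, W, B],
    [B, W, B, W, B],
    [B, W, W, W, B],
    [B, B, B, B, B],
]
page_content = trim_dark_borders(scan)
```

A heuristic this simple can also trim dense print that touches the page edge, which is one reason a dedicated tool is preferable for real scans.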