Closed bertsky closed 3 years ago
Thanks for investigating. Does this mean that the cropper should only be used on RGB images so the pylsd edge detection is executed?
Yes, but in the current implementation with feature_selector=binarized you'd have to fake the binarization in the input. So really the code (including the OCR-D part) needs to be changed to take the raw image (for the pylsd part). However, the textarea detector part currently uses an ad-hoc Otsu thresholding, which is clearly not a good choice. So at that point, one would need to take the binarized image from OCR-D (and probably adapt the kernel size to the input image DPI with the usual zoom heuristic).
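To make the kernel-size remark concrete, here is a minimal sketch of how such a scaling could look. It assumes the zoom convention used by several OCR-D processors (zoom = 300 / DPI); the function names and the base kernel size are made up for illustration and are not taken from the anybaseocr code.

```python
def zoom_for_dpi(dpi, reference_dpi=300.0):
    """Zoom factor relative to a reference resolution (assumed convention: 300 / DPI)."""
    return reference_dpi / dpi


def scaled_kernel(base_size, dpi):
    """Scale a (height, width) kernel tuned for 300 DPI to the actual image DPI.

    `base_size` is a hypothetical kernel size, not the value used by the cropper.
    """
    zoom = zoom_for_dpi(dpi)
    # A higher DPI means more pixels per glyph, so the kernel has to grow.
    return tuple(max(1, int(round(side / zoom))) for side in base_size)


if __name__ == "__main__":
    for dpi in (150, 300, 600):
        print(dpi, scaled_kernel((4, 20), dpi))
```

With a kernel tuned for 300 DPI, a 600 DPI scan would get a kernel twice as large in both directions, and a 150 DPI scan one half as large.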
Under certain conditions, ocrd-anybaseocr-crop selects only noise fragments of a page. It seems to be related to its builtin ad-hoc textline detection filter, which is based on a number of assumptions (e.g. the fixed kernel sizes imply that a certain pixel resolution and/or font size is expected).
Here's an example (sauvola-ms-split) image:
Looking at the debug images being created along the process (after repairing the dynamic range of the array conversion), it seems that the only criterion for text areas is a morphological closing with a fixed kernel size, which captures the texture of the background around the page:
(I wonder why no better method of textline detection was used...)
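To illustrate why a fixed kernel size is fragile, here is a toy binary closing (dilation followed by erosion) in pure Python. This is only a sketch of the general technique, not the cropper's actual code; the kernel sizes and the one-row test image are invented for the example.

```python
def _morph(img, kh, kw, reducer):
    """Apply `reducer` (any = dilation, all = erosion) over a kh x kw window.

    Windows are clipped at the image border, so pixels outside the image
    never influence the result.
    """
    h, w = len(img), len(img[0])
    rh, rw = kh // 2, kw // 2
    return [[int(reducer(img[yy][xx]
                         for yy in range(max(0, y - rh), min(h, y + rh + 1))
                         for xx in range(max(0, x - rw), min(w, x + rw + 1))))
             for x in range(w)]
            for y in range(h)]


def closing(img, kh, kw):
    """Binary closing: dilate, then erode, with the same rectangular kernel."""
    return _morph(_morph(img, kh, kw, any), kh, kw, all)


if __name__ == "__main__":
    row = [[1, 0, 0, 1, 0, 0, 0, 0, 0, 1]]
    print(closing(row, 1, 3))  # only the narrow gap is bridged
    print(closing(row, 1, 7))  # everything merges into one blob
```

The same foreground pixels either stay separate or merge into one component depending only on the kernel width, which is why a kernel chosen for one resolution can glue background speckle together at another.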
Then, in turn, the text boxes/columns of course look bad:
In the end, the default minArea parameter of 0.05 removes all but the largest "column":
This sheds a very bad light on that part of the algorithm.
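For readers unfamiliar with this kind of filter, a minimal sketch follows. It assumes that minArea is interpreted as a fraction of the page area, which I have not verified against the code; the function name and the box format (x0, y0, x1, y1) are hypothetical.

```python
def filter_by_min_area(boxes, page_width, page_height, min_area=0.05):
    """Keep only boxes covering at least `min_area` of the page.

    Assumption: min_area is a fraction of the total page area.
    Boxes are (x0, y0, x1, y1) tuples in pixel coordinates.
    """
    page_area = float(page_width * page_height)
    kept = []
    for x0, y0, x1, y1 in boxes:
        fraction = (x1 - x0) * (y1 - y0) / page_area
        if fraction >= min_area:
            kept.append((x0, y0, x1, y1))
    return kept


if __name__ == "__main__":
    candidates = [(0, 0, 100, 100), (100, 100, 600, 900), (0, 0, 200, 200)]
    print(filter_by_min_area(candidates, 1000, 1000))
```

If the candidate boxes are mostly small noise blobs, a threshold like this leaves only whichever blob happens to be largest, regardless of whether it contains text.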
But there's also the fallback mechanism: border rectangle estimation based on pylsd edge detection. Since that relies on subpixel edge detection, it only works at all when run on the raw RGB image (instead of the binarized one). There, however, it works perfectly well:
First, the morphological closing again, this time of the raw image:
This yields no text components large enough to work with!
Now the edge detection comes into play:
And from that it's comparatively easy to check the plausibility of the largest intersecting horizontal and vertical lines:
Final cropped raw image:
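For illustration, the line-based border estimation could be sketched roughly as follows. This is a simplified stand-in, not the cropper's implementation: segments are assumed to be given as (x1, y1, x2, y2) tuples, and the length threshold, angle tolerance and slack values are hypothetical.

```python
import math


def split_long_segments(segments, min_length, tol=0.1):
    """Sort long segments into near-horizontal and near-vertical ones."""
    horizontals, verticals = [], []
    for x1, y1, x2, y2 in segments:
        dx, dy = abs(x2 - x1), abs(y2 - y1)
        if math.hypot(dx, dy) < min_length:
            continue  # too short to be a page border
        if dy <= tol * dx:
            horizontals.append((min(x1, x2), max(x1, x2), (y1 + y2) / 2))
        elif dx <= tol * dy:
            verticals.append((min(y1, y2), max(y1, y2), (x1 + x2) / 2))
    return horizontals, verticals


def crosses(h, v, slack):
    """True if horizontal h and vertical v meet, allowing `slack` pixels of gap."""
    hx0, hx1, hy = h
    vy0, vy1, vx = v
    return hx0 - slack <= vx <= hx1 + slack and vy0 - slack <= hy <= vy1 + slack


def border_rectangle(segments, min_length, slack=10):
    """Return (x0, y0, x1, y1) spanned by the outermost intersecting lines, or None."""
    horizontals, verticals = split_long_segments(segments, min_length)
    # Keep only lines that meet at least one line of the other orientation.
    hs = [h for h in horizontals if any(crosses(h, v, slack) for v in verticals)]
    vs = [v for v in verticals if any(crosses(h, v, slack) for h in horizontals)]
    if len(hs) < 2 or len(vs) < 2:
        return None
    ys = [h[2] for h in hs]
    xs = [v[2] for v in vs]
    return (min(xs), min(ys), max(xs), max(ys))
```

Short segments and diagonals are discarded first, so speckle and text strokes do not compete with the page edges. When fewer than two lines per orientation survive, the sketch returns None rather than guessing a rectangle.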