OCR4all / LAREX

A semi-automatic open-source tool for Layout Analysis and Region EXtraction on early printed books.
MIT License

Unrotate segmentation not correct #285

Closed (closed by bertsky 2 years ago)

bertsky commented 2 years ago

I think I may have found a bug in the handling of @orientation by LAREX.

I have a page with a slight skew:

[Image: FILE_0015_ORIGINAL]

And an externally generated segmentation for it:

[Attachment: herrnhut_seg.zip]

PageViewer (which completely ignores @orientation) renders that as follows:

[Image: herrnhut_jpageviewer]

As you can see, the annotation contains /PcGts/Page/@orientation="1.5".

The PAGE-XML schema documentation says this:

    The angle the rectangle encapsulating the page (or its Border) has to be
    rotated in clockwise direction in order to correct the present skew
    (negative values indicate anti-clockwise rotation).

Thus, a positive 1.5° looks plausible.
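To make the clockwise convention concrete, here is a minimal sketch in plain JavaScript (it is not LAREX code). It assumes that the polygon coordinates refer to the original, skewed image, that the image is rotated around its center, and that the canvas is not enlarged by the rotation. The names rotatePoint, imageSize and polygon, as well as the image dimensions, are hypothetical.

```javascript
// Rotate a point around a center by `degrees`.
// In image coordinates (x to the right, y downward), a positive angle
// with this rotation matrix appears as a clockwise turn on screen.
function rotatePoint(point, degrees, center) {
  const rad = (degrees * Math.PI) / 180;
  const cos = Math.cos(rad);
  const sin = Math.sin(rad);
  const dx = point.x - center.x;
  const dy = point.y - center.y;
  return {
    x: center.x + dx * cos - dy * sin,
    y: center.y + dx * sin + dy * cos,
  };
}

// Hypothetical values: only the angle 1.5 comes from the annotation above.
const orientation = 1.5;
const imageSize = { width: 2000, height: 3000 };
const center = { x: imageSize.width / 2, y: imageSize.height / 2 };

// A hypothetical polygon in original (skewed) image coordinates.
const polygon = [
  { x: 100, y: 200 },
  { x: 1900, y: 200 },
  { x: 1900, y: 260 },
  { x: 100, y: 260 },
];

// Apply the same clockwise rotation to every vertex.
const derotated = polygon.map((p) => rotatePoint(p, orientation, center));
```

If the image is deskewed by rotating it clockwise by @orientation, then every polygon that should stay aligned with it needs the same rotation around the same center.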

Here's how LAREX renders this:

herrnhut_larex-unrot

Thus, the image has been deskewed correctly, but clearly the segmentation has not.

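Measuring the resulting angle between annotation and image, I find the error to be about as large as the angle itself (1.5°), not twice as large. That suggests the rotation is not being applied to the annotation at all, rather than being applied in the wrong direction.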
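The difference between an error of one and two times the angle can be spelled out with a little arithmetic. The sketch below is only an illustration; residualSkew and the scenario names are hypothetical and not part of LAREX.

```javascript
// Residual angle (in degrees) between image and annotation after display,
// given the clockwise rotation applied to each of them.
function residualSkew(imageRotation, annotationRotation) {
  return Math.abs(imageRotation - annotationRotation);
}

const orientation = 1.5; // value of /PcGts/Page/@orientation

const scenarios = {
  // Image and annotation rotated by the same angle: they stay aligned.
  bothRotated: residualSkew(orientation, orientation),
  // Image rotated, annotation left untouched: error equals the angle.
  annotationUntouched: residualSkew(orientation, 0),
  // Annotation rotated the wrong way: error is twice the angle.
  annotationWrongWay: residualSkew(orientation, -orientation),
};

console.log(scenarios);
```

An observed error of roughly 1.5° matches the annotationUntouched case, not the annotationWrongWay case.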

However, the code responsible for that does look correct at first glance:

https://github.com/OCR4all/LAREX/blob/4f97126dac411abdfa6aa9b7f201b2e3c28ff70f/src/main/webapp/resources/js/viewer/actions.js#L98-L102

https://github.com/OCR4all/LAREX/blob/4f97126dac411abdfa6aa9b7f201b2e3c28ff70f/src/main/webapp/resources/js/viewer/controller.js#L2327-L2341

https://github.com/OCR4all/LAREX/blob/4f97126dac411abdfa6aa9b7f201b2e3c28ff70f/src/main/webapp/resources/js/viewer/controller.js#L2266-L2268

https://github.com/OCR4all/LAREX/blob/4f97126dac411abdfa6aa9b7f201b2e3c28ff70f/src/main/webapp/resources/js/viewer/editor.js#L1120-L1121

https://github.com/OCR4all/LAREX/blob/4f97126dac411abdfa6aa9b7f201b2e3c28ff70f/src/main/webapp/resources/js/viewer/viewer.js#L504-L508

Unfortunately, the documentation of paper.Point.rotate does not say in which direction a positive angle rotates. The Paper.js tutorials, however, say clockwise, which would be correct (given that PAGE-XML's @orientation is also clockwise).

Now, what do we do?

bertsky commented 2 years ago

Oh, just saw that region polygons are derotated correctly, just not the textline polygons!
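One way to picture what is missing: the rotation has to reach the nested text lines as well as the regions. The following is a minimal sketch under an assumed data layout (regions with a coords array and an optional textlines array). This layout and the names rotatePoints and derotateSegmentation are hypothetical and do not reflect LAREX's actual data model.

```javascript
// Rotate a list of points clockwise (in y-down image coordinates)
// by `degrees` around `center`.
function rotatePoints(points, degrees, center) {
  const rad = (degrees * Math.PI) / 180;
  const cos = Math.cos(rad);
  const sin = Math.sin(rad);
  return points.map((p) => {
    const dx = p.x - center.x;
    const dy = p.y - center.y;
    return {
      x: center.x + dx * cos - dy * sin,
      y: center.y + dx * sin + dy * cos,
    };
  });
}

// Apply the same rotation to each region polygon AND to the polygons
// of the text lines nested inside it (hypothetical structure).
function derotateSegmentation(regions, degrees, center) {
  return regions.map((region) => ({
    ...region,
    coords: rotatePoints(region.coords, degrees, center),
    textlines: (region.textlines || []).map((line) => ({
      ...line,
      coords: rotatePoints(line.coords, degrees, center),
    })),
  }));
}
```

The function returns new objects instead of mutating its input, so the original (skewed) coordinates remain available, for example when saving back to PAGE-XML.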