OCR4all / LAREX

A semi-automatic open-source tool for Layout Analysis and Region EXtraction on early printed books.
MIT License

Don't allow polygons outside image #301

Status: Closed (alexander-winkler closed this issue 2 years ago)

alexander-winkler commented 2 years ago

The Subtract function is very useful. However, it can produce polygons outside the actual image, e.g.: [image]

Those polygons are difficult to remove because you can't select them with shift + mouse rectangle selection. I think it's probably not even necessary to allow polygons outside the image in the first place, is it? My suggestion would therefore be to disallow polygon vertices outside the dimensions of the image file.
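The suggestion above (no vertices outside the image) could be sketched roughly as follows. This is only an illustration, not LAREX code: the names `clamp_polygon` and `is_inside_image` are hypothetical, and it assumes integer pixel coordinates where the valid range is 0 to width-1 and 0 to height-1. It also clamps each vertex individually, which is simpler than a true geometric clip against the image rectangle and can slightly change the polygon's shape near the border.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def is_inside_image(points: List[Point], width: int, height: int) -> bool:
    """Return True if every vertex lies within the image bounds."""
    return all(0 <= x < width and 0 <= y < height for x, y in points)


def clamp_polygon(points: List[Point], width: int, height: int) -> List[Point]:
    """Move every vertex into [0, width-1] x [0, height-1] (hypothetical helper)."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    clamped: List[Point] = []
    for x, y in points:
        cx = min(max(x, 0), width - 1)
        cy = min(max(y, 0), height - 1)
        # Clamping can map neighbouring vertices onto the same pixel;
        # skip consecutive duplicates so the outline stays clean.
        if not clamped or clamped[-1] != (cx, cy):
            clamped.append((cx, cy))
    return clamped


if __name__ == "__main__":
    polygon = [(-5, 10), (20, -3), (120, 50)]
    print(is_inside_image(polygon, 100, 80))
    print(clamp_polygon(polygon, 100, 80))
```

A check like `is_inside_image` could be used to reject such a polygon outright, while `clamp_polygon` would silently repair it; which behaviour is preferable depends on the tool.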

maxnth commented 2 years ago

Yeah, this shouldn't be possible and is a bug. Could you possibly provide the image file and XML? Loading negative coordinates shouldn't be possible with the latest LAREX version (as it isn't allowed in PAGE) and I can't reproduce it with the LAREX segmentation.

maxnth commented 2 years ago

Anyways, the subtract functionality shouldn't produce segments containing negative points anymore, starting with 9dc3588ede8ee2285f317f20bfee8d484ea3b3e2

alexander-winkler commented 2 years ago

> Yeah, this shouldn't be possible and is a bug. Could you possibly provide the image file and XML? Loading negative coordinates shouldn't be possible with the latest LAREX version (as it isn't allowed in PAGE) and I can't reproduce it with the LAREX segmentation.

==> FILES

maxnth commented 2 years ago

Sadly, the provided test files can't be opened in newer releases of LAREX (>=0.6), as LAREX no longer opens invalid PAGE XML files and negative points in Coords/@points aren't allowed according to the schema. Nevertheless, the fix in 9dc3588 should still prevent such edge cases. If a similar problem still appears with newer LAREX versions, please feel free to reopen this issue.
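To see whether an existing file is affected, one could scan it for negative values in Coords/@points before loading it. The following is a minimal sketch, not LAREX's own validation and not a full schema check: `find_negative_points` is a hypothetical helper, it matches `Coords` elements by local name so it does not depend on a particular PAGE namespace version, and it assumes the usual `x1,y1 x2,y2 ...` layout of the `points` attribute.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def find_negative_points(page_xml: str) -> List[Tuple[str, str]]:
    """Return (parent id, offending pair) for each negative or malformed point."""
    root = ET.fromstring(page_xml)
    problems: List[Tuple[str, str]] = []
    for parent in root.iter():
        for child in parent:
            # Strip the "{namespace}" prefix to compare the local tag name only.
            if child.tag.rsplit("}", 1)[-1] != "Coords":
                continue
            parent_id = parent.get("id", "")
            for pair in child.get("points", "").split():
                try:
                    x_str, y_str = pair.split(",")
                    x, y = int(x_str), int(y_str)
                except ValueError:
                    # Not two integers separated by a comma: report it as well.
                    problems.append((parent_id, pair))
                    continue
                if x < 0 or y < 0:
                    problems.append((parent_id, pair))
    return problems


if __name__ == "__main__":
    sample = """
    <PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
      <Page imageFilename="page.png" imageWidth="100" imageHeight="80">
        <TextRegion id="r1"><Coords points="0,0 50,0 50,40"/></TextRegion>
        <TextRegion id="r2"><Coords points="-12,5 30,5 30,-4"/></TextRegion>
      </Page>
    </PcGts>
    """
    for region_id, pair in find_negative_points(sample):
        print(f"{region_id}: {pair}")
```

Any region reported this way would need its coordinates corrected (or the region removed) before the file is valid PAGE XML again.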