cisocrgroup / ocrd_cis

OCR-D python tools
MIT License

TopologicalError: GEOSIntersection_r could not be performed #67

Closed M3ssman closed 4 years ago

M3ssman commented 4 years ago

Environment

Expected Behavior:

Please do not crash; log an error and move on gracefully.

2020-09-10-bug-203074.zip

bertsky commented 4 years ago

Thanks @M3ssman for the full report!

Looks similar to #62 and https://github.com/OCR-D/ocrd_tesserocr/issues/149. I'd very much like to hunt this down, but the problem lies with the producers of invalid coordinates; we cannot make each and every consuming processor robust to that kind of error.

Looking into your workflow and PAGE results, there's a self-intersection in TextRegion region0010 with points 238,1073 240,1935 1929,1931 1927,932 1719,932 1719,909 238,913 238,936 238,1074 238,1073 (see the last 2 points). That region was introduced by ocrd-segment-repair (when reducing overlaps from bbox to polygon). I'll try to transfer the issue there and see what I can do.
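For anyone who wants to reproduce the finding without the full workflow, here is a minimal pure-Python sketch that parses the points string quoted above and reports pairs of non-adjacent edges that touch or cross. The function names (parse_points, segments_touch, find_self_intersections) are made up for illustration and are not part of ocrd_cis or ocrd_segment. The check is also much simpler than the validity test GEOS performs: it ignores overlaps between adjacent edges, for example.

```python
def parse_points(points_attr):
    """Parse a PAGE-style points string 'x1,y1 x2,y2 ...' into (x, y) tuples."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points_attr.split()]


def orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): -1, 0 or +1."""
    val = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (val > 0) - (val < 0)


def on_segment(a, b, c):
    """True if c (assumed collinear with a-b) lies inside the bounding box of a-b."""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def segments_touch(p1, p2, q1, q2):
    """True if the closed segments p1-p2 and q1-q2 share at least one point."""
    o1, o2 = orient(p1, p2, q1), orient(p1, p2, q2)
    o3, o4 = orient(q1, q2, p1), orient(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True
    # Collinear special cases: an endpoint of one segment lies on the other.
    return ((o1 == 0 and on_segment(p1, p2, q1))
            or (o2 == 0 and on_segment(p1, p2, q2))
            or (o3 == 0 and on_segment(q1, q2, p1))
            or (o4 == 0 and on_segment(q1, q2, p2)))


def find_self_intersections(ring):
    """Return index pairs (i, j) of non-adjacent ring edges that touch or cross."""
    pts = list(ring)
    if len(pts) > 1 and pts[0] == pts[-1]:
        pts = pts[:-1]  # drop the explicit closing point
    n = len(pts)
    edges = [(pts[i], pts[(i + 1) % n]) for i in range(n)]
    hits = []
    for i in range(n):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # first and last edge are neighbours in a closed ring
            if segments_touch(*edges[i], *edges[j]):
                hits.append((i, j))
    return hits


# Points of TextRegion region0010 as quoted in this comment.
REGION0010 = ("238,1073 240,1935 1929,1931 1927,932 1719,932 "
              "1719,909 238,913 238,936 238,1074 238,1073")

if __name__ == "__main__":
    print(find_self_intersections(parse_points(REGION0010)))
```

Edge i runs from point i to point i+1, so a reported pair tells you which two stretches of the outline collide. With Shapely installed, Polygon(points).is_valid should give the library's own verdict on the same ring.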

bertsky commented 4 years ago

@M3ssman I could run your workflow to completion with https://github.com/OCR-D/ocrd_segment/pull/43. Can you please try this with a full document?