Closed M3ssman closed 4 years ago
Thanks @M3ssman for the full report!
Looks similar to #62 and https://github.com/OCR-D/ocrd_tesserocr/issues/149. I'd very much like to hunt this down, but the problem lies with the producers of invalid coordinates; we cannot make each and every consuming processor robust to that kind of error.
Looking into your workflow and PAGE results, there's a self-intersection in TextRegion region0010 with
238,1073 240,1935 1929,1931 1927,932 1719,932 1719,909 238,913 238,936 238,1074 238,1073
(see the last 2 points). That region was introduced by ocrd-segment-repair (when reducing overlaps from bbox to polygon). I'll try to transfer the issue there and see what I can do.
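To illustrate the kind of defect meant here, below is a minimal standard-library sketch that flags polygon edges which touch or cross a non-neighbouring edge. It is not the check used by ocrd-segment-repair or any OCR-D processor; the function names (`parse_points`, `self_intersections`) are hypothetical, and folded-back neighbouring edges are deliberately ignored to keep it short.

```python
from itertools import combinations


def parse_points(points):
    """Parse a PAGE-style points string ("x1,y1 x2,y2 ...") into (x, y) tuples."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): -1, 0 or +1."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def _within_box(a, b, c):
    """True if c lies inside the bounding box of segment a-b."""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def _segments_meet(p1, p2, q1, q2):
    """True if segments p1-p2 and q1-q2 cross or touch."""
    o1, o2 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    o3, o4 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True
    # Collinear cases: an endpoint of one segment lies on the other.
    return ((o1 == 0 and _within_box(p1, p2, q1))
            or (o2 == 0 and _within_box(p1, p2, q2))
            or (o3 == 0 and _within_box(q1, q2, p1))
            or (o4 == 0 and _within_box(q1, q2, p2)))


def self_intersections(ring):
    """Return index pairs (i, j) of non-neighbouring edges that cross or touch.

    Edge i runs from ring[i] to ring[(i + 1) % n]. A closing point that
    repeats the first point is dropped before the edges are built.
    """
    if len(ring) > 1 and ring[0] == ring[-1]:
        ring = ring[:-1]
    n = len(ring)
    hits = []
    for i, j in combinations(range(n), 2):
        if j == i + 1 or (i == 0 and j == n - 1):
            continue  # neighbouring edges always share a vertex
        if _segments_meet(ring[i], ring[(i + 1) % n],
                          ring[j], ring[(j + 1) % n]):
            hits.append((i, j))
    return hits


# Coordinates of region0010 as quoted above.
REGION0010 = ("238,1073 240,1935 1929,1931 1927,932 1719,932 "
              "1719,909 238,913 238,936 238,1074 238,1073")

if __name__ == "__main__":
    print(self_intersections(parse_points(REGION0010)))
```

For the coordinates above, the edge ending at 238,1074 runs through the first point 238,1073, so the ring touches itself there. A plain rectangle yields an empty list.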
@M3ssman I could run your workflow to completion with https://github.com/OCR-D/ocrd_segment/pull/43. Can you please try this with a full document?
Environment
ocrd/all from 2020-08-04 (docker image id: 158ea3d64eae)
Current Behavior:
When executing something like:
docker run --rm -u "40366" -w /data -v "/home/aqayv/project/ulb-it-migration/WORKSPACE_OCR/203074":/data -v /usr/share/tesseract-ocr/4.00/tessdata:/usr/local/share/tessdata/ ocrd/all:2020-08-04 ocrd-make -f ulb-ocrd-vd18-02.mk .
Expected Behavior:
Please do not crash, but log an Error and move on gracefully
2020-09-10-bug-203074.zip