cisocrgroup / ocrd_cis

OCR-D python tools
MIT License

region segmentation crashes #62

Closed EEngl52 closed 4 years ago

EEngl52 commented 4 years ago

ocrd-cis-ocropy-segment crashed completely on this picture with the following workflow:

ocrd-cis-ocropy-binarize|MAX|OCR-D-BIN1| | |ERROR
ocrd-anybaseocr-crop|OCR-D-BIN1|OCR-D-CROP| | |ERROR
ocrd-olena-binarize|OCR-D-CROP|OCR-D-BIN| | |ERROR
ocrd-cis-ocropy-deskew|OCR-D-BIN|OCR-D-DESKEW| | /test/data/ocrd/taverna/models/param-cis-deskew-page.json |ERROR
ocrd-cis-ocropy-denoise|OCR-D-DESKEW|OCR-D-DENOISE| | |ERROR
ocrd-cis-ocropy-segment|OCR-D-DENOISE|OCR-D-SEG-REGION| | /test/data/ocrd/taverna/models/param-cis-seg-page.json |ERROR

cneud commented 4 years ago

ocrd-cis-ocropy-segment --mets /test/data/almahide/mets.xml --working-dir /test/data/almahide --input-file-grp OCR-D-DENOISE --output-file-grp OCR-D-SEG-REGION --parameter /test/data/ocrd/taverna/models/param-cis-seg-page.json --log-level ERROR
19:14:52.753 INFO root - Overriding log level globally to ERROR
19:29:35.092 ERROR shapely.geos - TopologyException: Input geom 0 is invalid: Self-intersection at or near point -1 561 at -1 561
Traceback (most recent call last):
  File "/home/habocr/newinstallation/ocrd_all/venv/bin/ocrd-cis-ocropy-segment", line 8, in <module>
    sys.exit(ocrd_cis_ocropy_segment())
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/ocrd_cis/ocropy/cli.py", line 54, in ocrd_cis_ocropy_segment
    return ocrd_cli_wrap_processor(OcropySegment, *args, **kwargs)
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/ocrd/decorators.py", line 102, in ocrd_cli_wrap_processor
    run_processor(processorClass, ocrd_tool, mets, workspace=workspace, **kwargs)
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/ocrd/processor/base.py", line 61, in run_processor
    processor.process()
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/ocrd_cis/ocropy/segment.py", line 306, in process
    page_id, file_id, zoom, rogroup=rogroup)
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/ocrd_cis/ocropy/segment.py", line 579, in _process_element
    line_polygon = polygon_for_parent(line_polygon, region)
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/ocrd_cis/ocropy/segment.py", line 676, in polygon_for_parent
    interp = childp.intersection(parentp)
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/shapely/geometry/base.py", line 649, in intersection
    return geom_factory(self.impl['intersection'](self, other))
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/shapely/topology.py", line 70, in __call__
    self._check_topology(err, this, other)
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/shapely/topology.py", line 38, in _check_topology
    self.fn.__name__, repr(geom)))
shapely.errors.TopologicalError: The operation 'GEOSIntersection_r' could not be performed. Likely cause is invalidity of the geometry <shapely.geometry.polygon.Polygon object at 0x7f12f8f12828>

bertsky commented 4 years ago

Sorry, cannot reproduce yet. My segmentation runs correctly. Can you tell me the parameters you used in that workflow? And which versions (esp. ocrd_anybaseocr and ocrd_olena)?

EEngl52 commented 4 years ago

Thanks for looking into this so quickly! I'm using ocrd_all natively, last commit ca24263 (ocrd-olena-binarize version 1.2.0 and ocrd-anybaseocr-crop version 0.0.5). I didn't specify any parameters except region level for ocrd-cis-ocropy-deskew and ocrd-cis-ocropy-segment.

bertsky commented 4 years ago

> I'm using ocrd_all natively, last commit ca24263 (ocrd-olena-binarize version 1.2.0 and ocrd-anybaseocr-crop version 0.0.5). I didn't specify any parameters except region level for ocrd-cis-ocropy-deskew and ocrd-cis-ocropy-segment.

Okay, sadly, ocrd_all hasn't updated the modules for some time now (because we want to migrate all OCR-D/spec#164 changes at once). So the difference might be caused by ocrd-cis-ocropy-binarize's DPI zoom change, checking...

bertsky commented 4 years ago

Cannot reproduce with current ocrd/all:maximum (built from OCR-D/ocrd_all@5413688 which has identical submodules to your native OCR-D/ocrd_all@ca24263).

EEngl52 commented 4 years ago

OK, I tried it again. For whatever reason it works if I process only this single image, but it keeps failing if I try to process the whole book: mets.zip

bertsky commented 4 years ago

> ok, tried it again now. For whatever reason it works if I only process this single image but it keeps failing if I try to process the whole book

all I need is an image file which fails …

EEngl52 commented 4 years ago

file-max-idp140325664

bertsky commented 4 years ago

But that's the same as above!

To test your hypothesis that it happens only on non-first pages in a sequence, I re-added this page as another. Cannot reproduce it with this setup.

As to your METS: I need the images of course! I can see only local JPG references in MAX, but DEFAULT has some remote URLs. Is that the right fileGrp?

bertsky commented 4 years ago

Thanks @EEngl52 for sharing the METS and images! I can reproduce now. The problem seems to be an instance of what I described in point 3 of https://github.com/OCR-D/ocrd_segment/pull/43, namely that rounding (here: when converting the line polygon from relative to absolute coordinates via coordinates_for_segment) can make a valid Polygon shape invalid (self-intersecting). I will try to make an analogous fix here.
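
For readers who want to see the failure mode in isolation, here is a minimal sketch of how rounding alone can turn a valid polygon into a self-intersecting one, and of one common workaround. The coordinates, the parent polygon and the helper name ensure_valid are made up for illustration. The zero-width buffer is only a widely used Shapely idiom, assumed here as a stand-in; it is not necessarily the fix that went into ocrd_cis.

```python
from shapely.geometry import Polygon
from shapely.validation import explain_validity

# Hypothetical line polygon in relative (float) coordinates: the notch vertex
# (4, 1.4) sits just above the slanted bottom edge from (0, 0) to (10, 3),
# which passes through y = 1.2 at x = 4.
relative = [(0.0, 0.0), (10.0, 3.0), (10.0, 10.0), (4.0, 1.4), (0.0, 10.0)]

# Rounding to integer pixels moves the notch vertex to (4, 1), i.e. below
# that edge, so the ring now crosses itself.
rounded = [(round(x), round(y)) for x, y in relative]

before = Polygon(relative)
after = Polygon(rounded)


def ensure_valid(polygon):
    """Return the polygon unchanged if it is valid, else a buffer(0) repair.

    Illustrative helper (an assumption, not part of ocrd_cis). Note that
    buffer(0) may change the shape or split it into several parts.
    """
    if polygon.is_valid:
        return polygon
    return polygon.buffer(0)


# Hypothetical parent region; clipping is only attempted on a valid geometry.
parent = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])
repaired = ensure_valid(after)
clipped = repaired.intersection(parent)

print("valid before rounding:", before.is_valid)
print("valid after rounding:", after.is_valid)
print("reason:", explain_validity(after))
print("valid after repair:", repaired.is_valid)
```

Checking is_valid before calling intersection avoids depending on whether the installed Shapely/GEOS version raises on invalid input. The trade-off is that a repaired polygon may no longer match the original outline exactly.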