Closed EEngl52 closed 4 years ago
ocrd-cis-ocropy-segment --mets /test/data/almahide/mets.xml --working-dir /test/data/almahide --input-file-grp OCR-D-DENOISE --output-file-grp OCR-D-SEG-REGION --parameter /test/data/ocrd/taverna/models/param-cis-seg-page.json --log-level ERROR
19:14:52.753 INFO root - Overriding log level globally to ERROR
19:29:35.092 ERROR shapely.geos - TopologyException: Input geom 0 is invalid: Self-intersection at or near point -1 561 at -1 561
Traceback (most recent call last):
  File "/home/habocr/newinstallation/ocrd_all/venv/bin/ocrd-cis-ocropy-segment", line 8, in <module>
    sys.exit(ocrd_cis_ocropy_segment())
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/ocrd_cis/ocropy/cli.py", line 54, in ocrd_cis_ocropy_segment
    return ocrd_cli_wrap_processor(OcropySegment, *args, **kwargs)
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/ocrd/decorators.py", line 102, in ocrd_cli_wrap_processor
    run_processor(processorClass, ocrd_tool, mets, workspace=workspace, **kwargs)
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/ocrd/processor/base.py", line 61, in run_processor
    processor.process()
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/ocrd_cis/ocropy/segment.py", line 306, in process
    page_id, file_id, zoom, rogroup=rogroup)
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/ocrd_cis/ocropy/segment.py", line 579, in _process_element
    line_polygon = polygon_for_parent(line_polygon, region)
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/ocrd_cis/ocropy/segment.py", line 676, in polygon_for_parent
    interp = childp.intersection(parentp)
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/shapely/geometry/base.py", line 649, in intersection
    return geom_factory(self.impl['intersection'](self, other))
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/shapely/topology.py", line 70, in __call__
    self._check_topology(err, this, other)
  File "/home/habocr/newinstallation/ocrd_all/venv/lib/python3.6/site-packages/shapely/topology.py", line 38, in _check_topology
    self.fn.__name__, repr(geom)))
shapely.errors.TopologicalError: The operation 'GEOSIntersection_r' could not be performed. Likely cause is invalidity of the geometry <shapely.geometry.polygon.Polygon object at 0x7f12f8f12828>
Sorry, cannot reproduce yet. My segmentation runs correctly. Can you tell me the parameters you used in that workflow? And which versions (esp. ocrd_anybaseocr and ocrd_olena)?
thanks for looking into this so quickly!
I'm using ocrd_all natively, last commit ca24263 (ocrd-olena-binarize version 1.2.0 and ocrd-anybaseocr-crop version 0.0.5). I didn't specify any parameters except region level for ocrd-cis-ocropy-deskew and ocrd-cis-ocropy-segment.
Okay, sadly, ocrd_all hasn't updated the modules for some time now (because we want to migrate all OCR-D/spec#164 changes at once). So the difference might be caused by ocrd-cis-ocropy-binarize's DPI zoom change, checking...
Cannot reproduce with current ocrd/all:maximum (built from OCR-D/ocrd_all@5413688 which has identical submodules to your native OCR-D/ocrd_all@ca24263).
ok, tried it again now. For whatever reason it works if I only process this single image but it keeps failing if I try to process the whole book mets.zip
all I need is an image file which fails …
But that's the same as above!
To test your hypothesis that it happens only on non-first pages in a sequence, I re-added this page as another. Cannot reproduce it with this setup.
As to your METS: I need the images of course! I can see only local JPG references in MAX, but DEFAULT has some remote URLs. Is that the right fileGrp?
Thanks @EEngl52 for sharing the METS and images! I can reproduce now. The problem seems to be an instance of what I described in point 3 of https://github.com/OCR-D/ocrd_segment/pull/43, namely that rounding (here: when converting the line polygon from relative to absolute coordinates via coordinates_for_segment) can make a valid Polygon shape invalid (self-intersecting). I will try to make an analogous fix here.
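To illustrate the failure mode described above, here is a minimal pure-Python sketch. It is not the code from ocrd_cis or ocrd_segment: the polygon coordinates are made up, the helper names (orient, segments_cross, has_self_crossing) are hypothetical, and the check only detects proper edge crossings, which is a simplification of what shapely reports through its is_valid property. It shows a ring whose vertex sits just above a slanted edge; rounding to integer pixels pushes that vertex below the edge, so two edges now cross.

```python
from itertools import combinations


def orient(a, b, c):
    """Twice the signed area of triangle abc (> 0 if c lies left of a->b)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_cross(p1, p2, p3, p4):
    """True if segments p1p2 and p3p4 properly cross (mere touching is ignored)."""
    return (orient(p3, p4, p1) * orient(p3, p4, p2) < 0
            and orient(p1, p2, p3) * orient(p1, p2, p4) < 0)


def has_self_crossing(ring):
    """Check every pair of non-adjacent edges of a closed ring for a crossing."""
    n = len(ring)
    edges = [(ring[i], ring[(i + 1) % n]) for i in range(n)]
    for (i, e), (j, f) in combinations(enumerate(edges), 2):
        if j == i + 1 or (i == 0 and j == n - 1):
            continue  # adjacent edges share a vertex by construction
        if segments_cross(*e, *f):
            return True
    return False


# Hypothetical line polygon in sub-pixel coordinates: the vertex (4, 1.4)
# lies just above the edge (0, 0)-(10, 3), which passes through y = 1.2 at x = 4.
line_polygon = [(0, 0), (10, 3), (10, 10), (4, 1.4), (0, 10)]

# Rounding to integer pixels moves that vertex to (4, 1), below the edge.
rounded = [(round(x), round(y)) for x, y in line_polygon]

print(has_self_crossing(line_polygon))  # False
print(has_self_crossing(rounded))       # True
```

A polygon like rounded is the kind of input that makes the intersection call in the traceback fail, which is why validity has to be re-checked (and repaired if necessary) after rounding rather than only before.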
ocrd-cis-ocropy-segment crashed completely on this picture with the following workflow:
ocrd-cis-ocropy-binarize|MAX|OCR-D-BIN1| | |ERROR
ocrd-anybaseocr-crop|OCR-D-BIN1|OCR-D-CROP| | |ERROR
ocrd-olena-binarize|OCR-D-CROP|OCR-D-BIN| | |ERROR
ocrd-cis-ocropy-deskew|OCR-D-BIN|OCR-D-DESKEW| | /test/data/ocrd/taverna/models/param-cis-deskew-page.json |ERROR
ocrd-cis-ocropy-denoise|OCR-D-DESKEW|OCR-D-DENOISE| | |ERROR
ocrd-cis-ocropy-segment|OCR-D-DENOISE|OCR-D-SEG-REGION| | /test/data/ocrd/taverna/models/param-cis-seg-page.json |ERROR