OCR-D / ocrd_tesserocr

Run tesseract with the tesserocr bindings with @OCR-D's interfaces
MIT License

shapely.errors.TopologicalError: The operation 'GEOSWithin_r' could not be performed. Likely cause is invalidity of the geometry <shapely.geometry.polygon.Polygon object at 0x7fb58a658b80> #160

Closed mikegerber closed 3 years ago

mikegerber commented 3 years ago

Using the current master on this workspace: 2020-11-ocrd_tesserocr-topologyexception.zip, I get the following error:

% ocrd-tesserocr-recognize -I OCR-D-GT-PAGE-BINPAGE-sauvola -p OCR-D-OCR-TESS-frk+deu-OCR-D-GT-PAGE-BINPAGE-sauvola.json -O OCR-D-OCR-TESS-frk+deu-OCR-D-GT-PAGE-BINPAGE-sauvola --overwrite
12:49:57.199 INFO processor.TesserocrRecognize - Using model 'frk+deu' in /usr/share/tesseract//tessdata/ for recognition at the glyph level
12:49:57.199 INFO processor.TesserocrRecognize - INPUT FILE 0 / PHYS_0024
12:49:57.755 INFO processor.TesserocrRecognize - Page 'PHYS_0024' images will use 300 DPI from image meta-data
12:49:57.755 INFO processor.TesserocrRecognize - Processing page 'PHYS_0024'
12:50:20.074 ERROR shapely.geos - TopologyException: side location conflict at 392 2912
Traceback (most recent call last):
  File "/home/mike/.virtualenvs/ocrd_tesserocr/lib/python3.8/site-packages/shapely/predicates.py", line 15, in __call__
    return self.fn(this._geom, other._geom, *args)
  File "/home/mike/.virtualenvs/ocrd_tesserocr/lib/python3.8/site-packages/shapely/geos.py", line 584, in errcheck_predicate
    raise PredicateError("Failed to evaluate %s" % repr(func))
shapely.errors.PredicateError: Failed to evaluate <_FuncPtr object at 0x7fb590f87700>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mike/.virtualenvs/ocrd_tesserocr/bin/ocrd-tesserocr-recognize", line 8, in <module>
    sys.exit(ocrd_tesserocr_recognize())
  File "/home/mike/.virtualenvs/ocrd_tesserocr/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/mike/.virtualenvs/ocrd_tesserocr/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/mike/.virtualenvs/ocrd_tesserocr/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/mike/.virtualenvs/ocrd_tesserocr/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/mike/.virtualenvs/ocrd_tesserocr/lib/python3.8/site-packages/ocrd_tesserocr/cli.py", line 36, in ocrd_tesserocr_recognize
    return ocrd_cli_wrap_processor(TesserocrRecognize, *args, **kwargs)
  File "/home/mike/.virtualenvs/ocrd_tesserocr/lib/python3.8/site-packages/ocrd/decorators/__init__.py", line 81, in ocrd_cli_wrap_processor
    run_processor(processorClass, ocrd_tool, mets, workspace=workspace, **kwargs)
  File "/home/mike/.virtualenvs/ocrd_tesserocr/lib/python3.8/site-packages/ocrd/processor/helpers.py", line 69, in run_processor
    processor.process()
  File "/home/mike/.virtualenvs/ocrd_tesserocr/lib/python3.8/site-packages/ocrd_tesserocr/recognize.py", line 185, in process
    self._process_regions(tessapi, regions, page_image, page_xywh)
  File "/home/mike/.virtualenvs/ocrd_tesserocr/lib/python3.8/site-packages/ocrd_tesserocr/recognize.py", line 225, in _process_regions
    self._process_lines(tessapi, textlines, region_image, region_xywh)
  File "/home/mike/.virtualenvs/ocrd_tesserocr/lib/python3.8/site-packages/ocrd_tesserocr/recognize.py", line 263, in _process_lines
    self._process_words_in_line(tessapi.GetIterator(), line, line_xywh)
  File "/home/mike/.virtualenvs/ocrd_tesserocr/lib/python3.8/site-packages/ocrd_tesserocr/recognize.py", line 279, in _process_words_in_line
    polygon2 = polygon_for_parent(polygon, line)
  File "/home/mike/.virtualenvs/ocrd_tesserocr/lib/python3.8/site-packages/ocrd_tesserocr/segment_region.py", line 319, in polygon_for_parent
    if childp.within(parentp):
  File "/home/mike/.virtualenvs/ocrd_tesserocr/lib/python3.8/site-packages/shapely/geometry/base.py", line 779, in within
    return bool(self.impl['within'](self, other))
  File "/home/mike/.virtualenvs/ocrd_tesserocr/lib/python3.8/site-packages/shapely/predicates.py", line 18, in __call__
    self._check_topology(err, this, other)
  File "/home/mike/.virtualenvs/ocrd_tesserocr/lib/python3.8/site-packages/shapely/topology.py", line 35, in _check_topology
    raise TopologicalError(
shapely.errors.TopologicalError: The operation 'GEOSWithin_r' could not be performed. Likely cause is invalidity of the geometry <shapely.geometry.polygon.Polygon object at 0x7fb58a658b80>

mikegerber commented 3 years ago

0.9.2 runs without this error

kba commented 3 years ago

The line that triggers this issue in @mikegerber's workspace is l92:

PAGE-XML

```xml
w o h l g e b l . wohlgebl. a n b e t r i t , anbetrifft, d e r der H o f r a t h Hofrath S e n e n b e r g Senenberg z u zu ſ e i n e m ſeinem g r ß t e n grßten L e i d w e e ſ e n Leidweeſen b e k e n n e n , bekennen, d a ß daß E r Er d i e ſ e l b e , dieſelbe, wohlgebl. anbetrifft, der Hofrath Senenberg zu ſeinem grßten Leidweeſen bekennen, daß Er dieſelbe,
```

bertsky commented 3 years ago

Thanks! (@kba I edited your comment to make this more readable)

I get a Self-intersection[392 2911] here.

It seems I unnecessarily run the `within` test before the `make_valid` operation, so the invalid polygon reaches the predicate unrepaired. Will fix as part of #158.
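
For readers hitting the same error, here is a minimal sketch of the ordering issue, not the actual fix from #158. It assumes Shapely 1.8 or later (for `shapely.validation.make_valid`) and stands in for the project's own `make_valid` helper, which is not shown in this thread. The name `safe_within` and the toy coordinates are hypothetical.

```python
from shapely.geometry import Polygon
from shapely.validation import explain_validity, make_valid


def safe_within(child, parent):
    """Repair invalid geometries first, then run the containment predicate.

    Hypothetical helper: the point is only that validation happens
    before `within`, not after it.
    """
    if not child.is_valid:
        child = make_valid(child)
    if not parent.is_valid:
        parent = make_valid(parent)
    return bool(child.within(parent))


# A "bowtie" ring crosses itself, so the polygon is invalid.
bowtie = Polygon([(0, 0), (2, 2), (2, 0), (0, 2)])
# A plain square that encloses the bowtie.
parent = Polygon([(-1, -1), (3, -1), (3, 3), (-1, 3)])

print(bowtie.is_valid)           # the self-crossing ring is reported as invalid
print(explain_validity(bowtie))  # names the kind of defect and its location
print(safe_within(bowtie, parent))
```

`explain_validity` produces the same kind of message quoted above (`Self-intersection[...]`), which makes it handy for locating the offending coordinate before deciding how to repair the polygon.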