OCR-D / ocrd_tesserocr

Run tesseract with the tesserocr bindings with @OCR-D's interfaces
MIT License

ocrd-tesserocr-segment does not work #189

Closed jbarth-ubhd closed 1 year ago

jbarth-ubhd commented 1 year ago

The first recommended command in »Step 7: Region segmentation« does not work:

ocrd-tesserocr-segment -I OCR-D-004 -O OCR-D-005 -P find_tables false -P shrink_polygons true

+ /home/hd/hd_hd/hd_wu120/local/bin/time singularity exec -e --env-file /home/hd/hd_hd/hd_wu120/ocrd.env --env MAGICK_TEMPORARY_PATH=/scratch/hd_wu120_job_648164_m05n02 --env TMPDIR=/scratch/hd_wu120_job_648164_m05n02 /home/hd/hd_hd/hd_wu120/ocrd.sif ocrd-tesserocr-segment -I OCR-D-004 -O OCR-D-005 -P find_tables false -P shrink_polygons true
UID: readonly variable
GID: readonly variable
16:14:26.156 INFO processor.TesserocrSegment - INPUT FILE 0 / P_00001
16:14:26.484 INFO processor.TesserocrSegment - Page 'P_00001' images will use 400 DPI from image meta-data
16:14:26.484 INFO processor.TesserocrSegment - Processing page 'P_00001'
16:14:27.050 INFO ocrd.workspace.save_image_file - created file ID: OCR-D-005_00001.IMG-BIN, file_grp: OCR-D-005, path: OCR-D-005/OCR-D-005_00001.IMG-BIN.png
Traceback (most recent call last):
  File "/usr/local/bin/ocrd-tesserocr-segment", line 8, in <module>
    sys.exit(ocrd_tesserocr_segment())
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/ocrd_tesserocr/cli.py", line 18, in ocrd_tesserocr_segment
    return ocrd_cli_wrap_processor(TesserocrSegment, *args, **kwargs)
  File "/build/core/ocrd/ocrd/decorators/__init__.py", line 117, in ocrd_cli_wrap_processor
    run_processor(processorClass, ocrd_tool, mets, workspace=workspace, **kwargs)
  File "/build/core/ocrd/ocrd/processor/helpers.py", line 107, in run_processor
    processor.process()
  File "/usr/local/lib/python3.6/site-packages/ocrd_tesserocr/segment.py", line 69, in process
    return self.recognizer.process()
  File "/usr/local/lib/python3.6/site-packages/ocrd_tesserocr/recognize.py", line 445, in process
    self._process_regions_in_page(tessapi.GetIterator(), page, page_coords, pcgts_mapping, dpi)
  File "/usr/local/lib/python3.6/site-packages/ocrd_tesserocr/recognize.py", line 525, in _process_regions_in_page
    for symbol in iterate_level(it, RIL.SYMBOL, parent=RIL.BLOCK)])
  File "/usr/local/lib/python3.6/site-packages/ocrd_tesserocr/recognize.py", line 1456, in join_polygons
    for poly in polygons]))
  File "/usr/local/lib/python3.6/site-packages/ocrd_tesserocr/recognize.py", line 1456, in <listcomp>
    for poly in polygons]))
AttributeError: 'list' object has no attribute 'type'
Command exited with non-zero status 1
1.26user 1.63system 0:04.85elapsed 59%CPU (0avgtext+0avgdata 192008maxresident)k
149380inputs+0outputs (144major+68822minor)pagefaults 0swaps

jbarth-ubhd commented 1 year ago

The ocrd.sif image was built yesterday from the Docker image ocrd:maximum.

kba commented 1 year ago

Can you please share the workspace where this happens, so I can reproduce? Thanks!

jbarth-ubhd commented 1 year ago

https://digi.ub.uni-heidelberg.de/diglitData/v/ocrd-bug-tessocr-segment--extracts-dir-g.zip

The archive extracts to directory g and contains bug-demo.sh and nohup.out. To run it again, remove mets.xml and OCR-D-00*.

bertsky commented 1 year ago

@kba the problem is with the recent Shapely 2.0 release. I had prepared for the transition, but did not read the migration guide carefully enough. Apparently, they also renamed the type attribute to geom_type.
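
For readers hitting the same traceback, here is a minimal sketch of a version-tolerant way to flatten polygons before joining them. It is not the actual fix in ocrd_tesserocr. The helper names geometry_kind and flatten_parts are hypothetical, and the two Fake* classes are stand-ins so the snippet runs without Shapely installed. The only assumption about Shapely itself is that geometries expose geom_type and that multi-part geometries expose their members via geoms.

```python
def geometry_kind(geom):
    """Return the geometry type name, preferring `geom_type` over legacy `type`."""
    kind = getattr(geom, "geom_type", None)
    if kind is None:
        # Fall back to the older attribute name for legacy objects.
        kind = getattr(geom, "type", None)
    if kind is None:
        raise TypeError(f"not a geometry: {geom!r}")
    return kind


def flatten_parts(geoms):
    """Flatten nested lists and multi-part geometries into a flat list of parts."""
    parts = []
    for geom in geoms:
        if isinstance(geom, (list, tuple)):
            # A plain list has neither `type` nor `geom_type`; recurse into it.
            parts.extend(flatten_parts(geom))
        elif geometry_kind(geom) in ("MultiPolygon", "GeometryCollection"):
            # Multi-part geometries expose their members via `.geoms`.
            parts.extend(flatten_parts(geom.geoms))
        else:
            parts.append(geom)
    return parts


# Hypothetical stand-ins for Shapely geometries, for demonstration only.
class FakePolygon:
    geom_type = "Polygon"


class FakeMultiPolygon:
    geom_type = "MultiPolygon"

    def __init__(self, parts):
        self.geoms = parts


if __name__ == "__main__":
    a, b, c = FakePolygon(), FakePolygon(), FakePolygon()
    flat = flatten_parts([a, [FakeMultiPolygon([b, c])]])
    print(len(flat))
```

With real Shapely objects the same helpers should apply unchanged, since they only read geom_type and geoms; whether the nested-list case matches what happened in join_polygons is a guess based on the error message above.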

jbarth-ubhd commented 1 year ago

Perhaps this issue should be moved to ocrd-all ... but I can't do that.

bertsky commented 1 year ago

@jbarth-ubhd thanks for your report. I was able to reproduce it. The bug did not surface earlier because I had no test coverage for shrink_polygons; now I do. Fixed along with #191.