OCR-D / ocrd_tesserocr

Run tesseract with the tesserocr bindings with @OCR-D's interfaces
MIT License

montfaucon1719bd2_1, page 210, ocrd-tesserocr-segment -P find_tables false -P shrink_polygons true #181

Open jbarth-ubhd opened 2 years ago

jbarth-ubhd commented 2 years ago

This image:

https://digi.ub.uni-heidelberg.de/diglitData/v/montfaucon1719bd2_1.210.tif

UPDATE: the same happens for https://digi.ub.uni-heidelberg.de/diglitData/v/montfaucon1719bd2_1.168a_Planche_72.tif

with this workflow (latest ocrd_all as of 2021-12-01):

ocrd workspace init 
ocrd workspace add -g P_00001 -G OCR-D-IMG -i OCR-D-IMG_00001 -m image/tiff OCR-D-IMG/00001.tif 

ocrd-olena-binarize -P k 0.10 -I OCR-D-IMG -O OCR-D-001 
ocrd-anybaseocr-crop -I OCR-D-001 -O OCR-D-002 
ocrd-olena-binarize -I OCR-D-002 -O OCR-D-003 
ocrd-cis-ocropy-deskew -P level-of-operation page -I OCR-D-003 -O OCR-D-004 
ocrd-tesserocr-segment -P find_tables false -P shrink_polygons true -I OCR-D-004 -O OCR-D-005 
ocrd-calamari-recognize -I OCR-D-005 -O OCR-D-OCR -P checkpoint "$HOME/ocrd/_models/ocrd-calamari-recognize/c1_latin-script-hist-3/*.ckpt.json" 

leads to these error messages:

10:06:58.121 INFO processor.TesserocrSegment - INPUT FILE 0 / P_00001
10:06:59.193 INFO processor.TesserocrSegment - Page 'P_00001' images will use 333 DPI from image meta-data
10:06:59.193 INFO processor.TesserocrSegment - Processing page 'P_00001'
10:07:00.229 INFO ocrd.workspace.save_image_file - created file ID: OCR-D-005_00001.IMG-BIN, file_grp: OCR-D-005, path: OCR-D-005/OCR-D-005_00001.IMG-BIN.png
/build/ocrd_tesserocr/ocrd_tesserocr/recognize.py:510: ShapelyDeprecationWarning: The proxy geometries (through the 'asShape()', 'asPolygon()' or 'PolygonAdapter()' constructors) are deprecated and will be removed in Shapely 2.0. Use the 'shape()' function or the standard 'Polygon()' constructor instead.
  for symbol in iterate_level(it, RIL.SYMBOL, parent=RIL.BLOCK)])
Exception ignored in: <bound method BaseGeometry.__del__ of <shapely.geometry.polygon.PolygonAdapter object at 0x7fc431060358>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/shapely/geometry/base.py", line 209, in __del__
    self._empty(val=None)
  File "/usr/local/lib/python3.6/dist-packages/shapely/geometry/base.py", line 199, in _empty
    self._is_empty = True
  File "/usr/local/lib/python3.6/dist-packages/shapely/geometry/proxy.py", line 44, in __setattr__
    object.__setattr__(self, name, value)
AttributeError: can't set attribute
10:07:00.930 INFO processor.TesserocrSegment - Detected region 'region0000': 2867,801 2418,798 
1883,799 1527,803 1527,803 1184,824 1184,824 1183,824 1183,824 1183,824 1183,824 1183,824 1183,825 
1181,827 1180,827 1180,827 1180,827 1180,827 1180,827 1180,828 1180,828 1180,828 1180,838 1172,2362 
1171,3063 1175,3451 1175,3451 1175,3451 1175,3452 1175,3452 1175,3452 1175,3452 1175,3452 1176,3452 
1176,3453 1176,3453 1176,3453 1176,3453 1176,3453 1177,3453 1260,3474 1260,3474 1260,3474 1304,3474 
1945,3458 1945,3458 3324,3389 3324,3389 3325,3389 3348,3382 3348,3382 3348,3382 3348,3382 3348,3382 
3348,3381 3349,3381 3349,3381 3349,3381 3349,3381 3349,3381 3349,3380 3349,3380 3349,3380 3387,1134 
3388,1069 3388,1069 3377,954 3377,954 3377,953 3377,953 3377,953 3377,953 3354,913 3354,913 
3353,913 3353,912 3353,912 3353,912 3353,912 3130,804 3130,804 3129,804 3129,804 3129,804 
(FLOWING_TEXT)
...
...
...
Exception ignored in: <bound method BaseGeometry.__del__ of <shapely.geometry.polygon.PolygonAdapter object at 0x7fc40f820710>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/shapely/geometry/base.py", line 209, in __del__
    self._empty(val=None)
  File "/usr/local/lib/python3.6/dist-packages/shapely/geometry/base.py", line 199, in _empty
    self._is_empty = True
  File "/usr/local/lib/python3.6/dist-packages/shapely/geometry/proxy.py", line 44, in __setattr__
    object.__setattr__(self, name, value)
AttributeError: can't set attribute
10:07:16.823 INFO processor.TesserocrSegment - Detected line 'region0005_line0010': 2366,4729 
2366,4729 2366,4729 2291,4740 2290,4740 2290,4740 2290,4740 2290,4740 2290,4740 2290,4741 2289,4741 
2289,4741 2289,4741 2289,4741 2289,4741 2289,4742 2289,4742 2289,4742 2289,4780 2289,4780 2289,4780 
2289,4781 2289,4781 2289,4781 2289,4781 2289,4781 2290,4781 2290,4782 2290,4782 2290,4782 2290,4782 
2290,4782 2291,4782 2291,4782 2291,4782 2650,4795 2895,4801 2905,4801 2905,4801 3188,4781 3188,4781 
3189,4781 3189,4781 3189,4781 3189,4781 3189,4781 3189,4780 3190,4780 3190,4780 3190,4780 3190,4780 
3190,4780 3190,4779 3190,4779 3190,4779 3190,4768 3190,4768 3190,4768 3190,4767 3190,4767 3190,4767 
3190,4767 3190,4767 3189,4767 3189,4766 3189,4766 3189,4766 3189,4766 3189,4766 3188,4766 3188,4766 
2705,4736 2705,4736 2638,4732
Traceback (most recent call last):
  File "/usr/local/sub-venv/headless-tf2/bin/ocrd-calamari-recognize", line 33, in <module>
    sys.exit(load_entry_point('ocrd-calamari', 'console_scripts', 'ocrd-calamari-recognize')())
  File "/usr/local/sub-venv/headless-tf2/lib/python3.6/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/sub-venv/headless-tf2/lib/python3.6/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/usr/local/sub-venv/headless-tf2/lib/python3.6/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/sub-venv/headless-tf2/lib/python3.6/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/build/ocrd_calamari/ocrd_calamari/cli.py", line 13, in ocrd_calamari_recognize
    return ocrd_cli_wrap_processor(CalamariRecognize, *args, **kwargs)
  File "/build/core/ocrd/ocrd/decorators/__init__.py", line 90, in ocrd_cli_wrap_processor
    raise Exception("Invalid input/output file grps:\n\t%s" % '\n\t'.join(report.errors))
Exception: Invalid input/output file grps:
        Input fileGrp[@USE='OCR-D-005'] not in METS!
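
The final exception says that the fileGrp OCR-D-005 is missing from the METS, which suggests that ocrd-tesserocr-segment never saved its output before ocrd-calamari-recognize ran. As a rough way to confirm this, the following minimal sketch lists the fileGrp entries in a METS file. It assumes the workspace METS is the `mets.xml` created by `ocrd workspace init`; the function names `list_file_groups` and `has_file_group` are hypothetical helpers for illustration, not part of OCR-D.

```python
import xml.etree.ElementTree as ET

# Standard METS namespace; fileGrp elements carry the group name in @USE.
METS_NS = {"mets": "http://www.loc.gov/METS/"}


def list_file_groups(mets_path):
    """Return the USE attribute of every mets:fileGrp, in document order."""
    root = ET.parse(mets_path).getroot()
    return [grp.get("USE") for grp in root.iterfind(".//mets:fileGrp", METS_NS)]


def has_file_group(mets_path, use):
    """Return True if a mets:fileGrp with the given USE exists (hypothetical helper)."""
    return use in list_file_groups(mets_path)
```

Calling `has_file_group("mets.xml", "OCR-D-005")` in the workspace directory after the segmentation step would show whether that step registered its output group; the path is an assumption about the workspace layout.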