OCR-D / ocrd_all

Master repository which includes most other OCR-D repositories as submodules
MIT License

OCR-D workflow for slower processors reports errors #450

Open stweil opened 1 month ago

stweil commented 1 month ago

I tried to apply the suggested workflow for slower processors. It failed in the last step:

20:42:49.364 INFO ocrd.task_sequence.run_tasks - Finished processing task 'cis-ocropy-dewarp -I OCR-D-SEG -O OCR-D-SEG-DEWARP -p '{"dpi": 0, "range": 4.0, "smoothness": 1.0, "max_neighbour": 0.05}''
20:42:49.436 INFO ocrd.task_sequence.run_tasks - Start processing task 'tesserocr-recognize -I OCR-D-SEG-DEWARP -O OCRD_SLOWER_PROCESSOR -p '{"textequiv_level": "glyph", "overwrite_segments": true, "model": "german_print", "dpi": 0, "padding": 0, "segmentation_level": "word", "overwrite_text": true, "shrink_polygons": false, "block_polygons": false, "find_tables": true, "find_staves": false, "sparse_text": false, "raw_lines": false, "char_whitelist": "", "char_blacklist": "", "char_unblacklist": "", "tesseract_parameters": {}, "xpath_parameters": {}, "xpath_model": {}, "auto_model": false, "oem": "DEFAULT"}''
20:44:00.802 ERROR ocrd.workspace.image_from_segment - segment "region0002_line0000" image (binarized,despeckled,binarized,dewarped; 2555x346) has not been cropped properly (2555x243)
20:44:00.968 ERROR ocrd.workspace.image_from_segment - segment "region0004_line0000" image (binarized,despeckled,binarized,dewarped; 135x122) has not been cropped properly (135x82)
20:44:01.130 ERROR ocrd.workspace.image_from_segment - segment "region0006_line0000" image (binarized,despeckled,binarized,dewarped; 1116x68) has not been cropped properly (1116x78)
20:44:01.287 ERROR ocrd.workspace.image_from_segment - segment "region0007_line0000" image (binarized,despeckled,binarized,dewarped; 907x66) has not been cropped properly (907x64)
20:44:01.600 ERROR ocrd.workspace.image_from_segment - segment "region0009_line0000" image (binarized,despeckled,binarized,dewarped; 663x74) has not been cropped properly (663x62)
20:44:01.874 ERROR ocrd.workspace.image_from_segment - segment "region0014_line0000" image (binarized,despeckled,binarized,dewarped; 300x162) has not been cropped properly (300x92)
Ignoring extant glyph: 549,1558 577,1557 577,1582 549,1583
20:44:01.979 ERROR ocrd.workspace.image_from_segment - segment "region0015_line0000" image (binarized,despeckled,binarized,dewarped; 1117x66) has not been cropped properly (1117x59)
20:44:02.114 ERROR ocrd.workspace.image_from_segment - segment "region0015_line0001" image (binarized,despeckled,binarized,dewarped; 1189x68) has not been cropped properly (1189x61)
[...]
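To get an overview of how large the reported mismatches are, the "has not been cropped properly" lines can be summarized with a small script. In the excerpt above the widths always agree and only the heights differ. The sketch below is hypothetical: it only assumes the log line format shown above, does not use any OCR-D API, and makes no claim about which of the two sizes is the "correct" one. It simply reports the difference between the size inside `image (...)` and the size in the trailing parentheses.

```python
import re
from typing import NamedTuple

# Matches log lines in the format shown above, e.g.
# segment "region0002_line0000" image (binarized,...,dewarped; 2555x346) has not been cropped properly (2555x243)
PATTERN = re.compile(
    r'segment "(?P<seg>[^"]+)" image \((?P<feats>[^;]*); '
    r'(?P<w1>\d+)x(?P<h1>\d+)\) has not been cropped properly '
    r'\((?P<w2>\d+)x(?P<h2>\d+)\)'
)


class Mismatch(NamedTuple):
    segment: str
    image_size: tuple[int, int]  # size reported inside "image (...)"
    other_size: tuple[int, int]  # size reported in the trailing parentheses


def parse_mismatches(lines):
    """Collect all 'has not been cropped properly' reports; other lines are skipped."""
    result = []
    for line in lines:
        m = PATTERN.search(line)
        if m:
            result.append(Mismatch(
                m["seg"],
                (int(m["w1"]), int(m["h1"])),
                (int(m["w2"]), int(m["h2"])),
            ))
    return result


def summarize(lines):
    """Return one 'segment: dw=... dh=...' string per reported mismatch."""
    out = []
    for mm in parse_mismatches(lines):
        dw = mm.image_size[0] - mm.other_size[0]
        dh = mm.image_size[1] - mm.other_size[1]
        out.append(f"{mm.segment}: dw={dw:+d} dh={dh:+d}")
    return out
```

Usage (the file name `ocrd.log` is only an example): `with open("ocrd.log") as f: print("\n".join(summarize(f)))`. Interleaved lines such as "Ignoring extant glyph: ..." are ignored because they do not match the pattern.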

Since I don't yet know which processor is the culprit, I am reporting the errors here.

See more details here.

stweil commented 1 month ago

It looks like these "errors" should be logged as warnings, because they are not fatal: the PAGE XML with the text results was still created.
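A minimal sketch of what a non-fatal check could look like, using only the standard `logging` module: the mismatch is reported at WARNING level and processing continues. The function `check_crop`, its parameters and the logger name are hypothetical and are not the actual code of `ocrd.workspace.image_from_segment`; the message text just mirrors the log lines quoted above.

```python
import logging

# Hypothetical logger name, for illustration only.
LOG = logging.getLogger("example.image_from_segment")


def check_crop(segment_id, features, image_size, other_size, log=LOG):
    """Hypothetical non-fatal size check.

    Returns True if both (width, height) pairs match. On a mismatch it logs a
    WARNING (not an ERROR) and returns False, so the caller can keep going.
    """
    if image_size == other_size:
        return True
    log.warning(
        'segment "%s" image (%s; %dx%d) has not been cropped properly (%dx%d)',
        segment_id, features, *image_size, *other_size,
    )
    return False
```

The return value lets a caller decide whether to act on the mismatch, while the log level signals that the result (here, the PAGE XML with text) is still produced.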