Closed jbarth-ubhd closed 3 years ago
I haven't seen this one before. Could you provide a test image and some more details on your calamari and tensorflow version and how and where the error occurred, please?
image: https://digi.ub.uni-heidelberg.de/diglitData/v/ocrd/hdz1886a_-_248_4.tif
workflow:
ocrd-sbb-binarize -I OCR-D-IMG -O OCR-D-001 -P model $HOME/ocrd_models/sbb/binarization/models
ocrd-cis-ocropy-deskew -I OCR-D-001 -O OCR-D-002
ocrd-sbb-textline-detector -I OCR-D-002 -O OCR-D-003 -P model $HOME/ocrd_models/sbb/textline
ocrd-calamari-recognize -I OCR-D-003 -O OCR-D-OCR -P checkpoint "$HOME/ocrd_models/calamari/calamari_models/gt4histocr/*.ckpt.json"
Oh, and I also have 1 case with ValueError: Error when checking input: expected input_1 to have shape (448, 896, 3) but got array with shape (448, 4, 3)
-> https://github.com/qurator-spk/sbb_textline_detection/issues/53
Had 1 case (gomez...) where some step complained about an empty page, but on the third run this message vanished?! Perhaps a problem with the bwHpc cluster: I had some "permission denied" errors last week, supposedly because of quota, but processing was running on a filesystem without quota...
OK, I've installed OCR-D for the first time; it mostly worked out of the box and I was able to reproduce the problem. Your errors seem to be caused by OCR-D processors, not by calamari.
Somehow the line segmentation produces empty lines or lines that are outside of text regions. When the empty images are converted to numpy (by ocrd_calamari, not by calamari), numpy throws an uncaught exception. You could fix it by inserting something like line_image = line_image if all(line_image.size) else [[0]] before line 77 in ocrd_calamari/recognize.py, but that's only a temporary hack to avoid the error. I'm also not sure if their workspace.image_from_segment or even the line segmentation processor is supposed to produce empty lines at all, so maybe the real problem is somewhere deeper in the guts of the OCR-D machinery.
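To make the idea behind that one-liner a bit more explicit, here is a rough, standalone sketch of the same guard. It is not the actual ocrd_calamari code: the helper names has_pixels and guard_line_image are made up for illustration, and the only assumption about the image object is that it exposes a PIL-style .size tuple of (width, height). SimpleNamespace merely stands in for a real image so the snippet runs without any OCR-D installation.

```python
from types import SimpleNamespace
from typing import Any, Optional, Sequence


def has_pixels(size: Sequence[int]) -> bool:
    """Return True only if every dimension of a (width, height) tuple is > 0."""
    return len(size) > 0 and all(dim > 0 for dim in size)


def guard_line_image(line_image: Any, placeholder: Optional[Any] = None) -> Any:
    """Return line_image unchanged, or a 1x1 placeholder if it has no pixels.

    Hypothetical helper: it assumes line_image has a PIL-style .size attribute.
    The default placeholder [[0]] mirrors the workaround suggested above.
    """
    if has_pixels(line_image.size):
        return line_image
    return [[0]] if placeholder is None else placeholder


if __name__ == "__main__":
    # Stand-ins for segmented line images; real code would pass PIL images.
    empty_line = SimpleNamespace(size=(0, 12))
    normal_line = SimpleNamespace(size=(300, 40))

    print(guard_line_image(empty_line))                  # placeholder is used
    print(guard_line_image(normal_line) is normal_line)  # image passes through
```

Like the one-liner, this only hides the symptom. Skipping such lines entirely, or finding out why the segmentation emits them, would probably be the cleaner route.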
In any case it would be better to open a bug report in OCR-D/ocrd_calamari.
gomez...: temporary cluster problem.
You mean https://github.com/OCR-D/ocrd_calamari/pull/49 fixed the problem mentioned here and additional problems were due to network/IO hickups?
hiccups ... well I would say hiccups², at least, but ... yes.
As this is a problem in ocrd_calamari, not in calamari itself, I think this can be closed here.
My ocrd installation is a few weeks old... found this error on (at least) 3 images. Is this a problem with a specific tensorflow version?