Open jbarth-ubhd opened 3 years ago
This is likely due to a combination of issues that have been fixed in https://github.com/qurator-spk/sbb_binarization/pull/11 and https://github.com/OCR-D/core/pull/633. Can you try again with sbb_binarization and core updated, please?
Nope:
16:42:21.774 INFO ocrd.task_sequence.run_tasks - Start processing task 'sbb-binarize -I OCR-D-IMG -O OCR-D-N1 -p '{"model": "/usr/local/ocrd_models/sbb/binarization/models", "operation_level": "page"}''
16:58:28.098 INFO ocrd.task_sequence.run_tasks - Finished processing task 'sbb-binarize -I OCR-D-IMG -O OCR-D-N1 -p '{"model": "/usr/local/ocrd_models/sbb/binarization/models", "operation_level": "page"}''
16:58:28.100 INFO ocrd.task_sequence.run_tasks - Start processing task 'anybaseocr-crop -I OCR-D-N1 -O OCR-D-N2 -p '{"force": true, "colSeparator": 0.04, "maxRularArea": 0.3, "minArea": 0.05, "minRularArea": 0.01, "positionBelow": 0.75, "positionLeft": 0.4, "positionRight": 0.6, "rularRatioMax": 10.0, "rularRatioMin": 3.0, "rularWidth": 0.95, "operation_level": "page"}''
Traceback (most recent call last):
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/bin/ocrd", line 8, in <module>
sys.exit(cli())
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/lib/python3.7/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/lib/python3.7/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/lib/python3.7/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/lib/python3.7/site-packages/ocrd/cli/process.py", line 26, in process_cli
run_tasks(mets, log_level, page_id, tasks, overwrite)
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/lib/python3.7/site-packages/ocrd/task_sequence.py", line 149, in run_tasks
raise Exception("%s exited with non-zero return value %s. STDOUT:\n%s\nSTDERR:\n%s" % (task.executable, returncode, out, err))
Exception: ocrd-anybaseocr-crop exited with non-zero return value 1. STDOUT:
STDERR:
16:58:29.882 INFO OcrdAnybaseocrCropper - INPUT FILE 0 / P_00001
Traceback (most recent call last):
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/bin/ocrd-anybaseocr-crop", line 8, in <module>
sys.exit(cli())
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/ocrd_anybaseocr/cli/ocrd_anybaseocr_cropping.py", line 527, in cli
return ocrd_cli_wrap_processor(OcrdAnybaseocrCropper, *args, **kwargs)
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/ocrd/decorators/__init__.py", line 81, in ocrd_cli_wrap_processor
run_processor(processorClass, ocrd_tool, mets, workspace=workspace, **kwargs)
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/ocrd/processor/helpers.py", line 69, in run_processor
processor.process()
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/ocrd_anybaseocr/cli/ocrd_anybaseocr_cropping.py", line 448, in process
feature_selector='binarized') # should also be deskewed
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/ocrd/workspace.py", line 420, in image_from_page
for feature in feature_selector.split(',') if feature) and
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/ocrd/workspace.py", line 420, in <genexpr>
for feature in feature_selector.split(',') if feature) and
TypeError: argument of type 'NoneType' is not iterable
Command exited with non-zero status 1
15206.61user 1637.87system 16:28.55elapsed 1703%CPU (0avgtext+0avgdata 11868964maxresident)k
1053272inputs+1560outputs (1800major+47188701minor)pagefaults 0swaps
workflow:
. /usr/local/ocrd_all/venv/bin/activate
export TMPDIR=/dwork/tmp
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
( ocrd-create-mets.xml
/usr/bin/time ocrd process \
"sbb-binarize -I OCR-D-IMG -O OCR-D-N1 -P model /usr/local/ocrd_models/sbb/binarization/models" \
"anybaseocr-crop -I OCR-D-N1 -O OCR-D-N2" \
"cis-ocropy-denoise -I OCR-D-N2 -O OCR-D-N4 -P level-of-operation page" \
"cis-ocropy-deskew -I OCR-D-N4 -O OCR-D-N5 -P level-of-operation page" \
"sbb-textline-detector -I OCR-D-N5 -O OCR-D-N6 -P model /usr/local/ocrd_models/sbb/textline" \
"cis-ocropy-clip -I OCR-D-N6 -O OCR-D-N7 -P level-of-operation region" \
"cis-ocropy-deskew -I OCR-D-N7 -O OCR-D-N8 -P level-of-operation region" \
"cis-ocropy-resegment -I OCR-D-N8 -O OCR-D-N9" \
"cis-ocropy-dewarp -I OCR-D-N9 -O OCR-D-N10" \
"calamari-recognize -I OCR-D-N10 -O OCR-D-OCR -P checkpoint /usr/local/ocrd_models/calamari/GT4HistOCR/*.ckpt.json"
) >cmd.log 2>&1
Caveat: the file groups jump from N2 to N4. A second binarization should not be necessary with sbb-binarize(?)
The line numbers in the stacktrace look suspicious. Are you sure core is up-to-date?
What's the output of
(source /dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/bin/activate; ocrd --version)
Did make all again, now it's
.../ocrd_all> (source /dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/bin/activate; ocrd --version)
ocrd, version 2.18.1
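As a small illustration of the version check being asked for here: the sketch below parses a line in the format printed by ocrd --version and compares it against a minimum release. The helper names parse_version and has_fix are hypothetical, and the default minimum of 2.19.0 is taken from the maintainer's comment further down in this thread.

```python
import re


def parse_version(output: str) -> tuple:
    """Extract (major, minor, patch) from a line like 'ocrd, version 2.18.1'."""
    match = re.search(r"version (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"no version found in {output!r}")
    return tuple(int(part) for part in match.groups())


def has_fix(output: str, minimum=(2, 19, 0)) -> bool:
    """True if the reported version is at least `minimum` (hypothetical helper)."""
    return parse_version(output) >= minimum


print(has_fix("ocrd, version 2.18.1"))  # False
print(has_fix("ocrd, version 2.19.0"))  # True
```

Comparing integer tuples avoids the pitfall of comparing version strings, where "2.9.0" would sort after "2.19.0".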
I'll try it again...
17:30:48.526 INFO ocrd.task_sequence.run_tasks - Start processing task 'sbb-binarize -I OCR-D-IMG -O OCR-D-N1 -p '{"model": "/usr/local/ocrd_models/sbb/binarization/models", "operation_level": "page"}''
17:46:02.399 INFO ocrd.task_sequence.run_tasks - Finished processing task 'sbb-binarize -I OCR-D-IMG -O OCR-D-N1 -p '{"model": "/usr/local/ocrd_models/sbb/binarization/models", "operation_level": "page"}''
17:46:02.401 INFO ocrd.task_sequence.run_tasks - Start processing task 'anybaseocr-crop -I OCR-D-N1 -O OCR-D-N2 -p '{"force": true, "colSeparator": 0.04, "maxRularArea": 0.3, "minArea": 0.05, "minRularArea": 0.01, "positionBelow": 0.75, "positionLeft": 0.4, "positionRight": 0.6, "rularRatioMax": 10.0, "rularRatioMin": 3.0, "rularWidth": 0.95, "operation_level": "page"}''
Traceback (most recent call last):
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/bin/ocrd", line 8, in <module>
sys.exit(cli())
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/lib/python3.7/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/lib/python3.7/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/lib/python3.7/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/lib/python3.7/site-packages/ocrd/cli/process.py", line 26, in process_cli
run_tasks(mets, log_level, page_id, tasks, overwrite)
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/lib/python3.7/site-packages/ocrd/task_sequence.py", line 149, in run_tasks
raise Exception("%s exited with non-zero return value %s. STDOUT:\n%s\nSTDERR:\n%s" % (task.executable, returncode, out, err))
Exception: ocrd-anybaseocr-crop exited with non-zero return value 1. STDOUT:
STDERR:
17:46:03.008 INFO OcrdAnybaseocrCropper - INPUT FILE 0 / P_00001
Traceback (most recent call last):
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/bin/ocrd-anybaseocr-crop", line 8, in <module>
sys.exit(cli())
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/ocrd_anybaseocr/cli/ocrd_anybaseocr_cropping.py", line 527, in cli
return ocrd_cli_wrap_processor(OcrdAnybaseocrCropper, *args, **kwargs)
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/ocrd/decorators/__init__.py", line 81, in ocrd_cli_wrap_processor
run_processor(processorClass, ocrd_tool, mets, workspace=workspace, **kwargs)
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/ocrd/processor/helpers.py", line 69, in run_processor
processor.process()
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/ocrd_anybaseocr/cli/ocrd_anybaseocr_cropping.py", line 448, in process
feature_selector='binarized') # should also be deskewed
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/ocrd/workspace.py", line 420, in image_from_page
for feature in feature_selector.split(',') if feature) and
File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/ocrd/workspace.py", line 420, in <genexpr>
for feature in feature_selector.split(',') if feature) and
TypeError: argument of type 'NoneType' is not iterable
Command exited with non-zero status 1
15405.11user 1581.59system 15:27.10elapsed 1832%CPU (0avgtext+0avgdata 11925248maxresident)k
7520inputs+1576outputs (93major+45448743minor)pagefaults 0swaps
The fix to core to handle AlternativeImage without comments is only in 2.19.0 which isn't yet in ocrd_all. I'll send a PR later.
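To make the failure mode concrete, here is a minimal sketch of how a membership test against a missing comments value raises the TypeError seen in the tracebacks above. The function names are hypothetical, the expression only mirrors the shape of the line quoted from workspace.py, and the guard shown is an assumed workaround, not necessarily how core 2.19.0 implements the fix.

```python
def matches_selector(comments, feature_selector):
    """Mirror of the traceback's expression: every requested feature must occur in comments."""
    return all(feature in comments
               for feature in feature_selector.split(',') if feature)


def matches_selector_safe(comments, feature_selector):
    """Assumed guard: treat a missing comments attribute as an empty string."""
    return matches_selector(comments or '', feature_selector)


# An AlternativeImage without comments yields None, and `x in None` raises TypeError.
try:
    matches_selector(None, 'binarized')
except TypeError as err:
    error_message = str(err)
    print("unguarded:", error_message)

print(matches_selector_safe(None, 'binarized'))                 # False
print(matches_selector_safe('binarized,cropped', 'binarized'))  # True
```

With the guard, an image lacking comments simply fails to match the selector instead of aborting the processor.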
But I do not understand why sbb_binarization still seems to produce AlternativeImage without comments - can you verify that this is the case? I.e. how do the pg:Page elements begin in a PAGE-XML in OCR-D-N1?
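One way to answer that question without reading the XML by eye is sketched below: it lists the filename and comments attribute of each AlternativeImage directly under the Page element. The sample document is hypothetical (it is not taken from the reporter's workspace), and the namespace URI assumes the 2019-07-15 PAGE schema version, so adjust it if the files in OCR-D-N1 declare a different one.

```python
import xml.etree.ElementTree as ET

# Assumption: the PAGE-XML uses the 2019-07-15 schema namespace.
PAGE_NS = {"pg": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"}

# Hypothetical sample: one image without comments, one with comments="binarized".
SAMPLE = """<pc:PcGts xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <pc:Page imageFilename="OCR-D-IMG/P_00001.tif" imageWidth="100" imageHeight="100">
    <pc:AlternativeImage filename="OCR-D-N1/P_00001.png"/>
    <pc:AlternativeImage filename="OCR-D-N1/P_00001.bin.png" comments="binarized"/>
  </pc:Page>
</pc:PcGts>
"""


def alternative_image_comments(page_xml: str):
    """Return (filename, comments) for each page-level AlternativeImage; comments is None if absent."""
    root = ET.fromstring(page_xml)
    return [(img.get("filename"), img.get("comments"))
            for img in root.findall("./pg:Page/pg:AlternativeImage", PAGE_NS)]


for filename, comments in alternative_image_comments(SAMPLE):
    status = comments if comments is not None else "MISSING comments"
    print(filename, "->", status)
```

For a real file, the contents can be read with open(path, encoding="utf-8").read() and passed to the same function; any entry reported as missing would be one that triggers the error in core before 2.19.0.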
did core> git pull https://github.com/OCR-D/core
now ...
This is merely a workaround in core, though; the real issue remains why sbb_binarization does not produce comments.
still the same problem.
@jbarth-ubhd has this since been resolved?
Perhaps a problem only in combination with ocrd-sbb-binarize(?)