OCR-D / ocrd_anybaseocr

DFKI Layout Detection for OCR-D
Apache License 2.0

ocrd-anybaseocr-crop: TypeError: argument of type 'NoneType' is not iterable #74

Open jbarth-ubhd opened 3 years ago

jbarth-ubhd commented 3 years ago

This may only be a problem in combination with ocrd-sbb-binarize(?)


(venv) jb@pers109:~/literatur_schoenen_wissenschaften1780a> ocrd-anybaseocr-crop -I OCR-D-BIN -O OCR-D-CROP
16:04:18.388 INFO OcrdAnybaseocrCropper - INPUT FILE 0 / P_00001
Traceback (most recent call last):
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/bin/ocrd-anybaseocr-crop", line 8, in <module>
    sys.exit(cli())
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/ocrd_anybaseocr/cli/ocrd_anybaseocr_cropping.py", line 527, in cli
    return ocrd_cli_wrap_processor(OcrdAnybaseocrCropper, *args, **kwargs)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/ocrd/decorators/__init__.py", line 81, in ocrd_cli_wrap_processor
    run_processor(processorClass, ocrd_tool, mets, workspace=workspace, **kwargs)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/ocrd/processor/helpers.py", line 69, in run_processor
    processor.process()
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/ocrd_anybaseocr/cli/ocrd_anybaseocr_cropping.py", line 448, in process
    feature_selector='binarized') # should also be deskewed
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/ocrd/workspace.py", line 420, in image_from_page
    for feature in feature_selector.split(',') if feature) and
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/ocrd/workspace.py", line 420, in <genexpr>
    for feature in feature_selector.split(',') if feature) and
TypeError: argument of type 'NoneType' is not iterable

kba commented 3 years ago

This is likely due to a combination of issues that have been fixed in https://github.com/qurator-spk/sbb_binarization/pull/11 and https://github.com/OCR-D/core/pull/633, respectively. Can you try again with sbb_binarization and core updated, please?

jbarth-ubhd commented 3 years ago

Nope:

16:42:21.774 INFO ocrd.task_sequence.run_tasks - Start processing task 'sbb-binarize -I OCR-D-IMG -O OCR-D-N1 -p '{"model": "/usr/local/ocrd_models/sbb/binarization/models", "operation_level": "page"}''
16:58:28.098 INFO ocrd.task_sequence.run_tasks - Finished processing task 'sbb-binarize -I OCR-D-IMG -O OCR-D-N1 -p '{"model": "/usr/local/ocrd_models/sbb/binarization/models", "operation_level": "page"}''
16:58:28.100 INFO ocrd.task_sequence.run_tasks - Start processing task 'anybaseocr-crop -I OCR-D-N1 -O OCR-D-N2 -p '{"force": true, "colSeparator": 0.04, "maxRularArea": 0.3, "minArea": 0.05, "minRularArea": 0.01, "positionBelow": 0.75, "positionLeft": 0.4, "positionRight": 0.6, "rularRatioMax": 10.0, "rularRatioMin": 3.0, "rularWidth": 0.95, "operation_level": "page"}''
Traceback (most recent call last):
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/bin/ocrd", line 8, in <module>
    sys.exit(cli())
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/lib/python3.7/site-packages/ocrd/cli/process.py", line 26, in process_cli
    run_tasks(mets, log_level, page_id, tasks, overwrite)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/lib/python3.7/site-packages/ocrd/task_sequence.py", line 149, in run_tasks
    raise Exception("%s exited with non-zero return value %s. STDOUT:\n%s\nSTDERR:\n%s" % (task.executable, returncode, out, err))
Exception: ocrd-anybaseocr-crop exited with non-zero return value 1. STDOUT:

STDERR:
16:58:29.882 INFO OcrdAnybaseocrCropper - INPUT FILE 0 / P_00001
Traceback (most recent call last):
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/bin/ocrd-anybaseocr-crop", line 8, in <module>
    sys.exit(cli())
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/ocrd_anybaseocr/cli/ocrd_anybaseocr_cropping.py", line 527, in cli
    return ocrd_cli_wrap_processor(OcrdAnybaseocrCropper, *args, **kwargs)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/ocrd/decorators/__init__.py", line 81, in ocrd_cli_wrap_processor
    run_processor(processorClass, ocrd_tool, mets, workspace=workspace, **kwargs)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/ocrd/processor/helpers.py", line 69, in run_processor
    processor.process()
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/ocrd_anybaseocr/cli/ocrd_anybaseocr_cropping.py", line 448, in process
    feature_selector='binarized') # should also be deskewed
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/ocrd/workspace.py", line 420, in image_from_page
    for feature in feature_selector.split(',') if feature) and
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/ocrd/workspace.py", line 420, in <genexpr>
    for feature in feature_selector.split(',') if feature) and
TypeError: argument of type 'NoneType' is not iterable

Command exited with non-zero status 1
15206.61user 1637.87system 16:28.55elapsed 1703%CPU (0avgtext+0avgdata 11868964maxresident)k
1053272inputs+1560outputs (1800major+47188701minor)pagefaults 0swaps

jbarth-ubhd commented 3 years ago

Workflow:

. /usr/local/ocrd_all/venv/bin/activate
export TMPDIR=/dwork/tmp
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
( ocrd-create-mets.xml
/usr/bin/time ocrd process \
"sbb-binarize -I OCR-D-IMG -O OCR-D-N1 -P model /usr/local/ocrd_models/sbb/binarization/models" \
"anybaseocr-crop -I OCR-D-N1 -O OCR-D-N2" \
"cis-ocropy-denoise -I OCR-D-N2 -O OCR-D-N4 -P level-of-operation page" \
"cis-ocropy-deskew -I OCR-D-N4 -O OCR-D-N5 -P level-of-operation page" \
"sbb-textline-detector -I OCR-D-N5 -O OCR-D-N6 -P model /usr/local/ocrd_models/sbb/textline" \
"cis-ocropy-clip -I OCR-D-N6 -O OCR-D-N7 -P level-of-operation region" \
"cis-ocropy-deskew -I OCR-D-N7 -O OCR-D-N8 -P level-of-operation region" \
"cis-ocropy-resegment -I OCR-D-N8 -O OCR-D-N9" \
"cis-ocropy-dewarp -I OCR-D-N9 -O OCR-D-N10" \
"calamari-recognize -I OCR-D-N10 -O OCR-D-OCR -P checkpoint /usr/local/ocrd_models/calamari/GT4HistOCR/*.ckpt.json"
) >cmd.log 2>&1

jbarth-ubhd commented 3 years ago

Note: the workflow goes from N2 straight to N4 (there is no N3). A second binarization should not be necessary with sbb-binarize(?)

kba commented 3 years ago

The line numbers in the stacktrace look suspicious. Are you sure core is up-to-date?

What's the output of

(source /dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/bin/activate; ocrd --version)

jbarth-ubhd commented 3 years ago

I ran make all again; now it reports:

.../ocrd_all> (source /dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/bin/activate; ocrd --version)
ocrd, version 2.18.1

I'll try it again...

jbarth-ubhd commented 3 years ago

17:30:48.526 INFO ocrd.task_sequence.run_tasks - Start processing task 'sbb-binarize -I OCR-D-IMG -O OCR-D-N1 -p '{"model": "/usr/local/ocrd_models/sbb/binarization/models", "operation_level": "page"}''
17:46:02.399 INFO ocrd.task_sequence.run_tasks - Finished processing task 'sbb-binarize -I OCR-D-IMG -O OCR-D-N1 -p '{"model": "/usr/local/ocrd_models/sbb/binarization/models", "operation_level": "page"}''
17:46:02.401 INFO ocrd.task_sequence.run_tasks - Start processing task 'anybaseocr-crop -I OCR-D-N1 -O OCR-D-N2 -p '{"force": true, "colSeparator": 0.04, "maxRularArea": 0.3, "minArea": 0.05, "minRularArea": 0.01, "positionBelow": 0.75, "positionLeft": 0.4, "positionRight": 0.6, "rularRatioMax": 10.0, "rularRatioMin": 3.0, "rularWidth": 0.95, "operation_level": "page"}''
Traceback (most recent call last):
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/bin/ocrd", line 8, in <module>
    sys.exit(cli())
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/lib/python3.7/site-packages/ocrd/cli/process.py", line 26, in process_cli
    run_tasks(mets, log_level, page_id, tasks, overwrite)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/lib/python3.7/site-packages/ocrd/task_sequence.py", line 149, in run_tasks
    raise Exception("%s exited with non-zero return value %s. STDOUT:\n%s\nSTDERR:\n%s" % (task.executable, returncode, out, err))
Exception: ocrd-anybaseocr-crop exited with non-zero return value 1. STDOUT:

STDERR:
17:46:03.008 INFO OcrdAnybaseocrCropper - INPUT FILE 0 / P_00001
Traceback (most recent call last):
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/bin/ocrd-anybaseocr-crop", line 8, in <module>
    sys.exit(cli())
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/ocrd_anybaseocr/cli/ocrd_anybaseocr_cropping.py", line 527, in cli
    return ocrd_cli_wrap_processor(OcrdAnybaseocrCropper, *args, **kwargs)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/ocrd/decorators/__init__.py", line 81, in ocrd_cli_wrap_processor
    run_processor(processorClass, ocrd_tool, mets, workspace=workspace, **kwargs)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/ocrd/processor/helpers.py", line 69, in run_processor
    processor.process()
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/ocrd_anybaseocr/cli/ocrd_anybaseocr_cropping.py", line 448, in process
    feature_selector='binarized') # should also be deskewed
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/ocrd/workspace.py", line 420, in image_from_page
    for feature in feature_selector.split(',') if feature) and
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf21/lib/python3.7/site-packages/ocrd/workspace.py", line 420, in <genexpr>
    for feature in feature_selector.split(',') if feature) and
TypeError: argument of type 'NoneType' is not iterable

Command exited with non-zero status 1
15405.11user 1581.59system 15:27.10elapsed 1832%CPU (0avgtext+0avgdata 11925248maxresident)k
7520inputs+1576outputs (93major+45448743minor)pagefaults 0swaps

kba commented 3 years ago

The fix in core for handling AlternativeImage without comments is only in 2.19.0, which isn't in ocrd_all yet. I'll send a PR later.
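
For readers comparing their own installation against that release, a minimal sketch of the version comparison follows. The helper names are hypothetical and not part of OCR-D; the only facts taken from this thread are the reported version (2.18.1) and the release said to contain the fix (2.19.0).

```python
import re


def version_tuple(version):
    """Return the first three numeric components of a version string.

    Accepts plain versions ('2.18.1') as well as the full line printed by
    `ocrd --version` ('ocrd, version 2.18.1').
    """
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def has_comments_fix(version, fixed_in="2.19.0"):
    """True if `version` is at least the release named in this thread."""
    return version_tuple(version) >= version_tuple(fixed_in)


# The version reported earlier in this thread predates the fix.
print(has_comments_fix("ocrd, version 2.18.1"))
```

Note that the check has to be run against the core installed in the sub-venv that ocrd-anybaseocr-crop actually uses, not only the top-level venv.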

But I do not understand why sbb_binarization still seems to produce AlternativeImage without comments - can you verify that this is the case? I.e. how do the pg:Page elements begin in a PAGE-XML in OCR-D-N1?
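
One way to answer that question without reading the XML by hand is sketched below. It is only an illustration using the Python standard library, not an OCR-D tool: the function name is made up, and the sample document and its file names are hypothetical rather than taken from the workspace in this thread.

```python
import xml.etree.ElementTree as ET


def alternative_image_comments(page_xml):
    """Return (filename, comments) for every AlternativeImage element.

    `page_xml` is the content of a PAGE-XML file (str or bytes).
    `comments` is None when the attribute is missing, which is the
    situation suspected in this thread.
    """
    root = ET.fromstring(page_xml)
    found = []
    for elem in root.iter():
        # Match on the local name so any PAGE namespace version works.
        if elem.tag.rsplit("}", 1)[-1] == "AlternativeImage":
            found.append((elem.get("filename"), elem.get("comments")))
    return found


# Hypothetical sample: one image without and one with a comments attribute.
SAMPLE = """
<pc:PcGts xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <pc:Page imageFilename="OCR-D-IMG/P_00001.tif">
    <pc:AlternativeImage filename="OCR-D-N1/a.png"/>
    <pc:AlternativeImage filename="OCR-D-N1/b.png" comments="binarized"/>
  </pc:Page>
</pc:PcGts>
"""

for filename, comments in alternative_image_comments(SAMPLE):
    print(filename, "->", comments)
```

To use it on real data, pass the bytes of each PAGE-XML file in the OCR-D-N1 file group; any entry whose second element is None is an AlternativeImage without comments.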

jbarth-ubhd commented 3 years ago

did core> git pull https://github.com/OCR-D/core now ...

kba commented 3 years ago

> did core> git pull https://github.com/OCR-D/core now ...

This is merely a workaround in core, though; the real question remains why sbb_binarization does not produce comments.
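
To make the failure mode concrete, here is a small sketch of how a missing comments value leads to the TypeError in the tracebacks above. It is modelled on the single line visible in the stack trace and is an assumption about the surrounding logic, not the actual code in ocrd/workspace.py; the guarded variant only illustrates the kind of workaround meant here and may differ from what core 2.19.0 really does.

```python
def has_features(comments, feature_selector):
    """Unguarded membership test, as suggested by the traceback line."""
    return all(feature in comments
               for feature in feature_selector.split(',') if feature)


def has_features_guarded(comments, feature_selector):
    """Same test, but a missing comments value is treated as empty."""
    return all(feature in (comments or '')
               for feature in feature_selector.split(',') if feature)


try:
    # comments is None when AlternativeImage has no comments attribute
    has_features(None, 'binarized')
except TypeError as err:
    print('unguarded check failed:', err)

# The guarded check simply reports that the image lacks the feature.
print(has_features_guarded(None, 'binarized'))
print(has_features_guarded('binarized,cropped', 'binarized'))
```

The guard keeps the processor from crashing, but an image without comments still never matches feature_selector='binarized', which is why the missing annotation on the producer side remains the underlying problem.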

jbarth-ubhd commented 3 years ago

still the same problem.

bertsky commented 3 years ago

@jbarth-ubhd has this since been resolved?