cisocrgroup / ocrd_cis

OCR-D python tools
MIT License

Bug: OcropyClip: TypeError: function takes exactly 1 argument (2 given) #72

Closed: jbarth-ubhd closed this issue 3 years ago

jbarth-ubhd commented 3 years ago

Workflow:

. /usr/local/ocrd_all/venv/bin/activate
export TMPDIR=/dwork/tmp
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
ocrd-create-mets.xml
( /usr/bin/time ocrd process \
"olena-binarize -I OCR-D-IMG -O OCR-D-N1 -P impl wolf" \
"anybaseocr-crop -I OCR-D-N1 -O OCR-D-N2" \
"olena-binarize -I OCR-D-N2 -O OCR-D-N3 -P impl wolf" \
"cis-ocropy-denoise -I OCR-D-N3 -O OCR-D-N4 -P level-of-operation page" \
"cis-ocropy-deskew -I OCR-D-N4 -O OCR-D-N5 -P level-of-operation page" \
"tesserocr-segment-region -I OCR-D-N5 -O OCR-D-N6" \
"segment-repair -I OCR-D-N6 -O OCR-D-N7 -P plausibilize true" \
"cis-ocropy-deskew -I OCR-D-N7 -O OCR-D-N8 -P level-of-operation region" \
"cis-ocropy-clip -I OCR-D-N8 -O OCR-D-N9 -P level-of-operation region" \
"tesserocr-segment-line -I OCR-D-N9 -O OCR-D-N10" \
"cis-ocropy-clip -I OCR-D-N10 -O OCR-D-N11 -P level-of-operation line" \
"cis-ocropy-resegment -I OCR-D-N11 -O OCR-D-N12" \
"cis-ocropy-dewarp -I OCR-D-N12 -O OCR-D-N13" \
"calamari-recognize -I OCR-D-N13 -O OCR-D-OCR -P checkpoint /usr/local/ocrd_models/calamari/calamari_models-0.3/fraktur_historical/*.ckpt.json"

) >cmd.log 2>&1

Log:

02:57:04.632 INFO ocrd.task_sequence.run_tasks - Start processing task 'cis-ocropy-clip -I OCR-D-N8 -O OCR-D-N9 -p '{"level-of-operation": "region", "dpi": 0, "min_fraction": 0.7}''
Traceback (most recent call last):
  File "/usr/local/ocrd_all/venv/bin/ocrd", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/ocrd_all/venv/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/ocrd_all/venv/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/ocrd_all/venv/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/ocrd_all/venv/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/ocrd_all/venv/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/ocrd_all/venv/lib/python3.7/site-packages/ocrd/cli/process.py", line 26, in process_cli
    run_tasks(mets, log_level, page_id, tasks, overwrite)
  File "/usr/local/ocrd_all/venv/lib/python3.7/site-packages/ocrd/task_sequence.py", line 149, in run_tasks
    raise Exception("%s exited with non-zero return value %s. STDOUT:\n%s\nSTDERR:\n%s" % (task.executable, returncode, out, err))
Exception: ocrd-cis-ocropy-clip exited with non-zero return value 1. STDOUT:

STDERR:
02:57:06.605 INFO processor.OcropyClip - INPUT FILE 0 / P_00001
02:57:07.682 INFO processor.OcropyClip - Page "OCR-D-N8_00001" uses 300.000000 DPI
Traceback (most recent call last):
  File "/usr/local/ocrd_all/venv/bin/ocrd-cis-ocropy-clip", line 8, in <module>
    sys.exit(ocrd_cis_ocropy_clip())
  File "/usr/local/ocrd_all/venv/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/ocrd_all/venv/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/ocrd_all/venv/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/ocrd_all/venv/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/ocrd_all/venv/lib/python3.7/site-packages/ocrd_cis/ocropy/cli.py", line 33, in ocrd_cis_ocropy_clip
    return ocrd_cli_wrap_processor(OcropyClip, *args, **kwargs)
  File "/usr/local/ocrd_all/venv/lib/python3.7/site-packages/ocrd/decorators/__init__.py", line 81, in ocrd_cli_wrap_processor
    run_processor(processorClass, ocrd_tool, mets, workspace=workspace, **kwargs)
  File "/usr/local/ocrd_all/venv/lib/python3.7/site-packages/ocrd/processor/helpers.py", line 68, in run_processor
    processor.process()
  File "/usr/local/ocrd_all/venv/lib/python3.7/site-packages/ocrd_cis/ocropy/clip.py", line 131, in process
    background_image = Image.new('L', page_image.size, background)
  File "/usr/local/ocrd_all/venv/lib/python3.7/site-packages/PIL/Image.py", line 2613, in new
    return im._new(core.fill(mode, size, color))
TypeError: function takes exactly 1 argument (2 given)

Command exited with non-zero status 1

bertsky commented 3 years ago

Thanks for the report!

Looks like this is related to this SO issue around this GH issue of Pillow. Which version do you have? (pip show pillow)

bertsky commented 3 years ago

BTW your workflow has a flaw: You cannot use clipping after any processor that already adds derived images (AlternativeImage) on the same hierarchy level. Since...

https://github.com/cisocrgroup/ocrd_cis/blob/a6f90abc43faabf86691d6d689b4ad4470a6c107/ocrd_cis/ocropy/clip.py#L77-L79

...and so the processor won't actually do anything but print warnings...

https://github.com/cisocrgroup/ocrd_cis/blob/a6f90abc43faabf86691d6d689b4ad4470a6c107/ocrd_cis/ocropy/clip.py#L152-L155

I have updated the workflow guide wiki page to better reflect this. (I distinctly remember documenting this elsewhere already, though...)

EDIT: So I would recommend swapping the region-level clipping with the region-level deskewing, so that clipping runs first. Also, you don't need clipping on the line level if you already apply resegmentation. The latter can be seen as an alternative method (based on coordinates instead of image data). For full clarification about both operations, including screenshots, I recommend these slides (and follow-up).
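The reordering recommended above can be sketched as follows, written as a Python argument list for `ocrd process` so that it can be inspected without running anything. This is only an illustration under assumptions: the file group names (`OCR-D-N*`) are renumbered by hand, all steps before `segment-repair` are copied unchanged from the report, and the names `TASKS`, `build_command` and `check_chain` are made up for this sketch.

```python
import shlex

# Task sequence from the report, with the two changes suggested above:
#  1. region-level clipping now runs before region-level deskewing
#  2. line-level clipping is dropped, because resegmentation follows anyway
# File group names (OCR-D-N*) are renumbered by hand for this sketch.
TASKS = [
    "olena-binarize -I OCR-D-IMG -O OCR-D-N1 -P impl wolf",
    "anybaseocr-crop -I OCR-D-N1 -O OCR-D-N2",
    "olena-binarize -I OCR-D-N2 -O OCR-D-N3 -P impl wolf",
    "cis-ocropy-denoise -I OCR-D-N3 -O OCR-D-N4 -P level-of-operation page",
    "cis-ocropy-deskew -I OCR-D-N4 -O OCR-D-N5 -P level-of-operation page",
    "tesserocr-segment-region -I OCR-D-N5 -O OCR-D-N6",
    "segment-repair -I OCR-D-N6 -O OCR-D-N7 -P plausibilize true",
    "cis-ocropy-clip -I OCR-D-N7 -O OCR-D-N8 -P level-of-operation region",
    "cis-ocropy-deskew -I OCR-D-N8 -O OCR-D-N9 -P level-of-operation region",
    "tesserocr-segment-line -I OCR-D-N9 -O OCR-D-N10",
    "cis-ocropy-resegment -I OCR-D-N10 -O OCR-D-N11",
    "cis-ocropy-dewarp -I OCR-D-N11 -O OCR-D-N12",
    "calamari-recognize -I OCR-D-N12 -O OCR-D-OCR -P checkpoint "
    "/usr/local/ocrd_models/calamari/calamari_models-0.3/fraktur_historical/*.ckpt.json",
]


def build_command(tasks):
    """Return the argv list for a single `ocrd process` call over all tasks."""
    return ["ocrd", "process", *tasks]


def check_chain(tasks):
    """Check that every task reads the file group written by the previous task."""
    previous_output = None
    for task in tasks:
        words = task.split()
        input_group = words[words.index("-I") + 1]
        output_group = words[words.index("-O") + 1]
        if previous_output is not None and input_group != previous_output:
            return False
        previous_output = output_group
    return True


# Print the command in shell-quoted form for inspection; nothing is executed.
print(" ".join(shlex.quote(arg) for arg in build_command(TASKS)))
```

To actually run it, the list could be handed to `subprocess.run(build_command(TASKS), check=True)` from inside the workspace directory, which should behave like the shell invocation in the report.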

jbarth-ubhd commented 3 years ago

> Thanks for the report!
>
> Looks like this is related to this SO issue around this GH issue of Pillow. Which version do you have? (pip show pillow)

(venv) jb@pers109:/usr/local/ocrd_all> pip show pillow
Name: Pillow
Version: 7.2.0

jbarth-ubhd commented 3 years ago

Workflow has been generated by script according to https://ocr-d.de/en/workflows (as of ~ 2020-06).

jbarth-ubhd commented 3 years ago

> You cannot use clipping after any processor that already adds derived images

These two lines?

cis-ocropy-deskew -I OCR-D-N7 -O OCR-D-N8 -P level-of-operation region
cis-ocropy-clip ...

But it is still there: https://ocr-d.de/en/workflows (steps 9 and 10)

bertsky commented 3 years ago

>> You cannot use clipping after any processor that already adds derived images
>
> These two lines?
>
> cis-ocropy-deskew -I OCR-D-N7 -O OCR-D-N8 -P level-of-operation region
> cis-ocropy-clip ...
>
> But it is still there: https://ocr-d.de/en/workflows (steps 9 and 10)

Yes, sorry, I did not spot it there the first time. I have updated my issue on the recommendations.

bertsky commented 3 years ago

>> Thanks for the report! Looks like this is related to this SO issue around this GH issue of Pillow. Which version do you have? (pip show pillow)
>
> (venv) jb@pers109:/usr/local/ocrd_all> pip show pillow
> Name: Pillow
> Version: 7.2.0

7.2.0 should have the fix, IIUC. I am still trying to reproduce. It definitely depends on the image's color mode (16/32-bit integer or float is probably the culprit). I will get back to you once I have tried with such an image.
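To illustrate what a color-mode dependence could look like: the failing call in the traceback is `Image.new('L', page_image.size, background)`, which creates a single-band image. One plausible cause (an assumption, not confirmed anywhere in this thread) is that `background` comes from a per-band statistic and therefore holds one value per band when the page image has more than one band. The sketch below shows a guard for that case. `scalar_fill` is a hypothetical helper, not part of ocrd_cis or Pillow, and it is not necessarily what the eventual fix does.

```python
from numbers import Number
from typing import Sequence, Union


def scalar_fill(color: Union[Number, Sequence[Number]]) -> Number:
    """Reduce a per-band color to one value usable for a single-band image.

    Hypothetical helper for illustration only (not ocrd_cis or Pillow API).
    """
    if isinstance(color, (list, tuple)):
        if not color:
            raise ValueError("empty color sequence")
        # Keep only the first band, e.g. the luminance of a gray+alpha color.
        return color[0]
    # Already a plain number: pass it through unchanged.
    return color


print(scalar_fill([255, 255]))  # two-band color reduced to one value
print(scalar_fill(0))           # scalar stays as it is
```

With Pillow installed, the result could then be passed on as `Image.new('L', page_image.size, scalar_fill(background))`.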

bertsky commented 3 years ago

> I am still trying to reproduce. It definitely depends on the image's color mode (16/32-bit integer or float is probably the culprit). I will get back to you once I have tried with such an image.

No luck here. Tried with 8+1, with 16 bit and 32 bit images. Could you please share the image it failed on?

jbarth-ubhd commented 3 years ago

Sent an email with link to images.

bertsky commented 3 years ago

Thanks – I could reproduce. You helped find a bug again! Fix is in #75