bertsky / ocrd_wrap

OCR-D wrapper for arbitrary coords-preserving image operations
MIT License

Wrong data type in PIL/Image.py #9

Closed: MehmedGIT closed this issue 1 year ago

MehmedGIT commented 1 year ago

I am using the latest ocrd_all maximum image. Workspace used: https://gdz.sub.uni-goettingen.de/mets/PPN1023134829.mets.xml

N E X T F L O W  ~  version 21.04.3
Launching `/scratch1/users/mmustaf/operandi/slurm_workspaces/a7752ccc-3908-4d6a-917c-036cf9ffef6c/user_workflow.nf` [hopeful_plateau] - revision: 4d3b00d56e
O P E R A N D I - H P C - D E F A U L T  P I P E L I N E
===========================================
input_file_group    : MAX
mets                : /scratch1/users/mmustaf/operandi/slurm_workspaces/a7752ccc-3908-4d6a-917c-036cf9ffef6c/7ed688de-482d-439a-816f-75b2226c60db/mets.xml
volume_map_dir      : /scratch1/users/mmustaf/operandi/slurm_workspaces/a7752ccc-3908-4d6a-917c-036cf9ffef6c
models_mapping      : /scratch1/users/mmustaf/ocrd_models:/usr/local/share
sif_path            : /scratch1/users/mmustaf/ocrd_all_maximum_image.sif
singularity_wrapper : singularity exec --bind /scratch1/users/mmustaf/operandi/slurm_workspaces/a7752ccc-3908-4d6a-917c-036cf9ffef6c --bind /scratch1/users/mmustaf/ocrd_models:/usr/local/share --env OCRD_METS_CACHING=true /scratch1/users/mmustaf/ocrd_all_maximum_image.sif

[2e/fe535b] Submitted process > ocrd_cis_ocropy_binarize
[66/3a142e] Submitted process > ocrd_anybaseocr_crop
[fe/6339fd] Submitted process > ocrd_skimage_denoise
Error executing process > 'ocrd_skimage_denoise'

Caused by:
  Process `ocrd_skimage_denoise` terminated with an error exit status (1)

Command executed:

  singularity exec --bind /scratch1/users/mmustaf/operandi/slurm_workspaces/a7752ccc-3908-4d6a-917c-036cf9ffef6c --bind /scratch1/users/mmustaf/ocrd_models:/usr/local/share --env OCRD_METS_CACHING=true /scratch1/users/mmustaf/ocrd_all_maximum_image.sif ocrd-skimage-denoise -m mets.xml -I OCR-D-CROP -O OCR-D-BIN-DENOISE -p '{"level-of-operation": "page"}'

Command exit status:
  1

Command output:
  (empty)

Command error:
  Traceback (most recent call last):
    File "/usr/local/lib/python3.8/dist-packages/PIL/Image.py", line 3089, in fromarray
      mode, rawmode = _fromarray_typemap[typekey]
  KeyError: ((1, 1, 2), '|b1')

  The above exception was the direct cause of the following exception:

  Traceback (most recent call last):
    File "/build/core/ocrd/ocrd/processor/helpers.py", line 128, in run_processor
      processor.process()
    File "/usr/local/lib/python3.8/site-packages/ocrd_wrap/skimage_denoise.py", line 90, in process
      self._process_segment(page, page_image, page_coords, dpi,
    File "/usr/local/lib/python3.8/site-packages/ocrd_wrap/skimage_denoise.py", line 166, in _process_segment
      image = Image.fromarray(~array2)
    File "/usr/local/lib/python3.8/dist-packages/PIL/Image.py", line 3092, in fromarray
      raise TypeError(msg) from e
  TypeError: Cannot handle this data type: (1, 1, 2), |b1
  Traceback (most recent call last):
    File "/usr/local/lib/python3.8/dist-packages/PIL/Image.py", line 3089, in fromarray
      mode, rawmode = _fromarray_typemap[typekey]
  KeyError: ((1, 1, 2), '|b1')

  The above exception was the direct cause of the following exception:

  Traceback (most recent call last):
    File "/usr/local/bin/ocrd-skimage-denoise", line 8, in <module>
      sys.exit(ocrd_skimage_denoise())
    File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1130, in __call__
      return self.main(*args, **kwargs)
    File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1055, in main
      rv = self.invoke(ctx)
    File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1404, in invoke
      return ctx.invoke(self.callback, **ctx.params)
    File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 760, in invoke
      return __callback(*args, **kwargs)
    File "/usr/local/lib/python3.8/site-packages/ocrd_wrap/cli.py", line 33, in ocrd_skimage_denoise
      return ocrd_cli_wrap_processor(SkimageDenoise, *args, **kwargs)
    File "/build/core/ocrd/ocrd/decorators/__init__.py", line 116, in ocrd_cli_wrap_processor
      run_processor(processorClass, mets_url=mets, workspace=workspace, **kwargs)
    File "/build/core/ocrd/ocrd/processor/helpers.py", line 131, in run_processor
      raise err
    File "/build/core/ocrd/ocrd/processor/helpers.py", line 128, in run_processor
      processor.process()
    File "/usr/local/lib/python3.8/site-packages/ocrd_wrap/skimage_denoise.py", line 90, in process
      self._process_segment(page, page_image, page_coords, dpi,
    File "/usr/local/lib/python3.8/site-packages/ocrd_wrap/skimage_denoise.py", line 166, in _process_segment
      image = Image.fromarray(~array2)
    File "/usr/local/lib/python3.8/dist-packages/PIL/Image.py", line 3092, in fromarray
      raise TypeError(msg) from e
  TypeError: Cannot handle this data type: (1, 1, 2), |b1

bertsky commented 1 year ago

Cannot reproduce based on this info. For a start, I need the concrete workflow – not just the latest processor call. Obviously, some previous annotation has produced derived images which pillow cannot handle. Please check (e.g. identify -verbose) the images in the input fileGrp if you can.
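
Where ImageMagick is not at hand, a rough Pillow-based version of that check might look like the sketch below. The helper name `summarize_images` and the idea of pointing it at a directory of derived images are assumptions for illustration, not part of ocrd_wrap or OCR-D core. It only lists each image's mode and size so that unexpected modes stand out.

```python
from pathlib import Path

from PIL import Image


def summarize_images(directory):
    """Return (filename, mode, size) for every readable image in a directory.

    Hypothetical helper: files that Pillow cannot open are skipped.
    """
    rows = []
    for path in sorted(Path(directory).iterdir()):
        try:
            with Image.open(path) as img:
                rows.append((path.name, img.mode, img.size))
        except OSError:
            # Not an image (or a subdirectory): ignore it.
            continue
    return rows


if __name__ == "__main__":
    import sys

    for name, mode, size in summarize_images(sys.argv[1]):
        print(name, mode, size)
```

Run against the directory that holds the images of the input fileGrp (the exact location depends on the workspace layout), this prints one line per image; a mode such as `LA` or `RGBA` indicates an alpha channel.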

MehmedGIT commented 1 year ago

Part of which ocrd command/processor is identify -verbose? I've used this workflow: https://ocr-d.de/en/workflows#example-with-ocrd-process-2

bertsky commented 1 year ago

> Part of which ocrd command/processor is identify -verbose?

ImageMagick CLIs.

> I've used this workflow: https://ocr-d.de/en/workflows#example-with-ocrd-process-2

Then I guess the problem is ocrd-anybaseocr-crop's use of transparency=True.

So Pillow gets a binarized image with transparency, i.e. mode=LA. ocrd-skimage-denoise then converts it to a Numpy array with dtype=uint8 (still 2-channel), and scikit-image's remove_small_holes turns that into dtype=bool (again keeping both channels). Pillow's array interface has no mode for a 2-channel boolean array, which is the (1, 1, 2), |b1 in the error message, so the result cannot be converted back into an image.

Something must have changed upstream – this definitely used to work...
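
The chain described above can be reproduced in isolation. The following is a minimal sketch, not the processor's code: a synthetic 4x4 `LA` image stands in for the cropped page, and a plain comparison stands in for the boolean result of the scikit-image call, so that the example needs only Numpy and Pillow.

```python
import numpy as np
from PIL import Image

# Stand-in for a binarized page image with an alpha channel (mode LA).
image = Image.new("LA", (4, 4), (255, 255))

# Pillow -> Numpy: two channels, 8-bit unsigned.
array = np.array(image)

# Stand-in for the boolean output of the morphology step (assumption:
# any operation that yields dtype=bool with the same shape behaves alike).
mask = array > 0

# Numpy -> Pillow: no mode is registered for a 2-channel boolean array.
try:
    Image.fromarray(mask)
    error = None
except TypeError as exc:
    error = str(exc)

print(array.shape, array.dtype, mask.dtype)
print(error)
```

The type key in the resulting message combines the trailing shape (2 channels) with the dtype string `|b1`, which is the same pair that appears in the traceback of the original report.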

bertsky commented 1 year ago

@MehmedGIT can you please try with #10? (It works for me, but I am still not 100% sure about your scenario.)

MehmedGIT commented 1 year ago

@bertsky, I have taken the data from previous steps and executed ocrd-skimage-denoise as a single step on top. Seems to work now (no errors). However, the output images are empty (transparent) - not sure if that's expected. Here is the workspace zip with the results: https://owncloud.gwdg.de/index.php/s/LZliizgilc8Dffd

bertsky commented 1 year ago

Oops, I'm sorry – should have looked at the actual image. I forgot to scale back to the max() as well. If you pull again, it should now produce something useful.
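
The empty-looking output is consistent with a boolean mask being cast to integers without rescaling: in an 8-bit image, values of 0 and 1 are nearly black and nearly fully transparent. The sketch below illustrates that effect on a synthetic `LA` image. It is an illustration under that assumption, not the actual patch from #10, and it hard-codes 255 as the maximum for 8-bit data.

```python
import numpy as np
from PIL import Image

# Synthetic white, fully opaque image with an alpha channel (mode LA).
image = Image.new("LA", (4, 4), (255, 255))

# Boolean stand-in for the denoised result (shape (4, 4, 2)).
mask = np.array(image) > 0

# Casting alone yields values 0/1: convertible, but visually empty.
unscaled = Image.fromarray(mask.astype(np.uint8))

# Scaling back to the 8-bit maximum restores a visible image.
scaled = Image.fromarray(mask.astype(np.uint8) * 255)

print(unscaled.mode, unscaled.getpixel((0, 0)))
print(scaled.mode, scaled.getpixel((0, 0)))
```

Both conversions succeed because a 2-channel `uint8` array maps to mode `LA`, but only the scaled variant has an alpha value high enough to be visible.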

MehmedGIT commented 1 year ago

Yes, it works now!

MehmedGIT commented 1 year ago

Fixed and part of ocrd/all:2023-06-18.