MehmedGIT closed this 1 year ago
Cannot reproduce based on this info. For a start, I need the concrete workflow – not just the latest processor call. Obviously, some previous annotation has produced derived images which pillow cannot handle. Please check (e.g. identify -verbose) the images in the input fileGrp if you can.
Part of which ocrd command/processor is identify -verbose?
I've used this workflow: https://ocr-d.de/en/workflows#example-with-ocrd-process-2
> Part of which ocrd command/processor is identify -verbose?
ImageMagick CLIs.
> I've used this workflow: https://ocr-d.de/en/workflows#example-with-ocrd-process-2
Then I guess the problem is ocrd-anybaseocr-crop's use of transparency=True.
So Pillow gets a binarized image with transparency, i.e. mode=LA. Then ocrd-skimage-denoise converts it to NumPy, i.e. dtype=uint8 (but still 2-channel). Now scikit-image's remove_small_holes converts it to bool (while keeping both channels). This bool array cannot be converted back to Pillow via the array interface.
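The failure described above can be reproduced in isolation. This is only a minimal sketch, assuming NumPy and Pillow are installed: it uses a plain astype(bool) as a stand-in for what remove_small_holes does to the dtype, and can_convert_to_pillow is a hypothetical helper, not part of any OCR-D processor.

```python
import numpy as np
from PIL import Image


def can_convert_to_pillow(array):
    """Hypothetical helper: report whether Pillow accepts the array as an image."""
    try:
        Image.fromarray(array)
        return True
    except TypeError:
        # Pillow raises TypeError for shape/dtype combinations it has no mode for
        return False


# A small luminance+alpha image, standing in for the cropped, binarized page
la_image = Image.new("LA", (4, 4), (255, 255))

# Converting to NumPy yields a 2-channel uint8 array
as_uint8 = np.array(la_image)

# Stand-in for the bool conversion (assumption: only the dtype change matters here)
as_bool = as_uint8.astype(bool)

print(as_uint8.shape, as_uint8.dtype, can_convert_to_pillow(as_uint8))
print(as_bool.shape, as_bool.dtype, can_convert_to_pillow(as_bool))
```

The 2-channel uint8 array maps back to mode LA, whereas Pillow has no mode for a 2-channel bool array.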
Something must have changed upstream – this definitely used to work...
@MehmedGIT can you please try with #10? (It works for me, but I am still not 100% sure about your scenario.)
@bertsky, I have taken the data from previous steps and executed ocrd-skimage-denoise as a single step on top. Seems to work now (no errors). However, the output images are empty (transparent) - not sure if that's expected. Here is the workspace zip with the results: https://owncloud.gwdg.de/index.php/s/LZliizgilc8Dffd
Oops, I'm sorry – should have looked at the actual image. I forgot to scale back to the max() as well. If you pull again, it should now produce something useful.
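One plausible reading of the empty (transparent) output, sketched here under assumptions rather than taken from the actual fix: casting a bool array back to uint8 yields only the values 0 and 1, so both the luminance and the alpha channel end up almost at zero unless they are scaled back up to the full range. The helper bool_to_uint8 and the fixed factor 255 are hypothetical; the real change may scale to the maximum of the original array instead.

```python
import numpy as np
from PIL import Image


def bool_to_uint8(mask, scale=True):
    """Hypothetical helper: cast a bool array to uint8, optionally scaling True to 255."""
    out = mask.astype(np.uint8)  # True -> 1, False -> 0
    if scale:
        out = out * np.uint8(255)  # True -> 255, i.e. fully white / fully opaque
    return out


# 2-channel bool array, as it might come out of a morphological filter
mask = np.ones((4, 4, 2), dtype=bool)

unscaled = bool_to_uint8(mask, scale=False)  # alpha of 1: practically transparent
scaled = bool_to_uint8(mask)                 # alpha of 255: opaque

restored = Image.fromarray(scaled)
print(unscaled.max(), scaled.max(), restored.mode)
```

Without the scaling step the conversion succeeds but the alpha channel stays at 1 out of 255, which would match the symptom of images that look empty.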
Yes, it works now!
Fixed and part of ocrd/all:2023-06-18.
I am using the latest ocrd_all maximum image. Workspace used: https://gdz.sub.uni-goettingen.de/mets/PPN1023134829.mets.xml