cgohlke / imagecodecs

Image transformation, compression, and decompression codecs
https://pypi.org/project/imagecodecs
BSD 3-Clause "New" or "Revised" License

Decode PackBits with padded bytes. #86

Closed: erikogabrielsson closed this issue 7 months ago

erikogabrielsson commented 7 months ago

Hi @cgohlke and thanks for this excellent library.

I'm using the packbits decoder to decode PackBits data encoded by the pylibjpeg-rle encoder. This encoder produces bytes according to the DICOM standard, where the result "must be an even number of bytes or padded at its end with zero to make it even". For a random 2x2 array ([121, 121, 27, 63]) I can thus get:

b'\xffy\x01\x1b?\x00'

While the imagecodecs encoder produces:

b'\xffy\x01\x1b?'
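
To make the difference between the two outputs concrete, here is a minimal sketch of the even-length padding rule quoted above. The helper name `pad_to_even` is hypothetical; it is not part of imagecodecs or pylibjpeg-rle.

```python
def pad_to_even(segment: bytes) -> bytes:
    """Append a single zero byte if the segment has an odd length.

    Hypothetical helper illustrating the DICOM rule quoted above;
    not taken from imagecodecs or pylibjpeg-rle.
    """
    return segment + b"\x00" if len(segment) % 2 else segment


unpadded = b"\xffy\x01\x1b?"      # 5 bytes, as produced without padding
padded = pad_to_even(unpadded)    # 6 bytes, with the trailing zero added
print(padded)
```

An already even-length segment is returned unchanged, so applying the helper twice gives the same result as applying it once.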

When decoding, the imagecodecs decoder does not like the extra byte, throwing ImcdError:

File imagecodecs\_imcd.pyx:844, in imagecodecs._imcd.packbits_decode()

ImcdError: imcd_packbits_decode returned IMCD_INPUT_CORRUPT

Which looks to come from this check.

From what I understand, the extra byte could be interpreted as a replicate segment of length 0. I'm not sure whether this is consistent with the PackBits specification?
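
For reference, a minimal pure-Python sketch of a strict PackBits decoder, assuming the usual header convention (0 to 127: copy the next n + 1 bytes literally; 129 to 255: repeat the next byte 257 - n times; 128: no-op). This is not the imagecodecs implementation, and the function name is hypothetical. Under this reading a trailing `\x00` announces one literal byte that is not there, which is one way a strict decoder could end up rejecting the padded stream.

```python
def packbits_decode_strict(data: bytes) -> bytes:
    """Decode PackBits, raising ValueError if a run is truncated.

    Illustrative sketch only; not the imagecodecs code.
    """
    out = bytearray()
    i = 0
    while i < len(data):
        n = data[i]          # header byte, read as unsigned 0..255
        i += 1
        if n < 128:
            # Literal run: copy the next n + 1 bytes unchanged.
            count = n + 1
            if i + count > len(data):
                raise ValueError("literal run extends past end of input")
            out += data[i:i + count]
            i += count
        elif n > 128:
            # Replicate run: repeat the next byte 257 - n times.
            if i >= len(data):
                raise ValueError("replicate run has no byte to repeat")
            out += data[i:i + 1] * (257 - n)
            i += 1
        # n == 128 is a no-op and is skipped.
    return bytes(out)


print(list(packbits_decode_strict(b"\xffy\x01\x1b?")))

try:
    packbits_decode_strict(b"\xffy\x01\x1b?\x00")
except ValueError as exc:
    print("padded input rejected:", exc)
```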

cgohlke commented 7 months ago

I think the pad byte is not part of the compressed PackBits stream. As the PackBits standard says: "you need to know either the length of the packed or unpacked data". Hence the options are to omit the pad bytes or specify the correct output length when calling packbits_decode. On the other hand, it should be easy to ignore the pad byte in the code...
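
A rough sketch of the second option (decoding with a known output length), written as a pure-Python stand-in rather than as a call into imagecodecs. The function name and the `size` parameter are assumptions made for illustration. Decoding stops as soon as `size` bytes have been produced, so a trailing pad byte is never read.

```python
def packbits_decode_sized(data: bytes, size: int) -> bytes:
    """Decode PackBits until `size` output bytes have been produced.

    Illustrative sketch only; `size` is the known unpacked length
    (for the 2x2 uint8 example above, 4 bytes).
    """
    out = bytearray()
    i = 0
    while i < len(data) and len(out) < size:
        n = data[i]
        i += 1
        if n < 128:
            # Literal run of n + 1 bytes.
            out += data[i:i + n + 1]
            i += n + 1
        elif n > 128:
            # Replicate the next byte 257 - n times.
            out += data[i:i + 1] * (257 - n)
            i += 1
        # n == 128 is a no-op.
    if len(out) < size:
        raise ValueError("input ended before the requested size was reached")
    return bytes(out[:size])


# The padded and unpadded streams decode identically when the size is known.
print(list(packbits_decode_sized(b"\xffy\x01\x1b?\x00", 4)))
print(list(packbits_decode_sized(b"\xffy\x01\x1b?", 4)))
```

Bounding by output length is safer than blindly stripping a trailing zero, because a valid unpadded stream can itself end in a `\x00` data byte.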

cgohlke commented 7 months ago

The single pad byte added by DICOM encoders will be ignored in the next version of imagecodecs.

erikogabrielsson commented 7 months ago

Thanks!

cgohlke commented 6 months ago

Fixed in v2024.1.1. That release also includes a DICOMRLE decoder. It has not been tested on real DICOM files yet...

erikogabrielsson commented 6 months ago

Thanks @cgohlke,

My decode implementation (for wsidicom) using packbits_decode now produces the same results as pylibjpeg-rle. Your new DICOMRLE decoder also produces the same output. It is significantly faster for 16-bit data than my packbits_decode-based implementation, and about equal in performance to pylibjpeg-rle. So I will use the DICOMRLE decoder in my implementation. :)

I have only tested this on frames encoded by packbits_encode and pylibjpeg-rle, so no real DICOM files yet.