cgohlke / czifile

Read Carl Zeiss(r) Image (CZI) files
https://pypi.org/project/czifile

Lack of zstd compression support #10

Open Czaki opened 4 months ago

Czaki commented 4 months ago

It looks like the CZI file format was updated to support compressing data with zstd, which is not yet supported by this package.

Based on my investigation, two modes were added. Mode 5, named zstd0, is plain zstd (I do not have such a file to test with): https://github.com/ZEISS/libczi/blob/4a60e22200cbf0c8ff2a59f69a81ef1b2b89bf4f/Src/libCZI/decoder_zstd.cpp#L39

And mode 6, named zstd1:

https://github.com/ZEISS/libczi/blob/4a60e22200cbf0c8ff2a59f69a81ef1b2b89bf4f/Src/libCZI/decoder_zstd.cpp#L39

So it seems that supporting mode 5 only requires adding

DECOMPRESS[5] = imagecodecs.zstd_decode

to this dict

https://github.com/cgohlke/czifile/blob/a70265fd430983875bf4c31955f2ad57f2592747/czifile/czifile.py#L1227-L1231
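A minimal sketch of that registration, assuming a dispatch table keyed by compression mode. The `DECOMPRESS` dict below is a stand-in written for illustration, not a copy of the one in czifile, and the entry is only added when imagecodecs can be imported:

```python
# Stand-in dispatch table (an assumption for illustration):
# mode 0 means the data is stored uncompressed.
DECOMPRESS = {0: lambda data: data}

try:
    import imagecodecs
except ImportError:
    imagecodecs = None

if imagecodecs is not None:
    # mode 5 (zstd0): plain zstd stream, no extra header
    DECOMPRESS[5] = imagecodecs.zstd_decode
```

With this pattern a file using mode 5 fails with a `KeyError` on lookup when imagecodecs is missing, instead of failing at import time.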

Mode 6 requires parsing a header before decoding, which could be done with code like this:

import typing

import imagecodecs


class ZSTD1Header(typing.NamedTuple):
    """
    ZSTD1 header structure
    based on:
    https://github.com/ZEISS/libczi/blob/4a60e22200cbf0c8ff2a59f69a81ef1b2b89bf4f/Src/libCZI/decoder_zstd.cpp#L19
    """

    header_size: int
    hiLoByteUnpackPreprocessing: bool

def parse_zstd1_header(data, size):
    """
    Parse ZSTD1 header

    https://github.com/ZEISS/libczi/blob/4a60e22200cbf0c8ff2a59f69a81ef1b2b89bf4f/Src/libCZI/decoder_zstd.cpp#L84
    """
    if size < 1:
        # at least one byte is needed
        return ZSTD1Header(0, False)

    if data[0] == 1:
        # header is a single byte, no chunk follows
        return ZSTD1Header(1, False)

    if data[0] == 3 and size >= 3 and data[1] == 1:
        # chunk type 1: lowest bit of the next byte is the hi-lo flag
        return ZSTD1Header(3, bool(data[2] & 1))

    # anything else is not a valid header (header_size 0)
    return ZSTD1Header(0, False)

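def decode_zstd1(data):
    """
    Decode ZSTD1 data
    """
    header = parse_zstd1_header(data, len(data))
    if header.header_size == 0:
        raise ValueError('invalid ZSTD1 header')
    if header.hiLoByteUnpackPreprocessing:
        # hi-lo byte unpacking is not implemented here
        raise NotImplementedError('hi-lo byte unpacking')
    return imagecodecs.zstd_decode(data[header.header_size:])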

(This code does not support hi-lo byte unpacking; I have not dug into it yet.)
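For reference, a rough sketch of what hi-lo byte unpacking might look like. It rests on an unverified assumption about the layout: that the decompressed buffer holds all low bytes of the 16-bit samples in its first half and all high bytes in its second half, and that the output should be little-endian. The function name `hilo_byte_unpack` is hypothetical, and the layout should be checked against libczi before relying on this:

```python
def hilo_byte_unpack(data: bytes) -> bytes:
    """Interleave a low-byte plane and a high-byte plane into 16-bit words.

    ASSUMPTION (not verified against libczi): the first half of `data`
    holds the low bytes, the second half holds the high bytes, and the
    result is a little-endian uint16 buffer.
    """
    if len(data) % 2:
        raise ValueError('expected an even number of bytes')
    half = len(data) // 2
    out = bytearray(len(data))
    out[0::2] = data[:half]  # low bytes go to even offsets
    out[1::2] = data[half:]  # high bytes go to odd offsets
    return bytes(out)


# two samples, 0x1234 and 0x5678, stored as low plane then high plane
packed = bytes([0x34, 0x78, 0x12, 0x56])
unpacked = hilo_byte_unpack(packed)
```

Under that assumption, `unpacked` is the little-endian byte sequence `34 12 78 56`, which is the two original samples back in interleaved order.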

However, providing a custom decode function is not enough, as zstd_decode returns bytes, not an array.

So this line also needs to be updated: https://github.com/cgohlke/czifile/blob/a70265fd430983875bf4c31955f2ad57f2592747/czifile/czifile.py#L660

to

if de.compression in {2, 5, 6}:
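To illustrate why that branch matters, here is a small sketch of turning decoder output (bytes) into an array. The helper name `bytes_to_array` and the dtype and shape values are made up for the example; in czifile they would come from the subblock metadata:

```python
import numpy


def bytes_to_array(data, dtype, shape):
    """Hypothetical helper: view raw decoded bytes as an ndarray."""
    # frombuffer does not copy; the result is read-only when `data` is bytes
    array = numpy.frombuffer(data, dtype=dtype)
    return array.reshape(shape)


# stand-in for the bytes a decoder such as zstd_decode would return
raw = numpy.arange(6, dtype='<u2').tobytes()
image = bytes_to_array(raw, '<u2', (2, 3))
```

If the array has to be modified afterwards, a `.copy()` is needed, since the buffer view is read-only.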

The example file I currently have is 500 MB, so it would need some alternative way of sharing. I will try to obtain a smaller one.

Would you prefer to do this on your side, or should I open a PR?