cgohlke / tifffile

Read and write TIFF files
https://pypi.org/project/tifffile
BSD 3-Clause "New" or "Revised" License
542 stars 155 forks source link

Providing the ability to specify zlib compressor #273

Open hmaarrfk opened 1 month ago

hmaarrfk commented 1 month ago

Going through the code, it seems that the compressor is pretty hard coded based on the compressiontag.

https://github.com/cgohlke/tifffile/blob/8a25a0d4738390af0a1f693705f29875d88fc320/tifffile/tifffile.py#L16775

This means that imagecodecs.deflate is preferentially used. Recently it seems that the zlib-ng library has made great strides and it would be great to start to experiment with it. However, I understand the dangers of just "swapping it out". I personally am not brave enough to do that, but maybe you are.

It would effectively amount to adding (not really tested at time of writing)

if (
                    hasattr(imagecodecs, 'ZLIBNG') and
                    imagecodecs.ZLIBNG.available
                ):
                    if self._encode:
                        codec = imagecodecs.zlibng_encode
                    else:
                        codec = imagecodecs.zlibng_decode
                elif (
                    hasattr(imagecodecs, 'DEFLATE')
                    and imagecodecs.DEFLATE.available
                ):

It might also be interesting to just provide the ability to switch between the two at runtime for benchmarking purposes. The larger patch below would ensure that:

This would eventually let us use compression as a key into the specific codec that the user wants. However, I understand this adds complexity and maintenance and maybe just forcing zlib-ng onto people is easier.

Let me know what you prefer or if you prefer an other option.

A larger patch for more flexibility ```patch diff --git a/tifffile/tifffile.py b/tifffile/tifffile.py index fa7d4c1..fa2efc7 100644 --- a/tifffile/tifffile.py +++ b/tifffile/tifffile.py @@ -1986,7 +1986,7 @@ class TiffWriter: factor. Segment lengths and rowsperstrip must be a multiple of 8 times the vertical factor. The values are written to the YCbCrSubSampling tag. - jpegtables: + jpegtables: JPEG quantization and/or Huffman tables. Use for copying pre-compressed JPEG segments. The value is written to the JPEGTables tag. @@ -2397,9 +2397,6 @@ class TiffWriter: int(enumarg(EXTRASAMPLE, x)) for x in sequence(extrasamples) ) - if compressionargs is None: - compressionargs = {} - if compression: if isinstance(compression, (tuple, list)): # TODO: unreachable @@ -2422,26 +2419,33 @@ class TiffWriter: compression = compression[0] if isinstance(compression, str): compression = compression.upper() - if compression == 'ZLIB': - compression = 8 # ADOBE_DEFLATE - elif isinstance(compression, bool): - compression = 8 # ADOBE_DEFLATE + # Boolean or other truthy object should be cast back to string + elif compression: + compression = 'ADOBE_DEFLATE' + else: + compression = 'NONE' compressiontag = enumarg(COMPRESSION, compression).value - compression = True else: compressiontag = 1 - compression = False - if compressiontag == 1: + if compressionargs is None: + compressionargs = {} + + if compressiontag == 1 and len(compressionargs) > 0: + warnings.warn( + "Passed 'compressionargs' to uncompressed image. " + "This will raise an error in the future.", + DeprecationWarning, + stacklevel=2 + ) compressionargs = {} elif compressiontag in {33003, 33004, 33005, 34712}: # JPEG2000: use J2K instead of JP2 - compressionargs['codecformat'] = 0 # OPJ_CODEC_J2K - - assert compressionargs is not None + if 'codecformat' not in compressionargs: + compressionargs['codecformat'] = 0 # OPJ_CODEC_J2K if predictor: - if not compression: + if compression == 'NONE': raise ValueError('cannot use predictor without compression') if compressiontag in TIFF.IMAGE_COMPRESSIONS: # don't use predictor with JPEG, JPEG2000, WEBP, PNG, ... @@ -2848,7 +2852,7 @@ class TiffWriter: and bitspersample <= datadtype.itemsize * 8 ): raise ValueError(f'{bitspersample=} out of range of {datadtype=}') - elif compression: + elif compression != 'NONE': if bitspersample != datadtype.itemsize * 8: raise ValueError( f'{bitspersample=} cannot be used with compression' @@ -3168,7 +3172,7 @@ class TiffWriter: addtag(tags, 296, 3, 1, resolutionunit) # ResolutionUnit # can save data array contiguous - contiguous = not (compression or packints or bilevel) + contiguous = not ((compression != 'NONE') or packints or bilevel) if tile: # one chunk per tile per plane if len(tile) == 2: @@ -3209,7 +3213,7 @@ class TiffWriter: if dataarray is not None: dataiter = iter_tiles(dataarray, tile, tiles) elif dataiter is None and not ( - compression or packints or bilevel + (compression != 'NONE') or packints or bilevel ): def dataiter_( @@ -3258,7 +3262,7 @@ class TiffWriter: if rowsperstrip is None: # compress ~256 KB chunks by default # TIFF-EP requires <= 64 KB - if compression: + if compression != 'NONE': rowsperstrip = 262144 // rowsize else: rowsperstrip = storedshape.length @@ -3372,7 +3376,7 @@ class TiffWriter: else: raise NotImplementedError('cannot compress bilevel image') - elif compression: + elif compression != 'NONE': compressor = TIFF.COMPRESSORS[compressiontag] if compressiontag == 32773: @@ -16772,6 +16776,14 @@ class CompressionCodec( codec = imagecodecs.jpeg_decode elif key in {8, 32946, 50013}: if ( + hasattr(imagecodecs, 'ZLIBNG') and + imagecodecs.ZLIBNG.available + ): + if self._encode: + codec = imagecodecs.zlibng_encode + else: + codec = imagecodecs.zlibng_decode + elif ( hasattr(imagecodecs, 'DEFLATE') and imagecodecs.DEFLATE.available ): @@ -17074,6 +17086,11 @@ class COMPRESSION(enum.IntEnum): JPEG = 7 """New style JPEG.""" ADOBE_DEFLATE = 8 + # Aliases for ADOBE_DEFLATE + # These aliases allow the user to specify the details + # of which compressor implementation to use + ZLIB = 8 + ZLIB_NG = 8 """Deflate, aka ZLIB.""" JBIG_BW = 9 # VC5 JBIG_COLOR = 10 ```
hmaarrfk commented 1 month ago

cgholke's other response

There are already three "zlib" codecs in imagecodecs. I think libdeflate is a good default for TIFF. It is also used by libtiff if enabled at build time. I rather add an option to pass an encoder callback function with tifffile's compressionargs argument such that any compatible codec (for example, zlib-ng or isa-l) can be used for experimentation.

For encoding this seems fine.

What would you suggest for decoding? The user passing in a custom dictionary for their preferred decoders based on the compressiontag?