cgohlke / imagecodecs

Image transformation, compression, and decompression codecs
https://pypi.org/project/imagecodecs
BSD 3-Clause "New" or "Revised" License

Compressing "technical" imagery using JPEG-XL #31

Closed: JackKelly closed this issue 2 years ago

JackKelly commented 2 years ago

Hi again!

I've been having a really interesting conversation with the JPEG-XL folks about using JPEG-XL to compress "technical" imagery, where we want to recover the original numerical values and don't care much about whether the images look good to humans. In my case, I want to compress multi-channel satellite imagery, which will mostly be consumed by machines rather than humans.

The thread has thrown up a bunch of interesting ideas. I'm going to try modifying imagecodecs/_jpegxl.pyx to implement some of these ideas as a separate PR.

If there's a "development version" of imagecodecs, please may I see the latest version of imagecodecs/_jpegxl.pyx, so that my PR will be easy to merge into master if it proves useful? No worries if not!

cgohlke commented 2 years ago

I'm not using GitHub for development; it's just one way of publishing releases. These are the relevant changes:

```diff
diff --git a/imagecodecs/_jpegxl.pyx b/imagecodecs/_jpegxl.pyx
index 9595dd2..2c4567f 100644
--- a/imagecodecs/_jpegxl.pyx
+++ b/imagecodecs/_jpegxl.pyx
@@ -125,6 +123,8 @@ def jpegxl_encode(
     level=None,  # None or < 0: lossless, 0-4: tier/speed
     effort=None,
     distance=None,
+    lossless=None,
+    decodingspeed=None,
     photometric=None,
     usecontainer=None,
     numthreads=None,
@@ -157,8 +159,8 @@ def jpegxl_encode(
         JxlPixelFormat pixel_format
         JxlColorEncoding color_encoding
         JXL_BOOL use_container = bool(usecontainer)
-        JXL_BOOL option_lossless = level is None or level < 0
-        int option_tier = _default_value(level, 0, 0, 4)
+        JXL_BOOL option_lossless = lossless is None or bool(lossless)
+        int option_tier = _default_value(decodingspeed, 0, 0, 4)
         int option_effort = _default_value(effort, 3, 3, 9)  # 7 is too slow
         float option_distance = _default_value(distance, 1.0, 0.0, 15.0)
         size_t num_threads = <size_t> _default_threads(numthreads)
@@ -170,6 +172,14 @@ def jpegxl_encode(
         # input is a JPEG stream
         return jpegxl_from_jpeg(data, use_container, num_threads, out)

+    if level is not None:
+        if level < 0:
+            option_lossless = JXL_TRUE
+        elif level > 4:
+            option_tier = 4
+        else:
+            option_tier = level
+
     src = numpy.ascontiguousarray(data)
     dtype = src.dtype
     srcsize = src.nbytes
```
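
To make the interplay between the new `lossless`/`decodingspeed` keywords and the legacy `level` argument easier to follow, here is a pure-Python sketch of the option resolution in the patched `jpegxl_encode`. It is an illustration only: `default_value` is a hypothetical stand-in for imagecodecs' private `_default_value` helper, and its clamping behavior is an assumption, not something the diff shows.

```python
def default_value(value, default, smallest, largest):
    """Hypothetical stand-in for imagecodecs' private _default_value helper.

    Assumed behavior: return `default` if `value` is None, otherwise clamp
    `value` to the closed range [smallest, largest].
    """
    if value is None:
        return default
    return max(smallest, min(largest, value))


def resolve_jpegxl_options(level=None, lossless=None, decodingspeed=None,
                           effort=None, distance=None):
    """Pure-Python sketch of the option handling in the patched jpegxl_encode."""
    # New keywords: lossless is True unless explicitly set to a falsy value.
    option_lossless = lossless is None or bool(lossless)
    option_tier = default_value(decodingspeed, 0, 0, 4)
    option_effort = default_value(effort, 3, 3, 9)
    option_distance = default_value(distance, 1.0, 0.0, 15.0)

    # Legacy `level` argument, applied after the new keywords:
    # a negative level forces lossless, otherwise it sets the tier (max 4).
    if level is not None:
        if level < 0:
            option_lossless = True
        elif level > 4:
            option_tier = 4
        else:
            option_tier = level

    return {
        'lossless': option_lossless,
        'tier': option_tier,
        'effort': option_effort,
        'distance': option_distance,
    }
```

Reading the diff this way, two consequences stand out: lossy encoding now requires passing `lossless=False` explicitly, and, unlike the removed line, a non-negative `level` no longer switches to lossy mode by itself. When both `level` and `decodingspeed` are given, `level` wins.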

Instead of using the imagecodecs setup.py, it's probably easier to extract the few relevant files into a new project or to use a reduced setup.py, e.g.:

```python
# setup_codec.py
# python setup_codec.py build_ext --inplace

from setuptools import setup, Extension
import numpy

codec = 'jpegxl'
# Static libjxl libraries to link; the exact names depend on how libjxl was built
libraries = [
    'jxl-static',
    'jxl_dec-static',
    'jxl_extras-static',
    'jxl_threads-static',
    'jxl_brotlienc-static',
    'jxl_brotlidec-static',
    'jxl_brotlicommon-static',
    'jxl_hwy',
    'jxl_lodepng',
    'jxl_lskcms',
    'jxl_sjpeg',
]
# Required when linking libjxl statically
define_macros = [('JXL_STATIC_DEFINE', 1), ('JXL_THREADS_STATIC_DEFINE', 1)]
include_dirs = ['imagecodecs', numpy.get_include()]
library_dirs = []  # add the libjxl library directory if it is not on the default path
extra_compile_args = []

ext_modules = [
    # Shared Cython helpers used by the codec module
    Extension(
        'imagecodecs._shared',
        ['imagecodecs/_shared.pyx'],
        include_dirs=['imagecodecs'],
    ),
    # The JPEG XL codec module itself
    Extension(
        f'imagecodecs._{codec}',
        [f'imagecodecs/_{codec}.pyx'],
        include_dirs=include_dirs,
        library_dirs=library_dirs,
        libraries=libraries,
        extra_compile_args=extra_compile_args,
        define_macros=define_macros,
    ),
]

ext_modules[0].cython_compile_time_env = {'IS_PYPY': False}  # ugly hack

setup(name=f'imagecodecs_{codec}', version='2022.1.x', ext_modules=ext_modules)
```

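
Once the extensions build in place, a quick way to check whether a given set of encoder options really preserves the original numerical values (the point of the "technical" imagery use case) is to round-trip an array and compare. The following is a minimal sketch assuming only NumPy; `roundtrip_report` is a hypothetical helper, not part of imagecodecs, and zlib merely stands in for an image codec in the demo.

```python
import numpy as np


def roundtrip_report(data, encode, decode):
    """Round-trip `data` through a codec and report size and worst-case error.

    `encode` maps an ndarray to bytes; `decode` maps those bytes back to an
    ndarray. Both are supplied by the caller (hypothetical interface).
    """
    data = np.asarray(data)
    encoded = bytes(encode(data))
    decoded = np.asarray(decode(encoded))
    if decoded.shape != data.shape:
        raise ValueError(f'shape changed: {data.shape} -> {decoded.shape}')
    # Subtract in float64 so unsigned integers cannot wrap around;
    # float64 represents every integer up to 2**53 exactly.
    error = np.abs(decoded.astype(np.float64) - data.astype(np.float64))
    return {
        'compressed_bytes': len(encoded),
        'ratio': data.nbytes / max(len(encoded), 1),
        'max_abs_error': float(error.max()) if error.size else 0.0,
        'lossless': bool(np.array_equal(decoded, data)),
    }


if __name__ == '__main__':
    import zlib

    # zlib stands in for a real image codec in this demo.
    image = (np.arange(64 * 64 * 3, dtype=np.uint16) % 1024).reshape(64, 64, 3)
    report = roundtrip_report(
        image,
        lambda a: zlib.compress(a.tobytes()),
        lambda b: np.frombuffer(zlib.decompress(b), np.uint16).reshape(image.shape),
    )
    print(report)
```

With the in-place build, `encode` might be `lambda a: jpegxl_encode(a, lossless=True)` and `decode` might be `jpegxl_decode`, assuming both are importable from `imagecodecs._jpegxl`; a `max_abs_error` of 0 then confirms that the values survived.
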
JackKelly commented 2 years ago

That's really useful, thank you!