Closed SimonSegerblomRex closed 2 months ago
This is WIP and will stay as a draft pull request until there's an official libjpeg-turbo release that includes the changes necessary.
Thanks. I am aware of the ongoing work in libjpeg-turbo. Note that the JPEG codec in imagecodecs switches to the LJPEG codec for bit-depths not supported by libjpeg-turbo.
Yes, ljpeg_decode seems to work fine and will still be needed as backup in jpeg_decode for images that libjpeg-turbo refuses to decode due to the issues discussed in https://github.com/libjpeg-turbo/libjpeg-turbo/issues/586 and https://github.com/libjpeg-turbo/libjpeg-turbo/issues/765. ljpeg_encode shouldn't be needed any longer though.
I tested this with a 16-bit Lossless JPEG file as input:
import sys

from imagecodecs import imread, jpeg8_decode, jpeg8_encode
from numpy.testing import assert_array_equal

filename = sys.argv[1]
image = imread(filename)
if image.ndim > 2:
    image = image[..., 0].copy()  # copy to fix strides

for bit_depth in range(16, 1, -1):
    print(bit_depth)
    if bit_depth <= 8 and image.itemsize > 1:
        # FIXME: Should this really be necessary?
        image = image.astype("u1")
    enc = jpeg8_encode(
        image,
        lossless=True,
        predictor=1,
        bitspersample=bit_depth,
    )
    dec = jpeg8_decode(enc)
    assert_array_equal(image, dec)
    image >>= 1  # drop one bit so the samples fit in the next lower bit depth
It works, but the case with bit-depth <= 8 in a uint16 array should be handled in a better way.
EDIT: Fixed this with the check here.
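For reference, one way to avoid the manual cast is to choose the container dtype from the bit depth before encoding. This is a minimal sketch and not the check that was added in this PR; the helper name fit_container is hypothetical and only NumPy is assumed.

```python
import numpy as np


def fit_container(image: np.ndarray, bit_depth: int) -> np.ndarray:
    """Return image in the smallest unsigned dtype that holds bit_depth bits.

    Hypothetical helper for illustration only. It raises if a sample does
    not fit in bit_depth bits instead of silently truncating it.
    """
    if not 1 <= bit_depth <= 16:
        raise ValueError(f"unsupported bit depth: {bit_depth}")
    if image.dtype.kind != "u":
        raise TypeError("expected an unsigned integer array")
    max_value = (1 << bit_depth) - 1
    if image.size and int(image.max()) > max_value:
        raise ValueError(f"samples exceed {bit_depth} bits")
    target = np.uint8 if bit_depth <= 8 else np.uint16
    # copy=False avoids a copy when the dtype already matches
    return image.astype(target, copy=False)
```

With such a helper the loop body could call fit_container(image, bit_depth) once per iteration instead of special-casing bit depths of 8 and below.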
(I replaced the broken dng*.ljp files that were created using my broken Lossless JPEG encoder.)
I did a quick benchmark comparing jpeg8_decode and ljpeg_decode. jpeg8_decode is about 40% faster using this input: Pentax-K-1-DNG-extracted.jpg (3696x4950, 2 components). (Note: Pentax DNG files are the only images I've found in the wild hit by this problem, so you need that patch to get past the "Bogus Huffman table definition" error.)
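The thread does not say how the benchmark was run. A comparison like this could be sketched with the standard timeit module as below; the helper names time_decoder and speedup_percent are hypothetical, and "faster" is assumed to mean the reference time divided by the new time, minus one.

```python
import timeit
from typing import Callable


def time_decoder(
    decode: Callable[[bytes], object], data: bytes, repeat: int = 5
) -> float:
    """Return the best wall-clock time in seconds for a single decode call."""
    return min(timeit.repeat(lambda: decode(data), number=1, repeat=repeat))


def speedup_percent(t_reference: float, t_new: float) -> float:
    """How much faster the new decoder is, in percent of its own runtime."""
    return (t_reference / t_new - 1.0) * 100.0


# Usage, assuming imagecodecs is installed and `path` names a Lossless JPEG:
#   from imagecodecs import jpeg8_decode, ljpeg_decode
#   with open(path, "rb") as fh:
#       data = fh.read()
#   t_lj = time_decoder(ljpeg_decode, data)
#   t_turbo = time_decoder(jpeg8_decode, data)
#   print(f"{speedup_percent(t_lj, t_turbo):.0f} % faster")
```

Taking the minimum over several runs reduces the influence of unrelated system load on the result.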
Everything seems to work as expected now, but I guess we should wait for an official libjpeg-turbo tag.
I found this source containing a lot of Lossless JPEG files (embedded in DICOM files). A quick test shows that libjpeg-turbo and lj92 produce slightly different results for some of them, e.g., gdcm-JPEG-LossLessThoravision.dcm. BitsPerSample is 15 and in the decoded arrays there are values as high as 65520 for lj92 and 65535 for libjpeg-turbo... something weird is going on here (even considering that the decoded values are probably supposed to be reinterpreted as signed values or something). Do you have any input regarding this file @malaterre? EDIT: Solved by using gdcmraw to extract the JPEG file. Now this file behaves as expected with both lj92 and libjpeg-turbo.
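A quick sanity check makes the mismatch explicit: an unsigned 15-bit sample cannot exceed 2**15 - 1 = 32767, so both maxima reported above are out of range for the declared BitsPerSample. The sketch below uses plain Python integers; the helper names are hypothetical.

```python
def max_sample(bits_per_sample: int) -> int:
    """Largest unsigned value representable in bits_per_sample bits."""
    return (1 << bits_per_sample) - 1


def fits(value: int, bits_per_sample: int) -> bool:
    """True if value is a valid unsigned sample at the given bit depth."""
    return 0 <= value <= max_sample(bits_per_sample)


# Maxima reported for gdcm-JPEG-LossLessThoravision.dcm (BitsPerSample 15)
for decoder, peak in (("lj92", 65520), ("libjpeg-turbo", 65535)):
    print(decoder, peak, "needs", peak.bit_length(), "bits; fits:", fits(peak, 15))
```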
@SimonSegerblomRex What do you get if you use thorfdbg/libjpeg?
With thorfdbg/libjpeg I get:
reading a JPEG file failed - error -1038 - invalid stream, found invalid huffman code in entropy coded segment
and that's probably the right thing. The images decoded by lj92 and libjpeg-turbo are completely broken, so it would have been better if those decoders had failed as well instead of decoding garbage.
I found that lj92 fails to decode MARCONI_MxTWin-12-MONO2-JpegLossless-ZeroLengthSQ.dcm (just 0s out) while libjpeg-turbo decodes it without issues :+1: EDIT: After extracting and repairing the JPEG file using gdcmraw, it also decodes as expected with lj92.
What kind of command did you use?
% gdcmraw gdcm-JPEG-LossLessThoravision.dcm /tmp/bla.jpg
% jpeg /tmp/bla.jpg /tmp/bla.pgm
jpeg Copyright (C) 2012-2018 Thomas Richter, University of Stuttgart
and Accusoft
For license conditions, see README.license for details.
0 bytes memory not yet released.
15905134 bytes maximal required.
4197 allocations performed.
EDIT: Using the output from gdcmraw (that's actually not part of the DICOM file) I get the same output using all three decoders :+1:
First I just used this script to extract the JPEG file:
import re
import struct
import sys

SOI = struct.pack(">H", 0xFFD8)  # start of image
SOF3 = struct.pack(">H", 0xFFC3)  # start of frame, lossless (Huffman)
EOI = struct.pack(">H", 0xFFD9)  # end of image

with open(sys.argv[1], "rb") as f:
    data = f.read()

# The lookahead allows overlapping candidates to be matched as well.
matches = re.finditer(
    b"(?=(" + SOI + b".*?" + SOF3 + b".+?" + EOI + b"))", data, re.S
)
for i, match in enumerate(matches):
    with open(f"{i}.jpg", "wb") as f:
        print(i)
        f.write(match.group(1))
It seems like gdcmraw does some magic to repair the broken file.
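To see what an extracted candidate claims about itself, the SOF3 frame header can be read directly: after the marker come a 2-byte segment length, a 1-byte sample precision, 2-byte height and width, and a 1-byte component count. The following is a minimal sketch with hypothetical names (FrameHeader, read_sof3); it scans naively for the marker bytes, so the result is only a hint.

```python
import struct
from typing import NamedTuple, Optional

SOF3 = b"\xff\xc3"  # start of frame, lossless (Huffman)


class FrameHeader(NamedTuple):
    precision: int  # bits per sample
    height: int
    width: int
    components: int


def read_sof3(data: bytes) -> Optional[FrameHeader]:
    """Return the first SOF3 frame header found in data, or None.

    Naive scan: the marker bytes could also occur inside other segments,
    so a hit is not proof of a valid Lossless JPEG stream.
    """
    pos = data.find(SOF3)
    # marker (2 bytes) + length, precision, height, width, count (8 bytes)
    if pos < 0 or pos + 10 > len(data):
        return None
    _length, precision, height, width, components = struct.unpack_from(
        ">HBHHB", data, pos + 2
    )
    return FrameHeader(precision, height, width, components)
```

Comparing the reported precision with the DICOM BitsPerSample value is a cheap way to spot a mis-extracted stream before handing it to a decoder.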
This is ready for code review (but there's still no new libjpeg-turbo release or tag).
Thank you. I will review this when libjpeg-turbo 3.1 is released.
I have tested this with libjpeg-turbo 3.1 beta and it works as expected. The changes will be in the next release of imagecodecs along with some tweaks to make tests pass with libjpeg-turbo 3.0. Thank you.
Thank you @cgohlke! Do you think you'll have time to publish the new release soon? If not, it would be great if you could push your changes to a dev branch; I would like to verify that my main use-case (with two components) still works.
The plan is to do a release before Python 3.13, this or next weekend. Hopefully there are no major issues on macOS.
I am attaching the current _jpeg8.pyx, which should be enough for you to test, no?
Thank you, sounds good!
Yes, I confirmed that it works 👍
Fixed in imagecodecs 2024.9.22.
Thank you for the new tag @cgohlke! It works well when I build imagecodecs from source against libjpeg-turbo 3.0.90beta. The wheel release was built against libjpeg-turbo 3.0.4, right? Things like encoding images with 14 bits bitspersample as 2 components still don't work when installing the wheel. Will you consider making a release built against libjpeg-turbo 3.0.90beta, or will you wait for the 3.1 release? EDIT: ...or is the system library dynamically linked? Then I need to check my environment.
The released wheels are built against libjpeg-turbo 3.0.4. I'll wait for version 3.1. No, the system JPEG library is not used. The dynamic JPEG library used is in the imagecodecs/libs directory, with a name like libjpeg-3a8ca8f3.so.8.
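To check which bundled library a given installation ships, the libs directory can simply be listed. This is a minimal sketch using only the standard library; the helper name find_bundled_jpeg is hypothetical, and the directory to pass in depends on where the wheel was installed.

```python
from pathlib import Path
from typing import List


def find_bundled_jpeg(libs_dir: Path) -> List[Path]:
    """List files in libs_dir whose name starts with 'libjpeg'."""
    if not libs_dir.is_dir():
        return []
    return sorted(p for p in libs_dir.iterdir() if p.name.startswith("libjpeg"))


# Usage, assuming `libs_dir` is the libs directory of the installed wheel:
#   for path in find_bundled_jpeg(libs_dir):
#       print(path.name)
```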
Enabled by the solution to https://github.com/libjpeg-turbo/libjpeg-turbo/issues/768 (Planned to be included in libjpeg-turbo release 3.1.0.)
There are still some Lossless JPEG encoded images that libjpeg-turbo refuses to decode, see the discussions in:
Note to self while testing with local copy of libjpeg-turbo: Put this in the customize_build function used by setup.py:
and make sure to set
before running any python script importing imagecodecs.