jpeg decoder doesn't decode colors quite right

nico commented 10 months ago

I think the jpeg decoder might have channels off by a small amount, due to various things suggesting it:

If I make green.png filled with (0, 255, 0) and then repeatedly decode and re-encode (for i in {1..100}; do Build/lagom/bin/image -o green.png green.jpg; Build/lagom/bin/image -o green.jpg green.png; done), the image's color is eventually (1, 239, 6). After that it seems to no longer change. (Edit: See comment below!) If I do the same with macOS's built-in image processor (for i in {1..200}; do sips -s format png -o green.png green.jpg; sips -s format jpeg -o green.jpg green.png; done), it only goes to (0, 255, 1). Now, jpeg compression is lossy, and maybe sips has a higher default quality and this is expected – but it seems like a pretty big error?
The zip file available for download at https://displaycal.net/icc-color-management-test/ has a png, a jpg, and a tiff all containing exactly the same color profile. If I convert the PNG or the TIFF file to sRGB then the output looks like expected, but the jpeg has its colors off. (I imagine the color profile maps small differences in pixel values in the input space to big differences in the output.) And converting the jpg to a png with sips and then doing the color space conversion from that png file using image also produces good output.

Build/lagom/bin/icc -n sRGB --reencode-to serenity-sRGB.icc
Build/lagom/bin/image -o out-jpg.png ~/Downloads/ICC-CM-Test/ICC\ Rendering\ Intent\ Test.jpg --convert-to-color-profile serenity-sRGB.icc Build/lagom/bin/image -o out-png.png ~/Downloads/ICC-CM-Test/ICC\ Rendering\ Intent\ Test.png --convert-to-color-profile serenity-sRGB.icc
Similarly, if I convert https://littlecms.com/img/blog/check_full.jpg to sRGB, the background is too light, but if I convert it to png with a non-serenity tool first and then do the color space conversion on the PNG, it's better (not perfect: The background is now no longer too bright, but the foreground is too bright).

It's possible to strip the color profile and convert to png to see the raw pixel data before the synthetic profiles (using either sips -d profile -s format png -o out.png in.jpg, or image -o out.png --strip-color-profile in.jpg, and then compare the raw rgb output decoder of serenity's jpeg loader to other programs. The color values here are a bit off as well. (This could be due to us doing nearest-neighbor chroma upsampling, but all these files don't use chroma downsampling, so this doesn't apply.)

nico commented 10 months ago

Oh wait, I had a local diff applied that makes this a bit better. On trunk, the "reencode 100 times" experiment leads to a final color of (98, 150, 90), which is very off.

My local diff is not truncating to i16 in the middle of the inverse DCT, and rounding in the YCbCr->RGB conversion:

local diff

```diff diff --git a/Userland/Libraries/LibGfx/ImageFormats/JPEGLoader.cpp b/Userland/Libraries/LibGfx/ImageFormats/JPEGLoader.cpp index 32d19ae49d..144c42c0a9 100644 --- a/Userland/Libraries/LibGfx/ImageFormats/JPEGLoader.cpp +++ b/Userland/Libraries/LibGfx/ImageFormats/JPEGLoader.cpp @@ -1406,6 +1406,8 @@ static void inverse_dct(JPEGLoadingContext const& context, Vector& m static float const s6 = AK::cos(6.0f / 16.0f * AK::Pi) / 2.0f; static float const s7 = AK::cos(7.0f / 16.0f * AK::Pi) / 2.0f; + float workspace[8 * 8]; + for (u32 vcursor = 0; vcursor < context.mblock_meta.vcount; vcursor += context.sampling_factors.vertical) { for (u32 hcursor = 0; hcursor < context.mblock_meta.hcount; hcursor += context.sampling_factors.horizontal) { for (u32 component_i = 0; component_i < context.components.size(); component_i++) { @@ -1473,24 +1475,24 @@ static void inverse_dct(JPEGLoadingContext const& context, Vector& m float const b6 = c6 - c7; float const b7 = c7; - block_component[0 * 8 + k] = b0 + b7; - block_component[1 * 8 + k] = b1 + b6; - block_component[2 * 8 + k] = b2 + b5; - block_component[3 * 8 + k] = b3 + b4; - block_component[4 * 8 + k] = b3 - b4; - block_component[5 * 8 + k] = b2 - b5; - block_component[6 * 8 + k] = b1 - b6; - block_component[7 * 8 + k] = b0 - b7; + workspace[0 * 8 + k] = b0 + b7; + workspace[1 * 8 + k] = b1 + b6; + workspace[2 * 8 + k] = b2 + b5; + workspace[3 * 8 + k] = b3 + b4; + workspace[4 * 8 + k] = b3 - b4; + workspace[5 * 8 + k] = b2 - b5; + workspace[6 * 8 + k] = b1 - b6; + workspace[7 * 8 + k] = b0 - b7; } for (u32 l = 0; l < 8; ++l) { - float const g0 = block_component[l * 8 + 0] * s0; - float const g1 = block_component[l * 8 + 4] * s4; - float const g2 = block_component[l * 8 + 2] * s2; - float const g3 = block_component[l * 8 + 6] * s6; - float const g4 = block_component[l * 8 + 5] * s5; - float const g5 = block_component[l * 8 + 1] * s1; - float const g6 = block_component[l * 8 + 7] * s7; - float const g7 = block_component[l * 8 + 3] * s3; + float const g0 = workspace[l * 8 + 0] * s0; + float const g1 = workspace[l * 8 + 4] * s4; + float const g2 = workspace[l * 8 + 2] * s2; + float const g3 = workspace[l * 8 + 6] * s6; + float const g4 = workspace[l * 8 + 5] * s5; + float const g5 = workspace[l * 8 + 1] * s1; + float const g6 = workspace[l * 8 + 7] * s7; + float const g7 = workspace[l * 8 + 3] * s3; float const f0 = g0; float const f1 = g1; @@ -1540,14 +1542,14 @@ static void inverse_dct(JPEGLoadingContext const& context, Vector& m float const b6 = c6 - c7; float const b7 = c7; - block_component[l * 8 + 0] = b0 + b7; - block_component[l * 8 + 1] = b1 + b6; - block_component[l * 8 + 2] = b2 + b5; - block_component[l * 8 + 3] = b3 + b4; - block_component[l * 8 + 4] = b3 - b4; - block_component[l * 8 + 5] = b2 - b5; - block_component[l * 8 + 6] = b1 - b6; - block_component[l * 8 + 7] = b0 - b7; + block_component[l * 8 + 0] = round_to(b0 + b7); + block_component[l * 8 + 1] = round_to(b1 + b6); + block_component[l * 8 + 2] = round_to(b2 + b5); + block_component[l * 8 + 3] = round_to(b3 + b4); + block_component[l * 8 + 4] = round_to(b3 - b4); + block_component[l * 8 + 5] = round_to(b2 - b5); + block_component[l * 8 + 6] = round_to(b1 - b6); + block_component[l * 8 + 7] = round_to(b0 - b7); } } } @@ -1607,9 +1609,9 @@ static void ycbcr_to_rgb(JPEGLoadingContext const& context, Vector& u32 const chroma_pxrow = (i / context.sampling_factors.vertical) + 4 * vfactor_i; u32 const chroma_pxcol = (j / context.sampling_factors.horizontal) + 4 * hfactor_i; u32 const chroma_pixel = chroma_pxrow * 8 + chroma_pxcol; - int r = y[pixel] + 1.402f * (chroma.cr[chroma_pixel] - 128); - int g = y[pixel] - 0.3441f * (chroma.cb[chroma_pixel] - 128) - 0.7141f * (chroma.cr[chroma_pixel] - 128); - int b = y[pixel] + 1.772f * (chroma.cb[chroma_pixel] - 128); + int r = round_to(y[pixel] + 1.402f * (chroma.cr[chroma_pixel] - 128)); + int g = round_to(y[pixel] - 0.3441f * (chroma.cb[chroma_pixel] - 128) - 0.7141f * (chroma.cr[chroma_pixel] - 128)); + int b = round_to(y[pixel] + 1.772f * (chroma.cb[chroma_pixel] - 128)); y[pixel] = clamp(r, 0, 255); cb[pixel] = clamp(g, 0, 255); cr[pixel] = clamp(b, 0, 255); ```

But that doesn't fix all the things (it's a bit better though), and it's kinda ad-hoc.

It'd be good to know what the IDCT code we have was based on (@devsh0, do you remember?), etc.

Apparently ITU T83 is about jpeg conformance testing, but it's not freely available – more ISO nonsense :/

LucasChollet commented 10 months ago

Off the top of my head, there is no exact requirement on the output of a JPEG decoder. In other words, rounding is not specified, so two conforming decoders might return results that differ by one. Far be it from me to suggest closing this issue, with our decoder I remember seeing differences of ~3 on the u8 scale.

And while I'm sure the decoder is faulty, the issue you're seeing in your first point could be caused by the encoder exclusively. It would be worth to test them separately.

devsh0 commented 10 months ago

Hi Nico!

I went through a lot of stuff online looking for an optimal implementation of IDCT, I don't have all of them handy right now, but I think the main idea was from this one here. The article links to a C++ implementation of 1D forward DCT (see aan.cc). I made some (many?) tweaks to the algorithm to have it do 2D inverse DCT instead.

Could this be that we're using floats all over the IDCT impl right now and it's causing drastic precision loss? Does changing floats to doubles help?

nico commented 10 months ago

https://github.com/SerenityOS/serenity/compare/master...nico:serenity:jpeg-colors-reloaded?expand=1 gets it from

% Build/lagom/bin/test-jpeg-roundtrip                                          
color #ff0000ff converges to #b86a5aff after saving 52 times, delta 16.20746
color #00ff00ff converges to #62965aff after saving 51 times, delta 28.695068
color #0000ffff converges to #8072c5ff after saving 99 times, delta 25.383066
color #add8e6ff converges to #628a97ff after saving 40 times, delta 22.427776
color #c00000ff converges to #b86768ff after saving 71 times, delta 20.48682
color #00c000ff converges to #62965aff after saving 37 times, delta 16.49776
color #0000c0ff converges to #8072c5ff after saving 107 times, delta 28.949486
color #800000ff converges to #b86768ff after saving 90 times, delta 26.313969
color #008000ff converges to #62965aff after saving 53 times, delta 14.143662
color #000080ff converges to #8072c5ff after saving 114 times, delta 33.105675
color #00ffffff converges to #4795a5ff after saving 50 times, delta 27.554106
color #ff00ffff converges to #9d69a5ff after saving 51 times, delta 18.901424
color #ffff00ff converges to #9c7f3aff after saving 65 times, delta 35.34598
color #000000ff converges to #808080ff after saving 127 times, delta 39.9349
color #404040ff converges to #808080ff after saving 65 times, delta 23.412678
color #7f7f7fff converges to #808080ff after saving 2 times, delta 0.38056377
color #c0c0c0ff converges to #808080ff after saving 57 times, delta 19.67883
color #ffffffff converges to #808080ff after saving 104 times, delta 33.238556

max delta 39.9349, max number of iterations 127

to

color #ff0000ff converges to #fe0000ff after saving 2 times, delta 0.20785047
color #00ff00ff converges to #00ff01ff after saving 2 times, delta 0.016206387
color #0000ffff converges to #0000feff after saving 2 times, delta 0.11861683
color #add8e6ff converges to #aed8e4ff after saving 2 times, delta 0.7249369
color #c00000ff converges to #c10003ff after saving 3 times, delta 0.5185138
color #00c000ff converges to #01c000ff after saving 2 times, delta 0.01402487
color #0000c0ff converges to #0000c0ff after saving 1 times, delta 0
color #800000ff converges to #800000ff after saving 1 times, delta 0
color #008000ff converges to #008001ff after saving 2 times, delta 0.06408679
color #000080ff converges to #010080ff after saving 2 times, delta 0.054292668
color #00ffffff converges to #00ffffff after saving 1 times, delta 0
color #ff00ffff converges to #ff00feff after saving 2 times, delta 0.13591829
color #ffff00ff converges to #ffff00ff after saving 1 times, delta 0
color #000000ff converges to #000000ff after saving 1 times, delta 0
color #404040ff converges to #404040ff after saving 1 times, delta 0
color #7f7f7fff converges to #7f7f7fff after saving 1 times, delta 0
color #c0c0c0ff converges to #c0c0c0ff after saving 1 times, delta 0
color #ffffffff converges to #ffffffff after saving 1 times, delta 0

max delta 0.7249369, max number of iterations 3

This is almost as good as it gets: If a color is modified at all when saving, the best we can do is saving twice, and we manage to do that for all but one of the colors we're testing here.

The changes are:

Use a float[64] workspace for storing the intermediate results of the row 1d idct so we don't lose precision between the two passs
Round when storing the results of the column 1d idct, and when converting from YCbCr to rgb, and when converting from rgb to YCbCr in the writer (the writer already rounded after the forward dct)
Use / sqrt(8) instead of * rsqrt(8) (!)

Interestingly, all of those need to happen for things to get much better. Doing any subset doesn't yield a big improvement, as far as I can tell.

(I also have a local hacked-up version with a slow and naive idct, and that manages to need just two saves and has a smaller error:

% Build/lagom/bin/test-jpeg-roundtrip                                          
color #ff0000ff converges to #fe0000ff after saving 2 times, delta 0.20785047
color #00ff00ff converges to #00ff01ff after saving 2 times, delta 0.016206387
color #0000ffff converges to #0000feff after saving 2 times, delta 0.11861683
color #add8e6ff converges to #aed8e6ff after saving 2 times, delta 0.24746181
color #c00000ff converges to #c00000ff after saving 1 times, delta 0
color #00c000ff converges to #01c000ff after saving 2 times, delta 0.01402487
color #0000c0ff converges to #0000c0ff after saving 1 times, delta 0
color #800000ff converges to #800000ff after saving 1 times, delta 0
color #008000ff converges to #008001ff after saving 2 times, delta 0.06408679
color #000080ff converges to #010080ff after saving 2 times, delta 0.054292668
color #00ffffff converges to #00ffffff after saving 1 times, delta 0
color #ff00ffff converges to #ff00feff after saving 2 times, delta 0.13591829
color #ffff00ff converges to #ffff00ff after saving 1 times, delta 0
color #000000ff converges to #000000ff after saving 1 times, delta 0
color #404040ff converges to #404040ff after saving 1 times, delta 0
color #7f7f7fff converges to #7f7f7fff after saving 1 times, delta 0
color #c0c0c0ff converges to #c0c0c0ff after saving 1 times, delta 0
color #ffffffff converges to #ffffffff after saving 1 times, delta 0

max delta 0.24746181, max number of iterations 2

For that to work with floats, I had to do dct_t const c = u == 0 ? (v == 0 ? 0.5 : inverse_sqrt_2) : (v == 0 ? inverse_sqrt_2 : 1); – just doing dct_t const cv = v == 0 ? inverse_sqrt_2 : 1; and same for cu and then doing cu * cv like JPEGWriter does had slightly worse results. With doubles, either was fine.)

)

nico commented 10 months ago

Some notes on rsqrt(8) precision: A completely black 8x8 tile has -1024 in the Y DC component (in (0, 0), and zeros everywhere else. The fast idct code computes -1024 s0 and puts that in the first row of the 8x8 tile after the row 1d idcts, and then spreads out -1024 s0 s0 over the whole 8x8 tile in the column 1d idct pass. s0 is `cos(0) rsqrt(8), orcos(0) / sqrt(8)(andcos(0)is1` as it should be).

1024 / sqrt(8) / sqrt(8) == 1024 / 8 == 128, which after the -128 level shift is exactly 0.

If I add this to the top of inverse_dct():

dbgln("1 / sqrt(8): {}, {}", AK::rsqrt(8.0f), 1.0f / AK::sqrt(8.0f));

Then that prints

 0.35253906, 0.35355338

-1024 * 0.35355338 * 0.35355338 (the 2nd value) is -127.999, and that rounds to -128 (which is the correct result)

But -1024 * 0.35253906 * 0.35253906 is -127.26..., which is off by enough that it rounds to -127, which leads to a color #000000ff converges to #292929ff after saving 42 times, delta 10.227973 result if I do everything except the the rsqrt change.

https://github.com/SerenityOS/serenity/pull/13405/commits/7041c842ede8ca6a0599e07420f1de1c6d4f48fc changed the JPEGLoader code to call rsqrt.

https://github.com/SerenityOS/serenity/pull/15564/commits/4b4d8342f4d6d7d844068a661005401bfad77488 made rsqrt call frsqrte on arm.

I think the takeaway here is that either JPEGLoader should call rsqrt, or rsqrt shouldn't be implemented in terms of frsqrte on arm since it has too high an error. @Hendiadyoin1, preferences which of the two changes we should undo?

nico commented 10 months ago

Re "is the decoder 'correct enough'"? See commit 3 here: https://github.com/SerenityOS/serenity/compare/master...nico:serenity:jpeg-decoder-compliance?expand=1

The answer is: It currently isn't. To get it correct enough needs less than the changes in https://github.com/SerenityOS/serenity/issues/22739#issuecomment-1912899925 – see the branch linked to at the start of this comment for needed changes for that. It does require resolving the rsqrt issue though.

SerenityOS / serenity

jpeg decoder doesn't decode colors quite right #22739