AOMediaCodec / libavif

libavif - Library for encoding and decoding .avif files

RGB to YUV420 to RGB conversion does not preserve sRGB average color #1249

Open mstoeckl opened 1 year ago

mstoeckl commented 1 year ago

To understand how much error libraries using libavif introduce by performing RGB to YUV conversion, I wrote a C program that opens an sRGB PNG image, runs avifImageRGBToYUV, and then runs avifImageYUVToRGB, and saves the image. (See avif_rgbyuvrgb.c; it requires Cairo and libavif to build.)

I should note that avif_rgbyuvrgb, like the libraries and programs I've seen using libavif, is putting 8-bit sRGB-encoded data into an avifRGBImage; as far as I can tell, avif.h does not state what the transfer function for the pixel data in avifRGBImage should be, or otherwise warn against doing this.

When you run avif_rgbyuvrgb on an image with pixel-scale colored features, with 444 chroma subsampling, the image is mostly unchanged. With 420 (or 422 or 400) subsampling, the average color of the image in different regions can shift. A fully correct RGB->YUV->RGB round-trip converter should not do this: while pixels may blur together locally, the global structure of the image should be preserved, and looking from a distance, the image should appear the same.

For example, below I have converted the main test image from "Gamma error in picture scaling" (http://www.ericbrasseur.org/gamma.html) to PNG. If your web browser does not already do so, view the images here at 100% scale, so that one pixel of the image is one pixel of the display. Since the image, like most images on the web, uses the nonlinear sRGB transfer function, the average light from each region of pixels is not gray, but has a faint color.

gamma_dalai_lama_gray
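To make the averaging point concrete, here is a small standalone sketch (my own illustration, not libavif code), using the standard sRGB transfer function from IEC 61966-2-1. Averaging encoded values, which is effectively what naive subsampling or downscaling does, does not match averaging the emitted light:

```python
# Standalone sketch (not libavif code): averaging pixel values in the
# nonlinear sRGB domain does not preserve the amount of emitted light.
# Transfer functions per IEC 61966-2-1 (sRGB), values in [0, 1].

def srgb_to_linear(c):
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def linear_to_srgb(c):
    return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055

# A 50/50 checkerboard of black and white pixels emits half the light of a
# solid white region; the gray that matches it from a distance is:
correct = linear_to_srgb((srgb_to_linear(0.0) + srgb_to_linear(1.0)) / 2)
# Averaging the encoded values instead gives a noticeably darker gray:
naive = (0.0 + 1.0) / 2

print(round(correct, 3))  # 0.735
print(round(naive, 3))    # 0.5
```

This gap between 0.735 and 0.5 is exactly the kind of shift the Brasseur test image is designed to expose.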

Running avif_rgbyuvrgb 444 input.png test444.png to convert to YUV444 and back produces almost the same image: test444

Running avif_rgbyuvrgb 420 input.png test420.png to convert to YUV420 and back produces a fully gray image. (Although, Y not being subsampled, brightness is roughly preserved by the shades of gray.) test420

Running avifenc -y 444 input.png test444.avif and avifenc -y 420 input.png test420.avif produces the following .avif images, whose file extension I've changed to 'png' so that they can be linked as images:

test444 avif

test420 avif

This problem does not just affect color, but also brightness; red-on-black text may appear darker, not just blurrier, with libavif's current 420 chroma subsampling than with 444 subsampling. For example, consider an image where 1/4 of all pixels are red and the rest are black:

red_on_black

Running avif_rgbyuvrgb 444 red_on_black.png red_on_black_444.png to convert to YUV444 and back recovers the image. red_on_black_444

Running avif_rgbyuvrgb 420 red_on_black.png red_on_black_420.png to convert to YUV420 and back produces a darker image: red_on_black_420

y-guyon commented 1 year ago

Thank you for the report and detailed investigation.

To understand how much error libraries using libavif introduce by performing RGB to YUV conversion, I wrote a C program that opens an sRGB PNG image, runs avifImageRGBToYUV, and then runs avifImageYUVToRGB, and saves the image. (See avif_rgbyuvrgb.c; it requires Cairo and libavif to build.)

It reminds me of two pieces of code in libavif:

In your opinion, are the tests avifyuv and avifrgbtoyuvtest:

I should note that avif_rgbyuvrgb, like the libraries and programs I've seen using libavif, is putting 8-bit sRGB-encoded data into an avifRGBImage ; as far as I can tell, avif.h does not state what the transfer function for the pixel data in avifRGBImage should be, or otherwise warn against doing this.

I confirm the transferCharacteristics from avifImage are not used by avifImageRGBToYUV() and avifImageYUVToRGB(). Are you saying that the transfer function should be taken into account for the conversion+subsampling computation, or that the transfer function should be modified to take into account the chroma subsampling "darkening" side-effect?

avifRGBImage is not meant to be used as libavif input/output. It is just a helper data structure that is used both internally and externally to get converted pixel values, but it is not meaningful enough on its own; it should be accompanied by its corresponding avifImage most of the time. I believe this is why transferCharacteristics is not part of avifRGBImage: it does not impact the conversion formula (currently), and it is redundant if already available in avifImage.

Running avif_rgbyuvrgb 420 red_on_black.png red_on_black_420.png to convert to YUV420 and back produces a darker image:

I thought that yuvChromaSamplePosition may have something to do with this but unfortunately it is not used by avifImageRGBToYUV() and avifImageYUVToRGB() either.


Taking your red/black example, considering the following 4 RGB pixels:

(  0,  0,  0) (  0,  0,  0)
(  0,  0,  0) (255,  0,  0)

whose rounded average is (64, 0, 0).

I see calling avifImageRGBToYUV() as 4:2:0 then avifImageYUVToRGB() produces the following according to your image:

( 57,  0,  0) ( 57,  0,  0)
( 57,  0,  0) ( 86, 22, 22)

whose rounded average is (64, 6, 6).

So actually it is even a little bit brighter as per raw pixel values, but I understand that the sRGB transfer function (or gamma) makes it look darker. I guess the question would be: do you have a real-life example of a natural photo where this issue is visible? 4:2:0 is designed towards camera pictures. There are indeed handcrafted inputs that can "trick" the conversion algorithm into incorrect results, but such examples are rare enough in real end-user content that it may not be necessary to fix them.
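As a sanity check on the "brighter in raw values, darker in light" observation (a standalone sketch, not part of libavif, using the standard sRGB transfer function and the red-channel values quoted above), the linear-light average of the red channel drops to roughly a fifth of its original value after the round trip:

```python
# Sanity check (standalone sketch, not libavif code): compare the
# linear-light average of the red channel before and after the round trip,
# using the sRGB transfer function and the 8-bit values quoted above.

def srgb_to_linear(c8):
    c = c8 / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

before = [0, 0, 0, 255]   # red channel: one full-red pixel out of four
after = [57, 57, 57, 86]  # red channel after the 4:2:0 round trip

avg_before = sum(map(srgb_to_linear, before)) / 4  # 0.25 in linear light
avg_after = sum(map(srgb_to_linear, after)) / 4    # ~0.054: much less light
```

So even though the rounded byte average rose from 64 to 64-ish, the block emits far less red light, which is why it looks darker.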

Do you know libraries or implementations that handle this topic in a way that is satisfying to you? If so, could you provide them?

mstoeckl commented 1 year ago

Do you know libraries or implementations that handle this topic in a way that is satisfying to you? If so, could you provide them?

No, although I haven't seen too many implementations. The following all accept general sRGB data, whether explicitly or implicitly, and save 420-subsampled AVIF images by setting up an avifRGBImage, calling avifImageRGBToYUV, and then avifEncoderWrite.

I'm not sure exactly what libheif does, but there is code that would distort sRGB-encoded data. GIMP uses it, and saving 420-subsampled AVIF images behaves as shown in my first post, shifting "average" colors.

To confirm that it is possible, I did write an RGB->YUV420 conversion routine that works on sRGB input, which does appear to preserve average sRGB colors. (See avif_from_png.c.)

Original input.png (view at 1 pixel to 1 pixel scale): input

Running avifenc -y 420 input.png avifenc-out420.avif produces (Note: AVIF image has .png suffix to bypass filter) avifenc-out420 avif

While the sRGB-aware subsampling in ./avif_from_png 420 input.png out420.avif produces: out420 avif

So actually it is even a little bit brighter as per raw pixel values, but I understand that the sRGB transfer function (or gamma) makes it look darker. I guess the question would be: do you have a real life example of a natural photo where this issue is visible? 4:2:0 is designed towards camera pictures. There are indeed handcrafted inputs that can "trick" the conversion algorithm into incorrect results, although these examples are so rare as final user needs that it may not be necessary to fix them.

Most natural photos have optical resolution less than digital resolution, or don't have rapid chroma/luma shifts, so this issue is hard to detect. The only photo out of the few I tried where I could obtain a visible difference between 4:4:4 and 4:2:0 subsampling was an astronomy image, where some stars were red and only one or two pixels wide :-)

I encountered this sRGB-color-shift problem because I was investigating libavif (and the applications using it) for saving general images; specifically, desktop screenshots that may contain both text and photographic parts. As noted above, despite potentially processing nonphotographic images, many applications using the plugins in the list above will save as 4:2:0 subsampled. The average-color shift is most visible with red on black text but some other color combinations are subtly off.

(Original screenshot) redblack_text

(Round trip through 4:2:0 subsampling, using avif_rgbyuvrgb 420) redblack_text_420

I confirm the transferCharacteristics from avifImage are not used by avifImageRGBToYUV() and avifImageYUVToRGB(). Are you saying that the transfer function should be taken into account for the conversion+subsampling computation, or that the transfer function should be modified to take into account the chroma subsampling "darkening" side-effect?

That the transfer function should be taken into account for the conversion+subsampling computation.

This would ensure that the code I've seen producing 4:2:0 subsampled AVIF images from sRGB data would not make images with a significant "average color" shift. Whether this should be done in libavif, or if the plugins I've linked above should just be changed to always use 4:4:4 subsampling, I don't know enough to say.

joedrago commented 1 year ago

Disclaimer: If there are roundtrip drift bugs in libavif's YUV code, we should fix them, and if the drift is due to something libyuv is doing which sacrifices a tiny bit of drift for significant speed gains, that should be taken up with the libyuv folks.

Disclaimers aside though, there are a handful of unfortunate-but-common practices in typical image manipulation that have a long-standing habit of ignoring the transfer function, and YUV conversion is one of them. Another is Porter/Duff blending without linearizing first; as odd as it sounds, the world is so "used to" it that blending in linear space would actually confuse people's expectations of (say) what an opacity of 0.6 looks like, and most composition/blending paths must perform those blends in an sRGB-ish curve or creators will be super confused.

As for converting RGB to YUV, yes, if you don't linearize first, the values are nonsense. It is clear from the derivation of the YUV coefficients and the choice of "Y" as the channel name that it is supposed to represent the luminance of the color, but without linearizing first, it will not be.
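A small illustration of that point (assuming BT.709 luma coefficients and the sRGB transfer function; this is a sketch, not libavif's code path): for saturated red, the "luma" computed on gamma-encoded values is very different from the encoding of the color's true relative luminance.

```python
# Illustration (BT.709 luma coefficients and sRGB transfer function assumed;
# not libavif's code path): "luma" computed on gamma-encoded values differs
# greatly from the encoding of the true relative luminance.

def srgb_to_linear(c):
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def linear_to_srgb(c):
    return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055

KR, KG, KB = 0.2126, 0.7152, 0.0722  # BT.709 coefficients
r, g, b = 1.0, 0.0, 0.0              # saturated red, encoded sRGB values

luma = KR * r + KG * g + KB * b  # computed without linearizing: 0.2126
luminance = (KR * srgb_to_linear(r) + KG * srgb_to_linear(g)
             + KB * srgb_to_linear(b))
encoded_luminance = linear_to_srgb(luminance)  # ~0.498: very different
```

The non-linearized "luma" (0.2126) and the encoded true luminance (~0.498) diverge by more than a factor of two for this color.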

Here's where it gets weird... despite knowing that the math breaks down entirely if not in a linear space, and the fact that H.273's equations make it clear that the MC formulas only make sense in a linear space, the common practice is to not linearize first anyway. I remember first learning about color conversion and realizing just how silly this was, but at some point you have to work with what everyone else is doing, for interop purposes.

To be clear, I'm not trying to discourage you. I'm sure you're correct that the values look weird, and that subsampling in nonlinear space exacerbates things. However, whatever libavif emits as a specific variant of YUV (some MC value in CICP) must be roundtripped "the same way" (avoiding the word "correctly" here), and this means we need to match the implementations in libyuv, ffmpeg, libvlc, etc. If I'm mistaken, and our YUV code is the only implementation to disregard the transfer function during these routines, I'll happily accept a PR that fixes it, and root for anyone who gets those changes into something as ubiquitous as libyuv. If I'm right though, and the industry all agrees on doing the same wrong thing, it'd be a mistake to jump first against such an oddly foundational, industry-wide agreement on somewhat-incorrect math.

We're probably doomed to continue to composite images in an srgb-ish/g2.2-ish working buffer, and we're similarly doomed to ignore the TC when processing MC. I'm sure we libavif folks would happily sway with the breeze of a changing industry on MC interpretation, but I'd advocate against us championing this losing cause.

tongyuantongyu commented 1 year ago

Generally, using YUV420 for images with high frequency color data is a bad idea. You should try to avoid that if possible.

With that being said, if YUV420 is the only choice, you may consider the --sharpyuv option. Here is the produced AVIF using command avifenc -s 3 -y 420 --sharpyuv -q 100 input.png output.avif:

1 avif

y-guyon commented 1 year ago

@mstoeckl

To confirm that it is possible, I did write an RGB->YUV420 conversion routine that works on sRGB input, which does appear to preserve average sRGB colors. (See avif_from_png.c .)

I built your binary (against libavif 154da769638a4536fa2c30cd115c2aca8a19faec) with:

sudo apt install libcairo2-dev && \
git clone https://github.com/AOMediaCodec/libavif && \
mkdir libavif/build && cd libavif/build && cmake .. -DAVIF_CODEC_AOM=ON && make -j7 && cd ../.. && \
gcc \
  avif_from_png.c \
  -lm -lcairo \
  -Llibavif/build -Ilibavif/include -lavif \
  -o avif_from_png \
  -Wl,-rpath=libavif/build

Original input.png (view at 1 pixel to 1 pixel scale): input

Using the input above, I do not get exactly the same result as you when running ./avif_from_png 420 mona.png mona.png.420.avif (mine is on the right):

out420 avif mona png 420 avif

Did you modify avif_from_png.c to change the quality, for example?


Also when doing the same on your first example:

gamma_dalai_lama_gray

I get the following:

dalai png 420 avif

Do you know if avif_from_png.c can be fixed to work well with any reasonable input?


I agree with @joedrago insofar as changing the decoder implementation would be necessary. If there is a way to change only the encoder side, like --sharpyuv (thanks @tongyuantongyu for the suggestion), it may be a useful addition.

mstoeckl commented 1 year ago

Did you modify avif_from_png.c to change the quality, for example?

No, the images attached were created using the code that I attached. However, I was using libavif 0.11.1, instead of a build from git, which may have different quality defaults.

Do you know if avif_from_png.c can be fixed to work well with any reasonable input?

I get essentially the same image as you with avif_from_png on my first example. Running avif_from_png against the 1/4-red, 3/4-black test image produces an image that gets closer to the correct average color, but not entirely there. The way avif_from_png did downsampling was based on a heuristic assuming nearest-neighbor upsampling, but a) I think my math was a bit off, and b) libavif uses bilinear filtering for its "best quality" upsampling, not nearest-neighbor. (Relevant code: src/reformat.c, src/reformat_libyuv.c, and libyuv/source/convert_argb.cc.)

As long as implementations use the same "best quality" YUV->RGB conversion function, it should in theory be possible to design a transfer-function-aware RGB->YUV conversion algorithm that preserves the apparent average colors of the image. (If implementations are using different YUV->RGB conversion functions, then the output they produce runs the risk of looking different anyway.)
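For illustration, here is a simplified 1-D sketch (my own, not libavif's actual filter code) of why the upsampling filter matters: nearest-neighbor and bilinear upsampling reconstruct different values from the same subsampled chroma row, so an encoder optimized against one decoder filter can drift under another.

```python
# 1-D sketch (not libavif's actual filter code): nearest-neighbor and
# bilinear chroma upsampling reconstruct different values from the same
# subsampled row, so encoder and decoder filters must be matched.

def upsample_nearest(c):
    return [v for v in c for _ in range(2)]

def upsample_bilinear(c):
    # half-pel bilinear with 3/4, 1/4 weights; edge samples are replicated
    out = []
    for i in range(len(c)):
        left = c[max(i - 1, 0)]
        right = c[min(i + 1, len(c) - 1)]
        out.append(0.25 * left + 0.75 * c[i])
        out.append(0.75 * c[i] + 0.25 * right)
    return out

chroma = [0.0, 1.0]
print(upsample_nearest(chroma))   # [0.0, 0.0, 1.0, 1.0]
print(upsample_bilinear(chroma))  # [0.0, 0.25, 0.75, 1.0]
```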


With that being said, if YUV420 is the only choice, you may consider the --sharpyuv option. Here is the produced AVIF using command avifenc -s 3 -y 420 --sharpyuv -q 100 input.png output.avif

Looking at libwebp/sharpyuv/sharpyuv.c, it seems they are doing something that takes into account a transfer function. The image you show is close, but a bit more faded than the original. Unfortunately, using the --sharpyuv option currently gives me an error, so I probably need to build libavif myself before doing any experimentation.


I'm sure we libavif folks would happily sway with the breeze of a changing industry on MC interpretation, but I'd advocate against us championing this losing cause.

Agreed, changing any part of the way AVIF images are decoded is not worth pursuing. However, just as cwebp has a --sharpyuv option, it may still be worth it to figure out how to encode subsampled images (specifically, do the RGB->YUV conversion step) in a way that uses the transfer functions and preserves image appearance as much as possible.

tongyuantongyu commented 1 year ago

The image you show is close, but a bit more faded than the original.

This is probably the best YUV420 can do.

Enlarged picture (caution: you may feel dizzy): ![enlarge](https://user-images.githubusercontent.com/13075180/207774329-f11fae4d-5e91-414d-9061-e9c43da64044.png)

Looking at the enlarged picture: YUV420 shares one pair of UV values among four pixels, but there is no pair of UV values that can produce both green and magenta when paired with different Y values. Graying them out is the best possible choice.
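A minimal sketch of that constraint (assuming a full-range BT.601-style inverse conversion for illustration, not libavif's exact matrix): with Cb and Cr fixed for the block, varying Y' can only move the four pixels along one line in RGB space.

```python
# Minimal sketch (full-range BT.601-style inverse conversion assumed; not
# libavif's exact matrix): with Cb, Cr shared by a 2x2 block, the four
# pixels can only differ along the Y' direction.

def ycbcr_to_rgb(y, cb, cr):
    r = y + 1.402 * (cr - 0.5)
    b = y + 1.772 * (cb - 0.5)
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b

cb, cr = 0.5, 0.5  # neutral chroma: every reachable pixel is a gray
dark = ycbcr_to_rgb(0.25, cb, cr)   # ~(0.25, 0.25, 0.25)
light = ycbcr_to_rgb(0.75, cb, cr)  # ~(0.75, 0.75, 0.75)
# No choice of Y' with this shared (Cb, Cr) yields a green or a magenta.
```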

preserves image appearance as much as possible

Halving the data will definitely bring some loss, and sharpyuv is probably the best you can do without changing the decoder side.

mstoeckl commented 1 year ago

This problem has fallen down my priority list, and probably won't get back up in the next few months. Before I forget, a few notes from two weeks ago:

  1. I don't fully understand how the iterative parts of sharpyuv work; however, I can say that it is not properly transfer-function aware, since it hardcodes the Rec. 709 OETF (and not, as one might expect, the sRGB transfer function). See libwebp/sharpyuv/sharpyuv_gamma.c. I think the code would still produce OK-looking output if the transfer function for sharpyuv were made an adjustable parameter. However, as far as I can tell, even when modified to use the sRGB transfer function, sharpyuv does not preserve average color on the 1/4-red, 3/4-black test image above; so while good, it is not, in my opinion, optimal.

  2. Consider the case of 2x2 images (equivalently, the way image decoding is done with nearest-neighbor UV upsampling). Let R'[1-4], G'[1-4], B'[1-4] be the original pixel values in sRGB; let f be the sRGB-to-linear transfer function composed with a clip that maps negative values to 0 and values > 1 to 1. Let R_i = f(R'_i), G_i = f(G'_i), B_i = f(B'_i) for i from 1 to 4. When doing RGB->YUV 4:2:0 conversion, the output contains values Y'[1-4], U, V. When performing YUV->RGB conversion, the values R'[1-4], G'[1-4], B'[1-4] are linear combinations of Y'[1-4], U, V. Let Y[1-4] be the relative luminances of the pixels; these are a linear function of R[1-4], G[1-4], B[1-4]. Let Y_avg, R_avg, G_avg, B_avg be the averages of Y[1-4], etc., so Y_avg = (Y_1 + Y_2 + Y_3 + Y_4) / 4.

    Unavoidably in the RGB->YUV420 conversion, some information in the 2x2 image will be lost. To make the image look as close as possible to the original, there are 7 quantities that are most worth preserving: R_avg, G_avg, B_avg, and Y[1-4]. (These have only 6 degrees of freedom, since Y_avg is a linear combination of R_avg,G_avg,B_avg.) If R_avg,G_avg,B_avg stay the same, then the "average" color of the image -- which is what is seen when someone is looking at a distance or is not directly focusing on the image -- stays the same. Preserving Y[1-4] ensures that the "texture" of the image stays the same -- i.e, the fine details of the image when it is converted to grayscale.

    Because f is nonlinear, the ranges of U and V are restricted, and Y'[1-4] is not permitted to be negative, it is not always possible to preserve all 7 quantities. It is easy to preserve only a few of the quantities; for example, to preserve (R_avg, G_avg, B_avg), compute Y', U, V from f^-1(R_avg), f^-1(G_avg), f^-1(B_avg), and set Y'_1 = Y'_2 = Y'_3 = Y'_4 = Y'. Also, even on inputs where all 7 quantities can be preserved, actually computing the values Y'[1-4], U, V requires solving a nonlinear system.

    I think that the best way to handle RGB->YUV420 conversion is to compute values Y'[1-4],U,V for which (R_avg, G_avg, B_avg) are preserved exactly, while Y[1-4] are as close as possible. Preserving the average color is more important than preserving the relative luminance details of a 2x2 block, because human vision has limited resolution (if not directly looking at a part of the image from a close distance.) Any systematic changes to R_avg, G_avg, B_avg will be seen, because they affect the average color of the entire image, making it look different even when seen from a distance; however, assuming R_avg, G_avg, B_avg are preserved, systematic changes to Y[1-4] will only be noticeable if you look closely.

    I am not sure what the 'ideal' RGB->YUV420 conversion looks like when the inverse YUV->RGB conversion uses bilinear UV upsampling. Maybe the same objective as for nearest neighbor UV upsampling could work -- to preserve R_avg,G_avg,B_avg on every 2x2 block, (and then match Y[1-4] as well as possible) -- but perhaps the local average color should be computed differently, using something like the [0.125 0.375 0.375 0.125]^2 kernel from bilinear interpolation, or the gaussian kernel.

  3. My current monitor is not fully calibrated, so checking "average color" by eyeballing it is not entirely reliable when the differences are subtle. Another way to check is to downscale the source image and the final AVIF image by a factor of 2, using a gamma-correct converter like gegl input.png -o output.png -s 0.5. If the results look the same (or are indistinguishable to dssim), then they should look the same from a distance on a properly calibrated monitor.

  4. A script to implement RGB->YUV conversion. Some modes are very slow, and don't seem to work correctly on all images -- I think the flaw is that the optimization routines don't always converge to the true minimum. ideal_fit.py.txt

    Another script which creates an image that looks gray when you do gamma-incorrect factor 2 downscaling with a 2x2 average kernel, or do naive Y:U:V 4:2:0 subsampling. hide_image.py.txt
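As a rough illustration of the easy fallback described in point 2 above (a sketch under assumptions: full-range BT.601-style matrix and standard sRGB transfer function; the helper name block_to_flat_ycbcr is hypothetical, not taken from any of the attached scripts), here is the "preserve (R_avg, G_avg, B_avg) exactly with a flat Y'" approach:

```python
# Rough sketch of the fallback in point 2 (assumptions: full-range
# BT.601-style matrix and standard sRGB transfer function; the helper name
# block_to_flat_ycbcr is hypothetical, not from the attached scripts).

def srgb_to_linear(c):
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def linear_to_srgb(c):
    return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055

def block_to_flat_ycbcr(block):
    """block: four (r, g, b) tuples of encoded sRGB values in [0, 1].
    Preserves (R_avg, G_avg, B_avg) exactly by discarding Y' texture."""
    r, g, b = (linear_to_srgb(sum(srgb_to_linear(p[i]) for p in block) / 4)
               for i in range(3))
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = (b - y) / 1.772 + 0.5
    cr = (r - y) / 1.402 + 0.5
    return y, cb, cr  # all four Y' values are set to y

block = [(0.0, 0.0, 0.0)] * 3 + [(1.0, 0.0, 0.0)]  # 1/4 red, 3/4 black
y, cb, cr = block_to_flat_ycbcr(block)

# Inverting the conversion recovers the encoded linear-light average color:
r = y + 1.402 * (cr - 0.5)
b = y + 1.772 * (cb - 0.5)
g = (y - 0.299 * r - 0.114 * b) / 0.587
# (r, g, b) is ~(0.537, 0, 0): the "average red" the block shows from afar.
```

This sacrifices all of the Y[1-4] texture, so it is only the starting point; the better scheme described in point 2 would then adjust Y'[1-4] to recover as much luminance detail as the constraints allow.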