KhronosGroup / WebGL

The Official Khronos WebGL Repository

Specify what happens with texImage2D(..., gl.SRGB8, video) #3472

Closed kkinnunen-apple closed 2 years ago

kkinnunen-apple commented 2 years ago

Specify what happens with texImage2D(..., gl.SRGB8, video)

Consider this code:

gl = canvas.getContext('webgl2');
gl2 = canvas2.getContext('webgl2');
gl.texImage2D(gl.TEXTURE_2D, 0, gl.SRGB8, gl.RGB, gl.UNSIGNED_BYTE, video);
gl.drawArrays(gl.POINTS, 0, 1);
gl2.texImage2D(gl2.TEXTURE_2D, 0, gl2.RGB8, gl2.RGB, gl2.UNSIGNED_BYTE, video);
gl2.drawArrays(gl2.POINTS, 0, 1);

Which one is correct:

  1. The canvas, canvas2 and video match visually (i.e. following a model similar to other internal formats, such as float formats)
  2. The canvas is the darkest; canvas2 and video match visually (i.e. following a model similar to texImage2D(.., data))

Similar issue: #3350

WebKit bug: https://bugs.webkit.org/show_bug.cgi?id=222822

Test case: https://bugs.webkit.org/attachment.cgi?id=461426

kkinnunen-apple commented 2 years ago

Another way of asking:

What does the internalFormat mean?

  1. Precision of the stored values (current Firefox and WebKit, buggily)
  2. Format of the video (current Chrome)

lexaknyazev commented 2 years ago

For the texImage calls, the format and type define the number of channels and the data type of the incoming data (i.e. the CPU format), while the internalformat defines the format that the GPU will use.

In the example above, the SRGB8 internal format means that sampling from the texture would apply a non-linear transfer function. So canvas2 and video will match while canvas will be different, assuming that the fragment shader is a passthrough.
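
For reference, a sketch of the decode that sampling conceptually applies per channel (the sRGB-to-linear function from the GL spec; srgbToLinear is just a helper name here):

// Sketch: the per-channel decode applied when sampling an SRGB8 texture,
// per the GL spec's sRGB-to-linear conversion rules.
function srgbToLinear(c) {
  // c is the stored 8-bit value mapped to [0, 1]
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}
srgbToLinear(0x80 / 255); // ~0.216: a mid-gray byte fetches as ~22% linear light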

kkinnunen-apple commented 2 years ago

The SRGB8 internal format means that sampling from the texture would apply a non-linear transfer function.

This I understand. The question is: what is the input data for the non-linear function? That is, what are the values that ended up inside the texture?

So canvas2 and video will match while canvas will be different, assuming that the fragment shader is a passthrough.

This I don't see in the specification. I can't find any spec text that states how the video is transformed from the current video frame into the texture storage.

Putting it the other way:

To me, neither of these is a clear-cut "native format of the video". I could see arguments for both: either you take some sort of "truncation" viewpoint, or you take the encoding viewpoint.

From my layman's perspective, the encoding viewpoint is more consistent: it produces a sensible texture (same result when sampled, with some variation based on the precision of the storage vs. the precision of the input).
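
To make the two viewpoints concrete, here is a rough sketch of what each would store for one channel (illustrative names only, not real API):

// "Truncation" viewpoint: keep the video's encoded byte as-is; the sRGB
// decode on fetch then applies a second non-linear decode, so fetches
// come out darker than the source.
function storeTruncation(encodedByte) {
  return encodedByte;
}
// "Encoding" viewpoint: sRGB-encode the idealised linear value v in [0, 1],
// so the decode on fetch round-trips back to ~v and the texture samples
// like the source (modulo storage precision).
function storeEncoding(v) {
  const srgb = v <= 0.0031308 ? v * 12.92 : 1.055 * Math.pow(v, 1 / 2.4) - 0.055;
  return Math.round(srgb * 255);
}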

kkinnunen-apple commented 2 years ago

For the texImage calls, the format and type define the number of channels and the data type of the incoming data (i.e. the CPU format)

Also, the video is part of the parameter set that specifies the CPU format of the incoming data.

lexaknyazev commented 2 years ago

I'd define HTMLVideoElement uploads similarly to HTMLImageElement uploads: since the simple texImage entrypoints are the same for Image and Video DOM uploads, I'd expect them to behave alike, thus option 2 from the original post.

djg commented 2 years ago

I agree with @lexaknyazev that gl.SRGB8 should apply the sRGB-to-linear conversion to the result of converting the video frame to 8-bit RGB values.

kdashg commented 2 years ago

The behavior of texImage2D(SRGB, RGB, UBYTE, 0x80) is that it will decode-on-fetch as ~0.2 (because 0.5 "perceptual" sRGB is just ~20% of the photons of 1.0! https://hackmd.io/0wkiLmP7RWOFjcD13M870A?both#Physical-vs-Perceptual-So-what%E2%80%99s-rgb05-05-05-mean)

It seems natural to me that texImage2D(canvas) should do the same thing as texImage2D(canvas.getImageData().data), and it does: https://jsfiddle.net/qgfLw1er/8/ Further, it seems natural to me that drawing a video to a canvas, and then uploading the canvas, would again behave the same way.

I think this gets into "do you want color management or raw pixel data/bytes". I don't think we want to tie this into the colorspace-decode-enable/disable stuff though.

An upload into SRGB8 via RGB+UBYTE operates on raw data/bytes. By passing something to upload here, you are saying "here are values, but when you fetch from them, apply the transform".

I think the most reasonable thing is: if you want to merely store e.g. a video as SRGB but sample the same pixel values (modulo quantization errors, because this is lossy!), then you'd need to upload to RGBA8 and draw that into an SRGB8_ALPHA8 texture/framebuffer.
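
A rough WebGL2 sketch of that workaround, using a blit (which, per my reading of the ES 3.0 blit rules, applies the sRGB encode when writing to an sRGB attachment, like a draw does); linTex, srgbTex and the FBO names are illustrative:

// 1. Upload the video into plain RGBA8 storage (no decode on fetch).
const linTex = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, linTex);
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA8, gl.RGBA, gl.UNSIGNED_BYTE, video);

// 2. Allocate SRGB8_ALPHA8 storage to copy into.
const srgbTex = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, srgbTex);
gl.texImage2D(gl.TEXTURE_2D, 0, gl.SRGB8_ALPHA8, video.videoWidth,
              video.videoHeight, 0, gl.RGBA, gl.UNSIGNED_BYTE, null);

// 3. Blit; the write to the sRGB attachment applies the sRGB encoding,
//    so a later decode-on-fetch round-trips the pixel values (lossily).
const readFbo = gl.createFramebuffer();
gl.bindFramebuffer(gl.READ_FRAMEBUFFER, readFbo);
gl.framebufferTexture2D(gl.READ_FRAMEBUFFER, gl.COLOR_ATTACHMENT0,
                        gl.TEXTURE_2D, linTex, 0);
const drawFbo = gl.createFramebuffer();
gl.bindFramebuffer(gl.DRAW_FRAMEBUFFER, drawFbo);
gl.framebufferTexture2D(gl.DRAW_FRAMEBUFFER, gl.COLOR_ATTACHMENT0,
                        gl.TEXTURE_2D, srgbTex, 0);
gl.blitFramebuffer(0, 0, video.videoWidth, video.videoHeight,
                   0, 0, video.videoWidth, video.videoHeight,
                   gl.COLOR_BUFFER_BIT, gl.NEAREST);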

Which one is correct:

1. The `canvas`, `canvas2` and `video` match visually (i.e. following a model similar to other internal formats, such as float formats)

2. The `canvas` is the darkest; `canvas2` and `video` match visually (i.e. following a model similar to `texImage2D(.., data)`)

I feel like I want #2 to be correct, but I want to think about it more.

SRGB8_ALPHA8 not matching SRGB8 is definitely wrong though.

kkinnunen-apple commented 2 years ago

Maybe the discussion contrasting this with "raw bytes" (texImage2D(data)) is not fruitful and could be omitted? There are no generalised "raw bytes" in a video file. There are only the video file format contents, which then arguably produce some sort of idealised bitmap values in the colourspace the video describes. In OpenGL terms, the end values are the linear float values that the shaders let the humans see?

You use the analogy texImage2D(canvas) vs texImage2D(canvas.getImageData().data). Initially, I'm of the opinion that it's an invalid analogy, as the corresponding analogy would be texImage2D(video) vs texImage2D(video.getData()), which is invalid. This is on the grounds that there is no meaningful video.getData() if you are not willing to augment it with video.getColorSpace(). And if you augmented it, then you'd need to specify texImage2D(canvas.getData(), canvas.getColorSpace()). And when you specified that, maybe it would be specified to "produce the correct result, not the corrupted-colour result".

However, I do see that the following should work consistently:

texImage2D(SRGB8_ALPHA8, canvas)
texImage2D(SRGB8_ALPHA8, image)
texImage2D(SRGB8_ALPHA8, video)
vs
texImage2D(RGBA8, canvas)
texImage2D(RGBA8, image)
texImage2D(RGBA8, video)
vs
texImage2D(RGBA32F, canvas)
texImage2D(RGBA32F, image)
texImage2D(RGBA32F, video)

I just don't immediately see where the specification states, nor what the use case is for, one of these being darker than the source while the other two appear as specified in the source.

If I understand correctly, none of these operate on "raw values" of anything. They operate on the idealised contents.

The other way to think about it: how is it possible to produce a video file that is correctly texImage2D'd into an SRGB texture? If I understand correctly, the procedure is that you produce the video file as normal, but in a variant whose colours are corrupted to be too bright. If you view that video in any normal fashion, the way the video format specifies it to be viewed, the colours are wrong?

In other words: the video file says "in this video, the colours are in colourspace X". However, when interpreted in this colourspace, the colours are actually off, i.e. your file lies. Only when you uncorrupt the colours via the SRGB texture fetch do you get the correct values. If the video format had a header field "the colours are corrupted for WebGL SRGB use cases", then the decoder could interpret that, use it to adjust the colours, and we'd be able to display the video correctly. At which point we would be back in the same situation, where now RGB uploads would be correct and SRGB uploads would again be darker.

What's the use case for getting wrong colours from a texImage2D upload? (I'm probably missing a lot of knowledge about the video authoring -> WebGL app rendering colour-management chain.)

Contrast with:

I'd imagine the use case for "upload to SRGB matches upload to RGB visually" is the added precision where humans typically need it. E.g. uploading a grayscale gradient, you would get a roughly similar gradient in both cases. The difference is that with SRGB you get more distinct variation in the darker part of the gradient, as SRGB has more bits to spare there.
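
A quick numeric illustration of that precision claim:

// Smallest non-zero 8-bit level, expressed as a linear-light value:
const linearStep = 1 / 255;           // RGB8 storage:        ~0.0039
const srgbStep = (1 / 255) / 12.92;   // SRGB8, after decode: ~0.0003
// i.e. near black, 8-bit sRGB storage resolves roughly 13x finer
// steps of linear light than plain 8-bit linear storage does.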

lexaknyazev commented 2 years ago

Even for the canvas/image/video cases, the texImage functions accept GL's format and type that define the uploaded texture data from the GL perspective.

So the missing part in the WebGL spec is how to convert arbitrary canvas/image/video data into unorm8/rgb(a) (and all other valid combinations). The internal format thing should not be a major concern as it's well defined once the format and type are set.

kdashg commented 2 years ago

Our texImage(..., unpackFormat, unpackType, image) entrypoints have always been kinda strange, exactly because of what @lexaknyazev says: We need to define how to convert e.g. the image to the unpackFormat+unpackType, before we can let GL's specification take over.

Currently there are effectively two phases to uploading from e.g. images (sketched in pseudocode below):

  1. Conversion of supplied e.g. image to unpackFormat+unpackType.
  2. GL's specified interpretation of unpackFormat+unpackType -> internalFormat uploading.
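
In pseudocode (convertDomSourceTo stands in for the WebGL-spec conversion step and is not a real function):

// Phase 1: convert the DOM source to unpackFormat+unpackType (CPU-side).
const cpuPixels = convertDomSourceTo(unpackFormat, unpackType, video);
// Phase 2: hand the resulting bytes to GL, whose spec defines the rest.
gl.texImage2D(gl.TEXTURE_2D, 0, internalFormat, width, height, 0,
              unpackFormat, unpackType, cpuPixels);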

For phase 1, we don't pay attention to internalFormat at all. Here's the spec text:

Next, the source image data is converted to the data type and format specified by the format and type arguments, and then transferred to the WebGL implementation. Format conversion is performed according to the following table. If a packed pixel format is specified which would imply loss of bits of precision from the image data, this loss of precision must occur.

Functionally, this means that we should be uploading the same bytes to GL for both internal formats.

One ground truth is that uploading to SRGB8/RGB/UNSIGNED_BYTE uploads 0.5 * 255 as-is, such that it decodes to ~0.2 when fetched. GL just doesn't have any facility on upload to convert to or from sRGB. It only lets you pick "decode my 0.5 as 0.5, or instead as ~0.2". This is where my comment about "GL deals in raw bytes" comes from.

We did recently add support for colorspace conversion to texImage:

First, the source image data is conceptually converted to the color space specified by the unpackColorSpace attribute [...]

However, this happens before conversion+truncation to unpackFormat+unpackType, at least today.
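
For example, on an implementation that supports the color space proposal, that path looks roughly like:

// Sketch: unpackColorSpace ('srgb' or 'display-p3') converts the source
// colors first; only then is the data truncated to unpackFormat+unpackType.
gl.unpackColorSpace = 'display-p3';
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA8, gl.RGBA, gl.UNSIGNED_BYTE, image);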


IMO the major reason we don't have a getVideoData is that it can already be done by composing existing functionality:

function getVideoData(v) {
  // Note: HTMLCanvasElement is not constructible with `new`;
  // create the scratch canvas via the DOM instead.
  const c2d = document.createElement('canvas').getContext('2d');
  const w = c2d.canvas.width = v.videoWidth;
  const h = c2d.canvas.height = v.videoHeight;
  c2d.drawImage(v, 0, 0);
  const idata = c2d.getImageData(0, 0, w, h);
  return idata;
}
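
With that helper, the video analogue of the canvas comparison above can be written directly, e.g.:

// ImageData.data is a Uint8ClampedArray, accepted for UNSIGNED_BYTE uploads.
gl.texImage2D(gl.TEXTURE_2D, 0, gl.SRGB8_ALPHA8, video.videoWidth,
              video.videoHeight, 0, gl.RGBA, gl.UNSIGNED_BYTE,
              getVideoData(video).data);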

All of these calls presently, for legacy reasons, imply that all color spaces are (non-linear) sRGB.


It sounds like one possible answer here is adding a distinct texImage(target, level, internalFormat, image) upload that does the conversion such that numerically 0.2 in the source maps to 0.2 on decode. This is something I do want anyway, but haven't pushed for. This would tell us: "I don't care how you do it, but upload this image/video/canvas into a texture of type internalFormat". I think this would be valuable.
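
Hypothetically, something like the following (no such overload exists today):

// Hypothetical entrypoint: "upload this source into internalFormat such
// that what decodes to 0.2 in the source also decodes to 0.2 on fetch".
gl.texImage2D(gl.TEXTURE_2D, 0, gl.SRGB8_ALPHA8, video);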

IMO the only reasons not to add this would be:

I kinda want to take one last stab at external samplers for this reason.

kdashg commented 2 years ago

Specifically, you'd also want your image/video/canvas to say "my colorspace is srgb-linear" (or, IDK, maybe rec709-linear?). Today such authors would, yes, have to lie about their colorspace, just as authors must when uploading e.g. normal maps.

MarkCallow commented 2 years ago

One thing that has to be kept in mind for any "ground truth" experiments is that, absent an implementation of the canvas color space proposal, WebGL provides an sRGB drawing buffer that masquerades as linear. The drawing buffer is sRGB because it is typically composited as-is and presented on an sRGB display. But WebGL says it is linear, and any fragment shader outputs are written to it as-is. Your fragment shader needs to do sRGB encoding to have any chance of correct colors. Otherwise that sRGB texture will be decoded to linear on sampling and written as linear to the drawing buffer.
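
For example, a sketch of that encode at the end of a WebGL2 fragment shader (linearToSrgb and the placeholder output are illustrative):

const fragSrc = `#version 300 es
precision highp float;
out vec4 fragColor;
// Standard linear-to-sRGB encode, applied before writing to the
// (implicitly sRGB) drawing buffer.
vec3 linearToSrgb(vec3 c) {
  return mix(c * 12.92,
             1.055 * pow(c, vec3(1.0 / 2.4)) - 0.055,
             step(vec3(0.0031308), c));
}
void main() {
  vec3 linearColor = vec3(0.2);  // placeholder for the shader's real output
  fragColor = vec4(linearToSrgb(linearColor), 1.0);
}`;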

kdashg commented 2 years ago

This is why I don't like "linear" as a term. I don't really agree with "WebGL provides an sRGB drawing buffer that masquerades as linear", but it's probably a difference in word choice rather than a disagreement? I think it's clearer to say that WebGL operates on perceptually-linear (rather than physically-linear) values within the sRGB colorspace, and that all arithmetic is done naively-mathematically-linearly between values.

A framing I like is to say that WebGL is agnostic to colors, that it's pure math. Naturally the math is done "linearly", but it is not per se "linear" as a color-person would use the term.

The way things are sent to the display is usually as a perceptually-linear encoding, so any physically-linear values get lossily quantized, but that's OK because the main value of physically-linear textures and framebuffers is better dark precision for texture-fetched shader inputs and blending, respectively. (Quantizing after all blending is complete is fine.)

kkinnunen-apple commented 2 years ago

Thanks for the clarification!