samplePositions possibly incorrect for single-plane 4:2:2 YUV formats

MarkCallow commented 5 months ago

In https://github.com/KhronosGroup/KTX-Software/pull/864#discussion_r1528458923, @aqnuep wrote that currently for the single-plane 4:2:2 YUV formats (YUYV) we set the sample positions to (Y: 64, 128, 0, 0; U: 64, 128, 0, 0; Y: 192, 128, 0, 0; V: 64, 128, 0, 0). @fluppeteer gave the following response suggesting perhaps this choice is wrong. Further discussion is needed.

For a 422 format, Y should be 0, and one of the X values should be 0 - the indexing should be happening relative to the minimum coordinate of any sample (which I didn't explicitly say, I suspect). If you're doing a texture lookup that places the sample location at a half-pixel offset relative to the coordinate space (which is common), that's a separate transformation, although it would be helpful to have made that explicit in the introduction somewhere. I'll try to get that in with the next update.

The Y(/G) samples should be at X = 0 and 1. The Cb and CR (/B and R) samples will likely be either at X = 0 or X = 0.5, depending on the representation (for 4:2:0 formats, typical JPEG and MPEG disagree, for example); Vulkan doesn't embed that in the format, it's in the samplerYCbCrConversion, so you may have to pick a representation that may be inaccurate if the format is the only thing you have access to.

I'll look as soon as I've had lunch to see whether that 0.5 offset you mentioned is still there and what's causing it.

What I'm referring to here is running on a big-endian architecture, at which point you have multiple samples representing each channel (in order to handle the non-contiguous bits when seen in little-endian canonical order). If you had a big-endian 16-bit-per-channel YUV422 format and looked at it on a little-endian machine (or used the nominally canonical little-endian representation) then you'd have 8 samples, with two each for each channel, so a total of four covering the two Y samples. Is anyone actually supporting this? Questionable. I suspect there are more urgent things to fix.

Read the rest of https://github.com/KhronosGroup/KTX-Software/pull/864#discussion_r1528458923 for more details.

This may turn out to be a specification issue.

aqnuep commented 5 months ago

I'll look as soon as I've had lunch to see whether that 0.5 offset you mentioned is still there and what's causing it.

It's simply what we used when we added support for the 4:2:2 formats, as previously they weren't supported at all. So changing it should be as trivial as changing the sample positions generated based on the VkFormat.

What I'm referring to here is running on a big-endian architecture, at which point you have multiple samples representing each channel (in order to handle the non-contiguous bits when seen in little-endian canonical order). If you had a big-endian 16-bit-per-channel YUV422 format and looked at it on a little-endian machine (or used the nominally canonical little-endian representation) then you'd have 8 samples, with two each for each channel, so a total of four covering the two Y samples. Is anyone actually supporting this? Questionable. I suspect there are more urgent things to fix.

I don't follow this part, I'm not sure why would endianness be an issue with 4:2:2 formats (there's the mPACKn format issue, but that's more or less resolved, although we do not have test environments / CI for big-endian systems).

fluppeteer commented 5 months ago

I'll look as soon as I've had lunch to see whether that 0.5 offset you mentioned is still there and what's causing it.

It's simply what we used when we added support for the 4:2:2 formats, as previously they weren't supported at all. So changing it should be as trivial as changing the sample positions generated based on the VkFormat.

Yup; I found the code, and think the samples should be at 0/64/128 for consistency. But the general question raised is still valid if you wanted (for consistency) a format that had a minimum sample value at a half-index offset (e.g.). Thinking.

What I'm referring to here is running on a big-endian architecture, at which point you have multiple samples representing each channel (in order to handle the non-contiguous bits when seen in little-endian canonical order). If you had a big-endian 16-bit-per-channel YUV422 format and looked at it on a little-endian machine (or used the nominally canonical little-endian representation) then you'd have 8 samples, with two each for each channel, so a total of four covering the two Y samples. Is anyone actually supporting this? Questionable. I suspect there are more urgent things to fix.

I don't follow this part, I'm not sure why would endianness be an issue with 4:2:2 formats (there's the mPACKn format issue, but that's more or less resolved, although we do not have test environments / CI for big-endian systems).

KDFS assumes the bits of a sample are contiguous in little-endian order (or strictly in native order if that's easier for an implementation, but assuming little-endian for portable DFDs). If you have a 16-bit quantity stored in big-endian order, you have to do that in two samples, one at an 8-bit offset and the other at a 0-bit offset, with the same location.

The issue here was that KTX tools assumed that the same channel appearing in two samples was used to indicate a downsampled format with more than one luma value. It could equally be a 4:4:4 format with a single luma value stored big-endian, and a big-endian format with 4:2:2 downsampling would have four luma samples, two each at different sample locations (and shouldn't be confused with 4:2:0, therefore). The cases can be disambiguated by checking whether the locations of the samples are the same as well as whether the channels match - but that's more of a pain to code, and I've no reason to think there's any priority in doing this, so I mentioned it only for reference; I had no problem with the existing check being used.

aqnuep commented 5 months ago

KDFS assumes the bits of a sample are contiguous in little-endian order (or strictly in native order if that's easier for an implementation, but assuming little-endian for portable DFDs). If you have a 16-bit quantity stored in big-endian order, you have to do that in two samples, one at an 8-bit offset and the other at a 0-bit offset, with the same location.

Ah, yeah. I understand what you mean, but I don't think this is an issue in practice because the DFD in KTX 2.0 files is about the format the data is in the file itself for which the KTX 2.0 spec defines that it is in little-endian layout. So endianness only becomes an issue once the data is in memory. Doesn't it?

fluppeteer commented 5 months ago

KDFS assumes the bits of a sample are contiguous in little-endian order (or strictly in native order if that's easier for an implementation, but assuming little-endian for portable DFDs). If you have a 16-bit quantity stored in big-endian order, you have to do that in two samples, one at an 8-bit offset and the other at a 0-bit offset, with the same location.

Ah, yeah. I understand what you mean, but I don't think this is an issue in practice because the DFD in KTX 2.0 files is about the format the data is in the file itself for which the KTX 2.0 spec defines that it is in little-endian layout. So endianness only becomes an issue once the data is in memory. Doesn't it?

Yup - I'm sure it's fine for KTX, and for 99% of implementations (I've no recollection whether anyone actually has a big-endian Vulkan implementation). But technically DFDutils is its own thing, so I was just noting it for completeness. And, apparently, to spread confusion. :-)

KhronosGroup / KTX-Software

samplePositions possibly incorrect for single-plane 4:2:2 YUV formats #872