ARM-software / astc-encoder

The Arm ASTC Encoder, a compressor for the Adaptive Scalable Texture Compression data format.
https://developer.arm.com/graphics
Apache License 2.0

alpha decoding bug when LDR_SRGB is enabled in ASTCENC_PRF_LDR_SRGB profile #447

Closed: richgel999 closed this issue 11 months ago

richgel999 commented 11 months ago

Hi, I've implemented my own full ASTC decoder, following the Khronos spec as closely as possible: https://registry.khronos.org/DataFormat/specs/1.1/dataformat.1.1.pdf

I've found that when decoding to ASTCENC_TYPE_U8 with the LDR_SRGB profile enabled, the alpha channel differs (typically by 1) from a decoder that follows the ASTC spec. The problem boils down to how the 16-bit value is composed before interpolation on the alpha channel (component index 3): is the alpha channel's low byte set to 0x80, or is it a replicated copy of the high byte? The spec is clear:

"If sRGB conversion is not enabled, or for the alpha channel in any case, C0 and C1 are first expanded to 16 bits by bit replication:"

One of Google's Android decoders matches your decoder's output, but it also doesn't match the spec. The relevant code in their decoder: https://chromium.googlesource.com/external/deqp/+/refs/heads/master/framework/common/tcuAstcUtil.cpp#1453

According to the ASTC specification, this code (which matches the output of ARM's decoder) is incorrect on channel 3 (alpha).

See section 18.19 (Weight Application) here: https://registry.khronos.org/DataFormat/specs/1.1/dataformat.1.1.html#astc_weight_application

The spec: (screenshot of the section 18.19 text omitted)
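
To make the difference concrete, here's a minimal sketch of the endpoint expansion and weight application from section 18.19 as I read it (the function names are mine, not from the spec or from any of the decoders):

#include <cstdint>

// Expand an 8-bit LDR endpoint to 16 bits per section 18.19. For the
// sRGB-converted channels (R, G, B when sRGB is enabled) the low byte is set
// to 0x80; otherwise, and always for alpha, the high byte is bit-replicated.
static inline uint16_t expand_endpoint_u16(uint8_t v, bool srgb_channel)
{
    return srgb_channel ? (uint16_t)((v << 8) | 0x80) : (uint16_t)((v << 8) | v);
}

// Weight application on the expanded endpoints, with a weight in [0, 64].
static inline uint16_t apply_weight(uint16_t c0, uint16_t c1, uint32_t w)
{
    return (uint16_t)((c0 * (64 - w) + c1 * w + 32) >> 6);
}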

I'm now wondering what is correct: the spec, or the ARM/Google software reference decoders?

Thanks.

richgel999 commented 11 months ago

Another author of an ASTC compute shader decoder seems to have encountered the same issue: https://themaister.net/blog/2021/08/29/compressed-gpu-texture-formats-a-review-and-compute-shader-decoders-part-3-3/

"Inconsistent decode formats - ASTC decoders will either decode to UNORM8 or FP16, depending on the profile supported by the GPU. This is kinda annoying, especially when we consider formats like SRGB8 where apparently the RGB portion is supposed to decode to UNORM8 viewed as SRGB, but alpha is still FP16? (._.) Not even the reference decoder seems to get that right, so I just decode alpha as UNORM8 in that case."

I've also encountered another divergence from the spec in astc-encoder's decoder when decoding to 8-bit in LDR mode, but I'll wait until we can resolve this alpha issue before filing it. The spec says to return the high byte ("If sRGB conversion is not enabled and the decoding mode is decode_unorm8, then the top 8 bits of the interpolation result for the R, G, B and A channels are used as the final result."), but astc-encoder's decode_texel() does something significantly more complex: it converts to a 16-bit half float with an implicit divide by 65535.0, then back to float, then multiplies by 255, adds .5 and converts to int. This also causes off-by-one mismatches between Google's, astc-enc's, and my decoder.
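
To show the difference, here's a rough sketch of the two decode_unorm8 paths on a single 16-bit interpolation result; the round-trip version is a simplified approximation of what I described above (it skips the intermediate half-float step, so it's not the exact astc-encoder code):

#include <cstdint>

// Spec behaviour for non-sRGB decode_unorm8: use the top 8 bits.
static inline uint8_t decode_unorm8_spec(uint16_t c16)
{
    return (uint8_t)(c16 >> 8);
}

// Simplified float round trip: treat the value as UNORM16, convert to float,
// rescale to [0, 255], add .5 and truncate. For some inputs this lands one
// code point away from (c16 >> 8), hence the off-by-one mismatches.
static inline uint8_t decode_unorm8_roundtrip(uint16_t c16)
{
    float f = (float)c16 / 65535.0f;
    return (uint8_t)(f * 255.0f + 0.5f);
}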

If necessary I can provide repros of these bugs. It would be great if we could pin down exactly how ASTC is supposed to be decoded, so encoders can make the encoding decisions that lead to the lowest error. It should be possible for all the decoders to be bitwise exact, no matter what mode they are in.

Thanks again.

richgel999 commented 11 months ago

This compute shader decoder, which MESA recently ported and integrated, also has a comment about the alpha sRGB issue: https://github.com/Themaister/Granite/blob/master/assets/shaders/decode/astc.comp#L81

solidpixel commented 11 months ago

Hi Rich,

I suspect the main issue for the general 8-bit linear decodes is "bug by omission". The original astcenc code release pre-dates the release of the decode mode extensions, which came along 18 months later. At the time the code was written decoding to an 8-bit value wasn't a specified behaviour, so this was a decoder extension simply to generate usable images.

Agree it should be updated to match the decode mode extension behavior.

I'm now wondering what is correct: the spec, or the ARM/Google software reference decoders?

If in doubt, the spec. There were some late changes around the sRGB handling during standardization, so this may well be another area where the released encoder is effectively reflecting an earlier build of the development standard rather than the final ratified standard.

I'll do some digging internally in the new year to confirm. If there is a difference I may need to take this to a Khronos meeting to get agreement, so a full answer may not be a fast one.

solidpixel commented 11 months ago

(EDITED for sake of clarity after more debugging in a standalone test bench)

@richgel999 Do you have a reproducer compressed image you can share, along with a failing texel coordinate?

Based on my current debugging in a standalone test bench I can confirm that the FP16 alpha value for sRGB is incorrect.

With the current code I can't see any errors in the sRGB decode_u8 output values written by store_image. The values, round-tripped via float and then rounded in store_image, seem to match the top 8 bits of RGB and A for all ep0/ep1/weight combinations.

With the current code I do see errors in the Linear decode_u8, so that needs fixing.

solidpixel commented 11 months ago

Preliminary attempt to support both correct sRGB image alpha output and decode_unorm8 rounding: (PR link omitted)

richgel999 commented 11 months ago

Do you have a reproducer compressed image you can share, along with a failing texel coordinate?

I'll work on a small standalone repo for you with 2 decompressors (mine and Google's). I found the problem via exhaustive fuzzing with random 128-bit blocks (immediately rejecting any obviously invalid blocks). I decompress the blocks with all 3 decompressors and compare their outputs. When it's up on github I'll reply here.

richgel999 commented 11 months ago

I'm working on a stand-alone test program. In the meantime, here's a simple ASTC 12x12 example block showing the decode problem. (Note this is for plain 8-bit LDR decoding only, not sRGB decoding.)

{ 0x84,0x0,0x38,0xC8,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0xB3,0x4D,0x78 }

ASTCENC_DECOMPRESS_ONLY
astcenc_config_init(ASTCENC_PRF_LDR, 12, 12, 1, ASTCENC_PRE_MEDIUM, ASTCENC_FLG_DECOMPRESS_ONLY, &config);
astcenc_decompress_image() was called on a 12x12 ASTCENC_TYPE_U8 astcenc_image
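
For context, here's roughly how the test drives astc-encoder's decompress-only path via the public API (a sketch based on my understanding of the API; the real test's buffer handling and error checking differ):

#include <cstdint>
#include "astcenc.h"

int main()
{
    // The example block quoted above.
    const uint8_t block_bytes[16] = { 0x84,0x0,0x38,0xC8,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0xB3,0x4D,0x78 };

    astcenc_config config;
    astcenc_config_init(ASTCENC_PRF_LDR, 12, 12, 1, ASTCENC_PRE_MEDIUM,
        ASTCENC_FLG_DECOMPRESS_ONLY, &config);

    astcenc_context* context = nullptr;
    astcenc_context_alloc(&config, 1, &context);

    // One 12x12 slice of 8-bit RGBA texels.
    uint8_t texels[12 * 12 * 4];
    void* slices[1] = { texels };
    astcenc_image image = { 12, 12, 1, ASTCENC_TYPE_U8, slices };

    const astcenc_swizzle swizzle = { ASTCENC_SWZ_R, ASTCENC_SWZ_G, ASTCENC_SWZ_B, ASTCENC_SWZ_A };
    astcenc_decompress_image(context, block_bytes, 16, &image, &swizzle, 0);

    astcenc_context_free(context);
    return 0;
}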

0: blk: 12x12, cems: 0 0 0 0, grid: 2x12, dp: 0, parts: 1, endpoint range: 20, endpoint levels: 256, weight range: 0, weight levels: 2

{ 0x84,0x0,0x38,0xC8,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0xB3,0x4D,0x78 }

Pixel compare failure 10x0 comp 0, Google: 95, mine: 95, ARM: 96
Pixel compare failure 10x0 comp 1, Google: 95, mine: 95, ARM: 96
Pixel compare failure 10x0 comp 2, Google: 95, mine: 95, ARM: 96
Pixel compare failure 1x2 comp 0, Google: 95, mine: 95, ARM: 96
Pixel compare failure 1x2 comp 1, Google: 95, mine: 95, ARM: 96
Pixel compare failure 1x2 comp 2, Google: 95, mine: 95, ARM: 96
Pixel compare failure 10x4 comp 0, Google: 95, mine: 95, ARM: 96
Pixel compare failure 10x4 comp 1, Google: 95, mine: 95, ARM: 96
Pixel compare failure 10x4 comp 2, Google: 95, mine: 95, ARM: 96
Pixel compare failure 10x7 comp 0, Google: 95, mine: 95, ARM: 96
Pixel compare failure 10x7 comp 1, Google: 95, mine: 95, ARM: 96
Pixel compare failure 10x7 comp 2, Google: 95, mine: 95, ARM: 96
Pixel compare failure 1x8 comp 0, Google: 95, mine: 95, ARM: 96
Pixel compare failure 1x8 comp 1, Google: 95, mine: 95, ARM: 96
Pixel compare failure 1x8 comp 2, Google: 95, mine: 95, ARM: 96

richgel999 commented 11 months ago

One more example block. If I relax the comparison to abs(Google_or_my_decoder - ARM_decoder) <= 1 the test runs fine, but of course that tolerance is very undesirable. (Like before, this is for plain 8-bit LDR decoding only, not sRGB decoding.)

1: blk: 12x12, cems: 0 0 0 0, grid: 3x8, dp: 0, parts: 1, endpoint range: 20, endpoint levels: 256, weight range: 0, weight levels: 2
Physical ASTC block bytes: { 0x29,0x0,0x1A,0x97,0x1,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0xCF,0x97,0x86 }
Pixel compare failure 1x0 comp 0, Google: 192, mine: 192, ARM: 191
Pixel compare failure 1x0 comp 1, Google: 192, mine: 192, ARM: 191
Pixel compare failure 1x0 comp 2, Google: 192, mine: 192, ARM: 191
Pixel compare failure 4x0 comp 0, Google: 157, mine: 157, ARM: 156
Pixel compare failure 4x0 comp 1, Google: 157, mine: 157, ARM: 156
Pixel compare failure 4x0 comp 2, Google: 157, mine: 157, ARM: 156
Pixel compare failure 1x1 comp 0, Google: 161, mine: 161, ARM: 160
Pixel compare failure 1x1 comp 1, Google: 161, mine: 161, ARM: 160
Pixel compare failure 1x1 comp 2, Google: 161, mine: 161, ARM: 160

richgel999 commented 11 months ago

I've prepared a special test repo for you here: https://github.com/richgel999/astc_ldr_decode_bug

It's a very stripped down subset of the Basis Universal repo, just for this test, with most files removed from the build. Instructions for building the repo and running it under Windows (Visual Studio 2022) or under Linux (using cmake/make) are in the repo. There are no external dependencies, and it should be trivial to build & run.

basisu_tool.cpp contains a small test which iterates through a large number of ASTC configurations. For each valid config, it creates blocks with random endpoints and weights, packs each one to ASTC, then unpacks it using my decoder (in astc_helpers.h), Android's (the decoder used by the "drawElements Quality Program Tester Core", which I've named encoder/basisu_astc_decomp.cpp), and the decoder built into astc-encoder.

In HDR mode (when decoding to either half float or float) all 3 decoders return the same results during fuzz testing. The issue is decoding to LDR (8-bit/component) pixels, with or without sRGB enabled. (To test in this mode, set hdr_cems_only to true, and ldr_cems_only to false.)

After decoding to uint8_t pixels (to LDR 8-bit), it compares the pixel components of the 3 decoded blocks of pixels and prints any differences:

// Compare the outputs
for (uint32_t y = 0; y < block_h; y++)
{
    for (uint32_t x = 0; x < block_w; x++)
    {
        for (uint32_t c = 0; c < 4; c++)
        {
            if ((pixels[(x + y * block_w) * 4 + c] != alt_pixels[(x + y * block_w) * 4 + c]) ||
                //(iabs(pixels[(x + y * block_w) * 4 + c] - arm_pixels[(x + y * block_w) * 4 + c]) > 1)
                (pixels[(x + y * block_w) * 4 + c] != arm_pixels[(x + y * block_w) * 4 + c]))
            {
                printf("Pixel compare failure %ux%u comp %u, Google: %u, mine: %u, ARM: %u\n", x, y, c,
                    pixels[(x + y * block_w) * 4 + c],
                    alt_pixels[(x + y * block_w) * 4 + c],
                    arm_pixels[(x + y * block_w) * 4 + c]);
            }
        }
    }
}

At the top of the basisu_tool.cpp file is the macro TEST_LDR_SRGB_DECODE -- if set to 1, it instead decodes in sRGB mode. In this case, the version of astc-enc I'm using differs from the spec for alpha, but it doesn't seem to differ for RGB.

Running this under Windows or Linux, this is what it prints initially:

0: blk: 12x12, cems: 0 0 0 0, grid: 2x12, dp: 0, parts: 1, endpoint range: 20, endpoint levels: 256, weight range: 0, weight levels: 2
Physical ASTC block bytes: { 0x84,0x0,0x38,0xC8,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0xB3,0x4D,0x78 }
Pixel compare failure 10x0 comp 0, Google: 95, mine: 95, ARM: 96
Pixel compare failure 10x0 comp 1, Google: 95, mine: 95, ARM: 96
Pixel compare failure 10x0 comp 2, Google: 95, mine: 95, ARM: 96
Pixel compare failure 1x2 comp 0, Google: 95, mine: 95, ARM: 96
Pixel compare failure 1x2 comp 1, Google: 95, mine: 95, ARM: 96
Pixel compare failure 1x2 comp 2, Google: 95, mine: 95, ARM: 96
Pixel compare failure 10x4 comp 0, Google: 95, mine: 95, ARM: 96
Pixel compare failure 10x4 comp 1, Google: 95, mine: 95, ARM: 96
Pixel compare failure 10x4 comp 2, Google: 95, mine: 95, ARM: 96
Pixel compare failure 10x7 comp 0, Google: 95, mine: 95, ARM: 96
Pixel compare failure 10x7 comp 1, Google: 95, mine: 95, ARM: 96
Pixel compare failure 10x7 comp 2, Google: 95, mine: 95, ARM: 96
Pixel compare failure 1x8 comp 0, Google: 95, mine: 95, ARM: 96
Pixel compare failure 1x8 comp 1, Google: 95, mine: 95, ARM: 96
Pixel compare failure 1x8 comp 2, Google: 95, mine: 95, ARM: 96
1: blk: 12x12, cems: 0 0 0 0, grid: 3x8, dp: 0, parts: 1, endpoint range: 20, endpoint levels: 256, weight range: 0, weight levels: 2
Physical ASTC block bytes: { 0x29,0x0,0x1A,0x97,0x1,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0xCF,0x97,0x86 }
Pixel compare failure 1x0 comp 0, Google: 192, mine: 192, ARM: 191
Pixel compare failure 1x0 comp 1, Google: 192, mine: 192, ARM: 191
Pixel compare failure 1x0 comp 2, Google: 192, mine: 192, ARM: 191
Pixel compare failure 4x0 comp 0, Google: 157, mine: 157, ARM: 156

Thank you, -Rich

solidpixel commented 11 months ago

Super, thanks - will take a look.


solidpixel commented 11 months ago

this is for plain 8-bit LDR decoding only, not sRGB decoding

OK, that's good. I can definitely reproduce the non-sRGB decode_unorm8 issues on the astcenc main branch, and computing a few cases by hand shows that astcenc is out of spec at the moment.

The PR I linked above seems to solve the issue for the two specific failure blocks you gave in the comments above, so I'm hopeful that this should be much improved.

solidpixel commented 11 months ago

Updating your fuzzer to use astcenc from that PR does indeed seem to have solved it.

Tested all the square block sizes, for linear and sRGB, and they all passed.

richgel999 commented 11 months ago

Tested all the square block sizes, for linear and sRGB, and they all passed.

OK - I'll grab it and try it here. Thank you so much for addressing this, it really helps. ASTC is a very complex format, so any divergence from the spec can cause a lot of confusion.

solidpixel commented 11 months ago

No probs - thanks for the bug report.

richgel999 commented 11 months ago

Your fix above resolves both problems in my test repo. Now all 3 decoders return the same LDR, LDR (sRGB), and HDR pixels. Thanks!

solidpixel commented 11 months ago

@richgel999 I've added one more fix to ensure that UNORM16 void extent constant color blocks decode the right way. They had the same issue: they would round correctly for sRGB, but not for linear LDR when using UNORM8.

richgel999 commented 11 months ago

@richgel999 I've added one more fix to ensure that UNORM16 void extent constant color blocks decode the right way. They had the same issue: they would round correctly for sRGB, but not for linear LDR when using UNORM8.

Got it - the fuzzer I sent didn't test void extent blocks.

solidpixel commented 11 months ago

Catching up in the office: the consensus is that sRGB alpha not following decode_unorm8 is actually a spec bug. There is a pending merge request to update the sRGB alpha behavior in the Khronos Data Format Spec so that it returns an 8-bit value in line with the RGB values, as this was the expected behavior and is consistent with sRGB images being specified to ignore the decode mode extensions.

If you have Khronos GitLab access, the two relevant links are: