ARM-software / astc-encoder

The Arm ASTC Encoder, a compressor for the Adaptive Scalable Texture Compression data format.
https://developer.arm.com/graphics
Apache License 2.0

Provide a library call to query block properties and metrics #163

Closed alecazam closed 3 years ago

alecazam commented 4 years ago

ASTC didn't provide any bits in the format to determine HDR vs. LDR content, so I think this involves a more expensive walk of the blocks to identify any HDR bits that are set. Metal has HDR formats for ASTC, and I believe Vulkan does too, but GL/ES/WebGL/KTX don't have that. It's too easy to pass ASTC4x4_HDR to hardware that only supports LDR, and the content then has to be decoded to 999E5 or RGBA16F.

A similar walk is needed for 3D blocks vs. the 2D block format. iOS cannot use 3D blocks, but again there's no way to identify this from the format except by walking the block content. Passing a 3D texture compressed to 3D blocks to iOS hardware that doesn't support them again needs a routine to validate the data, then either re-encode the 3D blocks as 2D blocks or completely unpack them. Is there any hardware that supports 3D blocks, or can this just be deprecated from the spec?
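For reference, a minimal sketch (not from the codebase) of what such a block walk could look like. It only classifies 2D void-extent blocks, whose HDR flag sits in bit 9 of the block mode per the ASTC specification; fully detecting HDR content would additionally require decoding the color endpoint modes of every regular block:

```cpp
#include <cstdint>

// ASTC blocks are 128 bits, stored little-endian. A 2D void-extent
// (constant-color) block has block-mode bits [8:0] == 0b111111100
// (0x1FC); bit 9 is then the HDR flag. Regular blocks need a full
// decode of their color endpoint modes to spot HDR content.
inline bool is_void_extent_2d(const uint8_t block[16])
{
    return block[0] == 0xFC && (block[1] & 0x01) == 0x01;
}

inline bool is_hdr_void_extent_2d(const uint8_t block[16])
{
    return is_void_extent_2d(block) && (block[1] & 0x02) != 0;
}
```

A full validator would loop this over every 16-byte block in the payload and bail out on the first HDR hit.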

alecazam commented 4 years ago

Here are more details on all the different variants. Support should be provided for conversion to sliced 3D, and for conversion when HDR is not supported (< A13, i.e. earlier than the iPhone 11 and the about-to-be-released iPads).

https://github.com/KhronosGroup/Vulkan-Docs/issues/926

solidpixel commented 4 years ago

What's the actual request for the encoder?

Tracking asset metadata is generally a layer above what we provide here, as games already have to track a pile of state that we don't provide by compressing single surfaces (mipmaps, texture arrays, cubemaps, material data, etc.). The general assumption is that the ASTC mode and block size are a priori knowledge that the layer above the compressor can manage itself; given it set the options on the command line / compression API, it should know what the settings are.

Disagree that 3D blocks need a walk - they have a different format enum to the 2D blocks, so are really no different to distinguishing two different 2D block sizes. If you use the Vulkan enums you can distinguish color format too.

Support should be provided for conversion for slice 3D support.

Compression of topological structures (mipmaps, texture arrays, cubemaps) is currently a layer above what we provide in the encoder. I'd like to support them "at some point", but I'll be honest and say it's not very far up the priority list at the moment.

Major game engines use their own packing formats anyway, and the main user request is "more speed please", so that's getting the priority at the moment.

And conversion when HDR is not supported

I'm not going to do HDR conversion in the encoder. You can't just clip - it looks terrible - and there is no way we could support an "artistically useful" set of tone mapping options on the command line. That's going to have to stay an application problem to solve.

alecazam commented 4 years ago

What's the actual request for the encoder?

This code is the definitive reference for ASTC, so if you didn't know the specifics from the GL type (all that KTX stores), then you'd need some utilities to walk the blocks. I'll have to check if those GL extensions add HDR/3D block GL types, and if so then derivation from the block data wouldn't be needed. My plan was to include the Vulkan and Metal type in the name-value pairs of the KTX. KTX2 seems to be storing the Vulkan type, since it's a new spec, but there are no viewers for that.

Disagree that 3D blocks need a walk

I just saw that HDR and 3D have their own Vulkan types. So that's great.

Compression of topological structures (mipmaps, texture arrays, cubemaps) is currently a layer above what we provide in the encoder.

Yes, I have all that, since astcenc doesn't handle any of these types or mipmaps. There's also the encoder "cuttlefish" that calls astcenc.

Major game engines use their own packing formats anyway, and the main user request is "more speed please", so that's getting the priority at the moment.

Agreed. I think the timings on the new code are faster, but I think making it fast enough to run on ARM chips would be nice. BC encoders are under 1ms, and ASTC is around 1s. Moving to GPU encoders (or transcoders) is also important. Handling any ASTC or BC6/7 on the GPU is complex and often results in a subset of the spec.

I'm not going to do HDR conversion in the encoder.

True. I think HDR conversion could be handled by the existing ASTC decoder. Then callers could decide between 999E5 and 16F. This seems to be what the KTX2 encoders are doing, and Unity as well.

solidpixel commented 4 years ago

I'll have to check if those GL extensions add HDR/3D block GL types, and if so then derivation from the block data wouldn't be needed.

GLES has separate enums for "linear" vs "sRGB", and for all of the block sizes, which gives you "2D/3D". It can't distinguish between LDR and HDR based on the enum alone - they are both linear types.
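Sketching that classification in code, using the enum ranges from KHR_texture_compression_astc_ldr and the OES 3D block extension (treat the exact numeric ranges as an assumption to verify against the GL headers):

```cpp
#include <cstdint>

struct AstcGLInfo { bool valid; bool is3D; bool isSRGB; };

// Classify a GLES ASTC internal format enum. Note what the enum
// cannot tell you: LDR vs HDR, since both share the linear enums.
inline AstcGLInfo classify_astc_gl(uint32_t fmt)
{
    if (fmt >= 0x93B0 && fmt <= 0x93BD) return { true, false, false }; // 2D linear (4x4..12x12)
    if (fmt >= 0x93D0 && fmt <= 0x93DD) return { true, false, true  }; // 2D sRGB
    if (fmt >= 0x93C0 && fmt <= 0x93C9) return { true, true,  false }; // 3D linear (OES)
    if (fmt >= 0x93E0 && fmt <= 0x93E9) return { true, true,  true  }; // 3D sRGB (OES)
    return { false, false, false };
}
```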

I guess my point here is that you should never need to do this based on block introspection - the application should know its assets already at compression time, it just needs to store it somewhere as a side-channel. FWIW KTX allows custom property annotation if you want to use an existing container format.
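As an illustration of that side-channel, here is a sketch that serializes one KTX1 key/value entry (a little-endian u32 byte count, NUL-terminated key, value bytes, padded to 4-byte alignment, per the KTX 1.0 file format spec); the key name used below is hypothetical:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Append one KTX1 key/value entry: keyAndValueByteSize (little-endian
// u32), then key + NUL + value, then 0-3 padding bytes so the next
// entry starts on a 4-byte boundary (assumes `out` is 4-byte aligned).
inline void append_ktx_kv(std::vector<uint8_t>& out,
                          const std::string& key, const std::string& value)
{
    uint32_t size = static_cast<uint32_t>(key.size() + 1 + value.size());
    for (int i = 0; i < 4; ++i) out.push_back((size >> (8 * i)) & 0xFF);
    out.insert(out.end(), key.begin(), key.end());
    out.push_back(0);
    out.insert(out.end(), value.begin(), value.end());
    while (out.size() % 4) out.push_back(0); // pad to 4-byte alignment
}
```

A tool could use an entry like `append_ktx_kv(kv, "astcProfile", "hdr")` to record HDR-ness (or the intended VkFormat) without ever touching the block payload.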

I think HDR conversion could be handled by the existing ASTC decoder.

Emitting decompressed images in alternate HDR formats (i.e. not bouncing through fp16 first) is the mirror half of the input API change you have requested. Makes sense, although solving the input formats first is the higher priority because that impacts compression performance.

Converting LDR to HDR - no problem, it's just a range extension, although seems to be a waste of memory. Converting HDR to LDR - will probably just get rejected by the API.

alecazam commented 4 years ago

One other thing I found useful was writing an ASTC block analyzer to count void-extent blocks (I've heard some hardware has trouble with sRGB and these) and dual-plane usage. You'd mentioned the RBA+G dual-plane mode in another thread, which is cool for mapping to BC5/ETCrg11, although it wastes bits storing B. It's just hard to know when an encoder has actually used these optimizations or not. So another encoder may be faster, but generate really poor blocks.

Now maybe that all works out in the RMSE results, but having more ability to see the types of block data generated seems like useful data. A similar utility is needed for ETC and BC block encoders. I know it's not an encoder thing, but more of an ASTC analysis tool that could be part of the codebase. For example, ISPC generates not-great ASTC blocks but it's fast, and ATE generates reasonable ASTC 4x4 and 8x8 and has been about 20x faster than astcenc in the past, but there are no sources outside Apple. I'll have better numbers on that.

I've always called astcenc via the command line, but this is my first time exploring the source code. Astcenc is the only game in town for HDR encoding. It's also very nice that this is all open-source. And thanks Pete for all the support on these issues, and hard work on the encoder.

solidpixel commented 4 years ago

You'd mentioned the RBA+G dual plane mode in another thread which is cool for mapping to BC5/ETCrg11 although it wastes bits storing B.

I still recommend trying to store two channel data as (RGB) + A - the shader sampling swizzle is different to BC5 and ETCrg (i.e. you sample from .ga in the shader, not .rg) - but it frees up so much bitrate because you only need to store two endpoint colors (luminance + alpha).
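In shader terms the only difference is the swizzle applied after the sample; a CPU-side sketch of that reconstruction (names are illustrative, not from any API):

```cpp
#include <cmath>

struct Texel { float r, g, b, a; };

// Pull the two stored channels out of a sampled texel: BC5/ETCrg11
// store them in .rg, the ASTC (RGB)+A luminance+alpha layout stores
// them in .ga.
inline void fetch_xy(const Texel& t, bool astc_la, float& x, float& y)
{
    x = astc_la ? t.g : t.r;
    y = astc_la ? t.a : t.g;
}

// Typical tangent-space normal reconstruction from the two channels,
// remapping unorm [0,1] to [-1,1] and rebuilding z.
inline float reconstruct_z(float x, float y)
{
    float nx = x * 2.0f - 1.0f;
    float ny = y * 2.0f - 1.0f;
    return std::sqrt(std::fmax(0.0f, 1.0f - nx * nx - ny * ny));
}
```

The `astc_la` flag is exactly the shader conditional (or variant) the thread discusses below.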

One other thing I found useful was writing an ASTC block analyzer.

Yes, I suspect most people have one lying around =) It's a good idea to have an official one - I'll see what I can do to clean up some of the ones I have.

And thanks Pete for all the support on these issues, and hard work on the encoder.

Happy to help - sorry for the slow progress on some of the issues - it's a somewhat resource constrained problem.

alecazam commented 4 years ago

I still recommend trying to store two channel data as (RGB) + A -

Yes, I like GGGR for L+A, but then your hardware, OS, and API need to support swizzles, or you have to do them with conditionals in the shaders. BC5 and ETCrg11 go to rg01, so ASTC is the odd one out here. Same issue with BC4/ETCr11, which produce r001, vs. ASTC which wants RRR1 for L compression. I'd kind of argue that ASTC's swizzle pattern is better when going to shaders that formerly took full RGBA. The downside is that looking at L+A instead of rg01 in an editor isn't great; I'd like to fix that with smarter texture viewers/editors. It looked like ARM had updated the pages on how to best use ASTC, but I had to learn all this from discussions with you back a few years ago.

Yes, I suspect most people have one lying around =)

I wrote a good one, but it's at my last employer. So I haven't felt like writing one again.

I've been working on an encoder frontend to astcenc, and I like using KTX as an input too. Then I can feed custom mipchains, and also feed in HDR data. KTX is way simpler to bring in than TIFF or EXR or DDS or PVR, and it's one of the few lossless containers. KTX even with block encoding is big, especially for RGBA16F/32F, but KTX2 adds supercompression, so then KTX inputs would be okay for source control and transfers. I'd just like to get input content to stop being the top mip of a PNG, and have more editors generate KTX or KTX2 for source and output.

solidpixel commented 4 years ago

but then your hw and OS and API need to support swizzles

Is WebGL the only odd one out here? All of the hardware has supported one or more of native GL/GLES/Vulkan, so I'd assume the underlying hardware can do this; it's just an API problem.

alecazam commented 4 years ago

Is WebGL the only odd-one out here?

WebGL1/2 just left them out of the API, but at the time they may not have been exposed in all API surfaces. I think Intel and other Tier 1 Metal macOS cards lack swizzle support. Then they were only exposed in Metal on iOS 13.0 and macOS 10.15, so targeting older OSes you still don't have access. I don't know about Vulkan support on Android, but the API definitely exposes the feature.

It's just a single conditional in this case, and worth the shader conditional or variant for the better compression on ASTC and BC5/ETCrg11. PBR has a lot of 1-, 2- and 2nm-channel textures. I'm still disappointed that, even with HDR constants, the ASTC spec left out an L+A HDR mode, but ETCrg11 might suffice there. BC5/ETCrg11 are supposed to be stored as 16Fx2 in the texture samplers. Hopefully new titles can assume texture + post-swizzle held in KTX/KTX2 metadata, and continue with pre-swizzle done in the encoder/mip generation. So that discrepancy would be solved.

ASTC is also unorm only, and the others have snorm forms, which have a true 0 value in the -1 to 1 conversion done in the sampler, whereas unorm's zero is 128/255. Unless BC4/ETCrg11 is encoded as unorm, an n.xy * 2 - 1 is still needed in the shader to handle most ASTC 1s/2s/2nm data. It's unfortunate that the content can't be switched out without adjustments, since normal and SDF data is signed, not unorm.

alecazam commented 3 years ago

For a simple 8-bit normal map, I'm finding astcenc doesn't use dual-plane. It also encodes some of the blocks as FMT_RGBA instead of FMT_LUMINANCE_ALPHA. The reason seems to be floating-point error introduced in all the math the encoder does. I don't even see rounding on some of the numerical conversions. I'm feeding all data in as floating-point rgba/255.0f directly to the encoder, but the existing U8 and U16 paths did the exact same math.
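A tiny example of the rounding difference being described here (illustrative, not the encoder's actual code path):

```cpp
#include <cstdint>

// Quantize a [0,1] float back to 8 bits: truncation vs round-to-nearest.
inline uint8_t quant8_trunc(float f)
{
    return static_cast<uint8_t>(f * 255.0f);
}

inline uint8_t quant8_round(float f)
{
    return static_cast<uint8_t>(f * 255.0f + 0.5f);
}
```

At f = 0.5, truncation yields 127 while round-to-nearest yields 128, so a missing `+ 0.5f` shows up exactly as endpoints that fail to snap.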

This is the test normal map (taken from some old Id compression papers).

collectorbarrel-n

And a screenshot from Xcode of the endpoint values not quite being snapped.
image

solidpixel commented 3 years ago

Thanks for the test image. What settings structure are you passing into the compressor? Channel swizzle in particular - I'd expect the replicated e.g. RRR luminance value to round the same way across all three channels.

alecazam commented 3 years ago

I do the swizzles in my own setup, so the default swizzle is used. I weight r and a as 1, g and b as 0. I should be posting this code in the next few weeks, but I'm trying to make sure that all the encoders I call generate reasonable data and blocks. I don't get dual-plane on any of these blocks, so that had me concerned. Is LA a dual-plane format by default?

solidpixel commented 3 years ago

Split this one off as #172

solidpixel commented 3 years ago

I've added a rich block metadata query in the above commit. This probably does more than this original request asked for (it returns pretty much the entire block configuration after symbolic unpacking), but is functionality I need for codec development.

The code can be used as an example showing users how to query e.g. HDR-ness. Minimizing the implementation just to make a specific property query is left as an exercise for the reader ...