KHR_materials_basisu and independent data channels

donmccurdy commented 4 years ago

Opening a new thread here, based on discussion around https://github.com/KhronosGroup/glTF/pull/1612.

Storing three independent data channels in compressed textures introduces significant artifacts with current approaches (see #1612). As a result, these two cases in the current glTF material specification require consideration:

occlusion/rough/metal (RGB) texture
three-channel normal (RGB) texture

These could be solved with:

(1) occlusion (R) and rough/metal (RG) textures, and
(2) two-channel normal (RG) texture

A general-purpose texture-swizzling extension would be an alternative to (1). Currently that is harder to support on the web, but it won't always be so, and might be worth considering today.

Assuming (1) and (2) are the correct solutions here, what are the right mechanisms to bring them into the format? A couple ideas:

(A) Introduce new extensions, e.g. KHR_packing_normal, KHR_packing_roughness_metallic, analogous to MSFT_packing_normalRoughnessMetallic.
(B) Include them with the upcoming "PBR Next" material model's extensions; do not backport to existing metal/rough, spec/gloss, or unlit materials.
(C) Include these changes, and perhaps other KTX2/Basis-related changes, in a glTF 2.1 release to simplify the perceived complexity, as compared to specifying multiple interdependent extensions.

MarkCallow commented 4 years ago

How would general-purpose texture-swizzling solve the occlusion/rough/metal case? In current compressed textures you can only have 2 independent channels: the RGB channel & alpha.

lexaknyazev commented 4 years ago

@donmccurdy Possible channel mappings from Basis to GPU formats https://github.com/KhronosGroup/KTX-Specification/issues/100#issuecomment-538029332

donmccurdy commented 4 years ago

@MarkCallow the format already allows occlusion to be separated from rough/metal, but even when they're separated rough/metal still points to the GB channels in the spec, the R channel isn't used. A texture swizzle extension would allow rough/metal to be read from different channels.

zeux commented 4 years ago

General-purpose swizzling doesn’t fully solve normal map storage - shader code is necessary to reconstruct the third component.

Two-component normal storage would be very welcome otherwise - high quality normal storage requires two separately compressed components, and games have been using DXT5 and, later, two-channel BC5, for >10 years now to solve this. However we should be careful wrt specification. On desktop the only widely supported format on the web that is suitable for high quality normals is DXT5 (BC3) but it requires storing one channel in G and one channel in A. If transcoding uses BC5 instead the mapping is RG. Not sure what the expected implementation strategy looks like.

How severe is the roughness-occlusion-metallic issue? Metallic is often binary and transition regions commonly have specular fringing due to interpolation even on uncompressed data. I have not tried to use BasisU extensively so don’t have a good intuition - would be good to have examples (uncompressed RGB, DXT1, BasisU transcoded DXT1).

lexaknyazev commented 4 years ago

I think the implementation strategy for the web is:

EAC RG on mobile (WEBGL_compressed_texture_etc)
BC5 on capable desktops (EXT_texture_compression_rgtc)
BC3 on less capable desktops (WEBGL_compressed_texture_s3tc)

Basis Universal can be transcoded to all these formats.

Metallic-Roughness question came from customers who (at some point) had to use PNG over JPEG because of perceived artifacts. Basis Universal codec is not optimized for encoding three uncorrelated channels in the same image, so we might need to be proactive here.

donmccurdy commented 4 years ago

General-purpose swizzling doesn’t fully solve normal map storage...

Agreed, it only solves the occlusion/rough/metal problem.

How severe is the roughness-occlusion-metallic issue?

Good question – I don't know. It's been reported to us but I do think we should try to quantify it a little, maybe with something like https://googlewebcomponents.github.io/model-viewer/test/fidelity/results-viewer.html, before doing anything too elaborate to avoid it.

zeux commented 4 years ago

I think the implementation strategy for the web is:

Right, but then do you have to generate permutations for all shaders that need to read this data because you don't know the component mapping, and WebGL doesn't support general swizzles? General swizzles could be simulated with dot products I suppose, e.g. sample = textureFetch(), r = dot(sample, controlVector1), g = dot(sample, ControlVector2).

FWIW when I said that the only widely suported format on web/desktop is DXT5 (BC3) I meant that even though theoretically an extension is available, in practice Chrome doesn't implement it and neither does Edge (not sure about Firefox/Safari).

lexaknyazev commented 4 years ago

permutations for all shaders

Well, that depends. EAC RG11 and BC5 are sampled from red and green; BC3 (DXT5) would use red and alpha. In my opinion, the former should be the default.

BC4 and BC5 formats are mandatory on all D3D10 and newer class hardware, so they should be widely available on desktops. The corresponding WebGL extension is available today in Firefox when using OpenGL. Hopefully, ANGLE will expose it soon as well. This would unlock Chromium-based browsers including Edge Insider.

ETC2/EAC formats are available today in Chrome and Firefox on Android with capable (ES 3.0+) hardware.

Safari will catch up someday...

donmccurdy commented 4 years ago

Any idea what percentage of desktop devices support EXT_texture_compression_rgtc? I was hoping to find that on https://webglstats.com/webgl/extension/WEBGL_compressed_texture_s3tc but didn't. :/

lexaknyazev commented 4 years ago

Oldest desktop hardware that support RGTC is about 12 years old:

AMD Radeon HD 2000
Intel GMA 4500
NVIDIA GeForce 8

zeux commented 4 years ago

Any idea what percentage of desktop devices support EXT_texture_compression_rgtc?

Almost all GPUs support the formats, but almost no browsers do. WebGL commonly works through ANGLE which, when targeting DX9, can't implement this extension because the format is DX10-only; I'm not sure what the status is for ANGLE-DX11.

MarkCallow commented 4 years ago

WebGL commonly works through ANGLE which, when targeting DX9, can't implement this extension because the format is DX10-only; I'm not sure what the status is for ANGLE-DX11.

This kind if thing is precisely why all compressed formats are expressed as extensions in WebGL. In other words just because ANGLE/DX9 can't support RGTC is no reason for browsers to not support it when running on DX10 or above. I am in communication with the author of webglstats.com to find out why it doesn't list EXT_texture_compression_rgtc among the extensions. I.e, is it because it no browser has ever reported it or because it doesn't know about it.

lexaknyazev commented 4 years ago

ANGLE support for RGTC can now be tracked here: https://bugs.chromium.org/p/angleproject/issues/detail?id=3149

Chromium support for RGTC (and BPTC) here: https://bugs.chromium.org/p/chromium/issues/detail?id=1013369

zeux commented 4 years ago

Just so that I understand, is this issue separate from the KTX2 PR (aka with the KTX2 PR we'll only get support for three-channel normals)?

Experimenting with Basis and while I like the quality of diffuse textures, normal quality is not very good. This is using normal map encoding mode but it didn't help much. I haven't tried using 2-channel normal maps with BasisU yet - not sure if the idea is to use 2 channel data with the existing ETC1S encoding, or to have two channels individually stored as ETC1S streams.

MarkCallow commented 4 years ago

Just so that I understand, is this issue separate from the KTX2 PR (aka with the KTX2 PR we'll only get support for three-channel normals)?

Which KTX2 PR are you referring to?

not sure if the idea is to use 2 channel data with the existing ETC1S encoding, or to have two channels individually stored as ETC1S streams.

A 2 component texture is supposed to be encoded with R in all 3 components of one ETC1S stream and G in all 3 components of a second. This is what toktx/libktx and basisu_tool do when the input image has just 2 components. The separateRGToRGB_A option and its equivalent in basisu_tool are only needed when the input is a 3 or 4 component image. You should specify normalMap in both cases, if the data is a normal map.

Whether splitting the components across 2 ETC1S streams in this way improves quality, I don't know but @richgel clearly thinks it does as he has implemented this feature.

zeux commented 4 years ago

Sorry - referring to #1612.

I think I get it but also am slightly confused. It sounds like there are two options:

Split two channel image into two RGB streams
Split two channel image into RGB and A stream

It seems like the "separate RG" option in basisu does the latter? Or are they the same in that alpha is also encoded as ETC1S?

zeux commented 4 years ago

Using seperate_rg_to_color_alpha option of basisu indeed produces much better quality - it needs support for DXT5 (BC3) decompression of course, and shader changes to accomodate two-channel normal maps. The screenshot here is from a modified copy of three.js where I hacked it in.

lexaknyazev commented 4 years ago

@zeux See this section of the KTX2 spec for the description of Basis Universal channels storage and mapping.

MarkCallow commented 4 years ago

@zeux, if you've looked at the reference @lexaknyazev provided you will have seen that alpha is indeed encoded as ETC1S so your 1 & 2 are the same.

Separating R & G looks a lot better. I wonder how much of this is due to separating them for compression and how much is due to using DXT5 as the transcode target. It is beyond my current knowledge.

What is the format of the input images you are using. Is it a 3 component normal map so you just omit the 3rd component? Is there no need to specially generate a 2-component normal map as the input?

zeux commented 4 years ago

@MarkCallow I haven't done the experiments to try to prove it or disprove it but I suspect that the limiting factor is ETC1S so splitting into two channels helps for this reason. This is based on the fact that I'd expect BC1 encoded normals to not look quite as bad as the screenshots above indicate (edit and also I think ETC1S is in general noticeably weaker than BC1 so I'd be surprised if BC1 itself was a limiting factor). I will try a direct BC1 encoding using something like nvtt and upload another screenshot just so that we have a data point.

And yeah, it's sufficient to simply omit the third component, so on the encoding side the only flag that's necessary is seperate_rg_to_color_alpha. This is because in tangent-space normal map storage, it's common to assume that .Z is non-negative (otherwise normals are encoded in the negative hemisphere from the vertex normal's halfspace point of view, which commonly isn't required). So .z can be reconstructed from .rg or (in this case) .ga components depending on the hardware format used for encoding. The reconstruction is simply sqrt(max(0.0, 1.0 - dot(n.xy, n.xy))). This is the same approach that is commonly used for 2-channel normal map encoding when targeting specific hardware formats like BC3 or BC5.

zeux commented 4 years ago

Here's the DXT1 encoded normal map using nvtt, with all other textures encoded using Basis. Note that the quality is very good compared to either Basis options - Basis substantially distorts the normals whereas DXT1 by itself doesn't. I will experiment with different Basis quality settings but it looks like the issue isn't with the transcode target format. update tried using Basis with -q 255 -comp_level 5 - encoding took forever (45 minutes for 12 2K textures...) but the normal quality isn't noticeably better vs the encoding with default settings I posted above.

lexaknyazev commented 4 years ago

@zeux Basis Universal internally uses "ETC1S" encoding - a subset of ETC1 compressed texture format that can be easily transcoded to DXT-style formats.

To put it simply, an ETC1S RGB block (4x4) has a local palette of 4 colors that are located on a straight line segment that is parallel to the main diagonal (0, 0, 0) -> (255, 255, 255).

The complete description of ETC1 (and ETC1S) can be found in the KDFS v1.3.

For comparison, a DXT RGB block has a local palette of 4 colors located on a straight line segment that can be oriented in any direction (within quantization limits).

When Basis Universal encoder is used with 2-component image, it puts each channel into a separate ETC1S slice. At runtime, these two slices need to be transcoded to

two ETC1 textures (ES 2.0)
EAC RG11 (ES 3.0+)
BC5 (D3D10+)
BC3, with the second source channel going into the alpha target channel (D3D9+)

For WebGL, BC4/BC5 formats are provided by the EXT_texture_compression_rgtc extension. It is already available in Firefox with ANGLE disabled (see webgl.disable-angle in about:config).

Chromium/ANGLE implementations can be tracked here: https://bugs.chromium.org/p/angleproject/issues/detail?id=3149 https://bugs.chromium.org/p/chromium/issues/detail?id=1013369

aras-p commented 4 years ago

FWIW, shader code needed in order to support both BC5 (RG components store two normal map channels) and DXT5nm (AG components store two normal map channels) as used in Unity, is somewhat cheaper than two additional dot products, in fact just one additional mul:

// Unpack normal as DXT5nm (1, y, 1, x) or BC5 (x, y, 0, 1)
fixed3 UnpackNormalmapRGorAG(fixed4 packednormal)
{
    packednormal.x *= packednormal.w;

    // reconstruct Z
    fixed3 normal;
    normal.xy = packednormal.xy * 2 - 1;
    normal.z = sqrt(1 - saturate(dot(normal.xy, normal.xy)));
    return normal;
}

This does require the DXT5nm style encoder to put 1.0 into the "unused" red & blue channels.

zeux commented 4 years ago

@aras-p Thanks, this is a neat trick I forgot I knew. Just to clarify, it’s only essential that red channel contains 1 - blue could be an arbitrary value including 0?

zeux commented 4 years ago

@lexaknyazev Thanks, I wasn’t aware that ETC1S excludes support for deltas. This is unfortunate, as it makes it even weaker than ETC1 (I understand that this is a compromise to find a common ground between ETC1 and DXT1) which is already not as good as DXT1, and is unlikely to be a good fit for three component normals. I am wondering if two component normals stored in a single ETC1S slice would fare better, will test this later.

update no, this doesn’t seem to help. I guess the only solution is to use two separate ETC1S slices.

With ETC1S used to encode x/y separately I wouldn’t expect a dramatic difference in quality between BC2 and BC5 - ETC1S should be the quality bottleneck - so it sounds like the main issue is specifying this such that the renderers can use two channel normals, including a possible variance in supported swizzles, and this will work as well as possible within constraints of BasisU even without browser support for BC5.

aras-p commented 4 years ago

Just to clarify, it’s only essential that red channel contains 1

Yeah.

lexaknyazev commented 4 years ago

With ETC1S used to encode x/y separately I wouldn’t expect a dramatic difference in quality between BC2 and BC5

BC2 stores alpha explicitly, quantized to 4 bpp. Did you mean BC3?

BC3 alpha block is exactly the same as one channel of BC4/5.
Single-channel ETC1S transcoded to BC3 RGB should be slightly worse than transcoded to BC4/5.

zeux commented 4 years ago

Yes - I meant BC3.

donmccurdy commented 4 years ago

At minimum, then, normalTexture will require an alternative channel layout to support KHR_texture_basisu. metallicRoughnessTexture may as well (some research needed). Starting with normal maps — how might we specify the new layout?

A. KHR_texture_basisu spec could explicitly say something like "when used for the normalTexture of a material, channel packing must be RRRG."
B. Introduce new extension(s) for the purpose.

For a new extension, we'd want to consider where to attach it. Basis textures are included, with fallback to PNG or JPG, like this:

{
  "materials": [{
    "normalTexture": {
      "scale": 2,
      "index": 3,
      "texCoord": 1
    },
  }],
  "textures": [{
    "source": 0, 
    "extensions": { "KHR_texture_basisu": { "source": 1 } }
  }],
  "images": [
    { "uri": "base-color.png" },
    { "uri": "base-color.ktx2" }
  ]
}

Extending the material doesn't make much sense: the alternate packing doesn't apply if the fallback PNG normal map is used. Do we extend the image, instead? Or the KHR_texture_basisu extension itself?

/cc @lexaknyazev @bghgary

bghgary commented 4 years ago

I think KHR_texture_basisu must depend on this packing extension if we introduce a new extension, so extending KHR_texture_basisu itself seems odd.

I'm thinking maybe we extend the normalTextureInfo?

{
  "materials": [{
    "normalTexture": {
      "scale": 2,
      "index": 3,
      "texCoord": 1,
      "extensions": {
        "KHR_texture_packed_normal": {
          "index": 4
        }
      }
    },
  }]
}

EDIT: hmm, this might not work since KHR_texture_basisu is on texture.

lexaknyazev commented 4 years ago

So far in the core spec, texture's usage-specific properties (like color transfer function) are defined by the "referrer". Following this pattern, the packing should be defined by extending normalTextureInfo. However, this makes fallback too convoluted (with two texture references).

On the other hand, the only runtime difference is that an application will have to restore the third component in shaders. As shown above, saying that x = r * a works with all possible inputs, so engines supporting BasisU may as well ignore supplied blue channel altogether.

donmccurdy commented 4 years ago

On the other hand, the only runtime difference is that an application will have to restore the third component in shaders. As shown above, saying that x = r * a works with all possible inputs...

The two channels of an "RG" normal map are not actually in the RG channels of transcoded output, though... they might be two separate ETC1 textures, or a single RGBA texture packed as RRRG. Engines will need new shader permutations to support either.

lexaknyazev commented 4 years ago

Well... This is wrong. For that shader to work we need either (1, y, _, x) or (x, y, _, 1) sampling.

Basis transcoder produces different output. Original rgba becomes rgba (BC3) or ra01 (BC5). To get the same quality from ETC1S for both (X and Y) channels, the first slice must be stored as grayscale (before encoding, the image must be swizzled to rrrg).

This is quite troublesome because it's hard to disambiguate between xxxy and xy01.

lexaknyazev commented 4 years ago

Putting the case of two separate ETC1 textures aside and assuming RGTC and WebGL 2.0 support (they are very close), it's not that many permutations:

norm.xyz = tex.rgb * 2.0 - 1.0; // default PNG
#if (RG01)
restore_z(norm);
#endif

This snippet covers BC5 / EAC RG11 runtime usage and I think we should aim at it. Engines are free to support legacy schemes like 2 * ETC1 and DXT5nm, but we should not design for them.

lexaknyazev commented 4 years ago

On the other hand, we could simply add a new transcoding flag to Basis that would enforce r10a swizzling for BC3 output. This would allow engines to use n.y = tex.g * tex.a for both BC3 and BC5. They could also always ignore blue channel (even from PNG) and use the same shader code for all input options.

The latter means that we probably do not need any new extension for normal maps. Their encoding still has to be described in KHR_texture_basisu though.

zeux commented 4 years ago

For cases when no compressed texture formats with alpha are supported, I think the best tradeoff in terms of quality / effort for the old hardware is to use downsampled RGBA decompression. Transcoding from block-compressed textures to a 2x2 subsampled image results in the same memory footprint at a reduced quality. My impression was that this is a planned feature for BasisU transcoder, although I'm not sure what the timeline is. /cc @richgel999

edit for textures with a full mip chain it's sufficient to have transcoder be able to decompress the RGBA image, and then the application can skip the top mip level to get the memory reduction.

lexaknyazev commented 4 years ago

RGTC (BC4 and BC5) support has landed in ANGLE.

Follow https://bugs.chromium.org/p/chromium/issues/detail?id=1013369 to see when it comes to Chromium.

donmccurdy commented 4 years ago

On the other hand, we could simply add a new transcoding flag to Basis that would enforce r10a swizzling for BC3 output.

I like this suggestion — anything we can delegate to software will save client implementations a lot of trouble, as long as the swizzle limitations remain a problem in WebGL. Some questions:

I understand that this transcoding flag would not work when transcoding to ETC1. What about ASTC or PVRTC?
Can we do the same for metallicRoughnessTexture?

I don't understand why the swizzle is r10a and not ra01, but will assume that's because I don't know much about BC3. 😅

lexaknyazev commented 4 years ago

What about ASTC or PVRTC?

There's no gain in using ASTC for this purpose because the source data is ETC1S anyway and all mobile devices that support ASTC support ETC2 as well.

As far as I remember, PVRTC doesn't have anything like two-plane mode, so it's not very useful for normal maps.

donmccurdy commented 4 years ago

As far as I remember, PVRTC doesn't have anything like two-plane mode, so it's not very useful for normal maps.

Hm... are you saying iOS will need to transcode normal maps to decompressed RGBA or use fallback PNG/JPG data?

zeux commented 4 years ago

I'm curious about whether this actually needs transcoder changes, or if channel splitting is sufficient if the first channel is pre-encoded with 1 in relevant components.

That is:

For normal maps, we'd like the transcoder output to be a vector (1, Y, ?, X). Then the shader can multiply .R by .A after sampling the texture to get the .X component, and restore the .Z component. This is compatible with both (1, Y, ?, X) encoding and (X, Y, Z, 1) encoding in vanilla normal maps (assuming they don't have an alpha channel - safe assumption?)
For metal-roughness maps, we'd like the transcoder output to be a vector (?, R, 1, M). Then the shader can multiply .B by .A after sampling the texture to get the metalness component - this is compatible with both (?, R, 1, M) and (?, R, M, ?) encodings.

One approach to get here is to teach transcoder to transcode into these formats given two single-channel ETC1S inputs. Another approach is to try to use ETC1S input that already has the data encoded properly.

My understanding is that if we don't care about one of the components, and the third component is 1, then ETC1S restriction for the deltas being shared across all 3 channels is actually fine? If this is true, we can try to implement this swizzling purely on the encoder side, which is much easier since this is format-independent.

An additional benefit is that for occlusion-metal-roughness maps, we can just store occlusion into the red channel. We will get quality issues because of interference between occlusion and roughness, but maybe it's less bad than having interference between all three channels? :)

The above assumes that the target encoding format has support for decorellated RGB & A data. So BC3 & ETC2 & ASTC - yay. BC5 - nay.

donmccurdy commented 4 years ago

There's no gain in using ASTC for this purpose because the source data is ETC1S anyway and all mobile devices that support ASTC support ETC2 as well.

With apologies for the beginner questions on compressed textures, does this mean ETC2 is always preferable to ASTC? And if so, why? Comparing WebGL stats...

... only ASTC comes without a scary flashing warning. 😞

zeux commented 4 years ago

Testing the idea sketched in the previous comment.

This is FlightHelmet:

Using normal maps encoded without channel swizzling and transcoded to BC1, we get this:

Using -separate_rg_to_color_alpha switch produces the texture that, when decoded to BC3, has XXXY format. Decoding requires reconstruction of Z in the shader, and the shader has to be aware of the format:

Using a custom built basisu encoder that produces (255, Y, 255, X) swizzle and then encodes two-slice image; shader uses branchless reconstruction (it's safe to always reconstruct Z):

Looks like simply swizzling isn't quite close enough, presumably some color metrics need to be adjusted to get the correct endpoints. Also all formats aren't great on this asset, so maybe I need to use a different asset for testing this...

edit All of this is using default quality settings (128/255). I can test some other asset at max quality to get a better feel for this.

lexaknyazev commented 4 years ago

My understanding is that if we don't care about one of the components, and the third component is 1, then ETC1S restriction for the deltas being shared across all 3 channels is actually fine?

ETC1S stores one RGB value for 4x4 block and a number that determines how all three channels are shifted to form a local palette. This means that it would be very challenging if not impossible to keep one of RGB channels to be always 1 (basically the encoder would need to restrict pixels to two selectors instead of four) while maintaining expected quality.

does this mean ETC2 is always preferable to ASTC

ASTC is better than ETC2 in general, but Basis Universal just doesn't have enough data to leverage those capabilities. Basis Universal is a strict subset of ETC1 which is a subset of ETC2.

only ASTC comes without a scary flashing warning

That warning is based on very old extension drafts and does not apply to implementations.

zeux commented 4 years ago

basically the encoder would need to restrict pixels to two selectors instead of four

Ah, I didn’t realize the deltas in the tables are both positive and negative :( It’s interesting that the results on one example above are pretty close to the image that’s using pure channel split. Maybe this isn’t very representative.

lexaknyazev commented 4 years ago

are you saying iOS will need to transcode normal maps to decompressed RGBA

We can reasonably expect that iOS devices that receive software updates will get ETC2 support in WebGL. For now, we have to use decompressed normal maps or two PVRTC textures (just like with old Android devices that can use only two ETC1 textures).

aras-p commented 4 years ago

What about ASTC or PVRTC?

PVRTC becomes less and less relevant with each passing day. iOS devices that can only do PVRTC (and not ETC2 as well) are pre-A7 chips, which means devices older than year 2013. Last iOS version to run on these devices was iOS 10.

zeux commented 4 years ago

It's worth noting that there's a similar mismatch between what the hardware can do and what the software can use on iOS. On my iPhone X, WebGL 2 is not available in Safari. In WebGL 1, the only extension for compressed textures is PVRTC.

Enabling WebGL 2 experimental feature allows me to use WebGL 2 context, but it still only has PVRTC extension available.

aras-p commented 4 years ago

It's worth noting that there's a similar mismatch between what the hardware can do and what the software can use on iOS

Yeah, that is unfortunate :(

lexaknyazev commented 4 years ago

@zeux @aras-p As soon as WebKit switches to ANGLE, those issues should be resolved. See https://bugs.webkit.org/show_bug.cgi?id=198948.

KhronosGroup / glTF

KHR_materials_basisu and independent data channels #1682