KHR_mesh_quantization: compatibility with native graphics APIs

zeux commented 4 years ago

While the extension is still in draft, it would be good to discuss one thing that I thought of after the extension was drafted and have mixed feelings about: namely, how KHR_mesh_quantization behaves with respect to the rules different graphics APIs use for float <-> int conversion in the input assembler stage.

KHR_mesh_quantization is targeting WebGL; in WebGL as well as OpenGL (desktop and ES), the rules for how vertex attributes get converted to shader inputs are lax: anything can be converted to a floatN type. So the implementation of the extension is really straightforward - just tell GL what the format of the vertex stream is.

Metal rules are a tiny bit stricter but still, any attribute can be converted to a floating point one (https://developer.apple.com/documentation/metal/mtlvertexattributedescriptor/1516081-format?language=objc).

However, unfortunately, Direct3D11 and Vulkan specify that the "base type" of the shader declaration must match the base type of the vertex attribute. This means that if the vertex attribute has a type int4, shader can't read it via a float4 input.
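To make the base type rule concrete, here is a rough sketch of how a loader might map a glTF accessor to a Vulkan vertex format and to the base type the shader input would then have to use. The function name and the returned tuple are made up for illustration; the componentType constants come from glTF 2.0 and the format names follow the VkFormat naming scheme, but whether a given format is usable for vertex buffers depends on the device.

```python
# glTF 2.0 componentType constants.
BYTE, UNSIGNED_BYTE, SHORT, UNSIGNED_SHORT, FLOAT = 5120, 5121, 5122, 5123, 5126

_BITS = {BYTE: 8, UNSIGNED_BYTE: 8, SHORT: 16, UNSIGNED_SHORT: 16, FLOAT: 32}
_SIGNED = {BYTE: True, UNSIGNED_BYTE: False, SHORT: True, UNSIGNED_SHORT: False}


def vulkan_vertex_format(component_type, normalized, components=4):
    """Hypothetical helper: return (VkFormat name, required shader base type)."""
    bits = _BITS[component_type]
    channels = "".join(f"{ch}{bits}" for ch in "RGBA"[:components])
    if component_type == FLOAT:
        return f"VK_FORMAT_{channels}_SFLOAT", "float"
    sign = "S" if _SIGNED[component_type] else "U"
    if normalized:
        # Normalized integers are converted to floats by the fetch hardware.
        return f"VK_FORMAT_{channels}_{sign}NORM", "float"
    # Unnormalized integers stay integers, so a float4 input cannot read them.
    return f"VK_FORMAT_{channels}_{sign}INT", ("int" if sign == "S" else "uint")


print(vulkan_vertex_format(SHORT, normalized=True))
print(vulkan_vertex_format(SHORT, normalized=False))
```

The second call is the problematic case: the attribute ends up with an integer base type, so a shader written against float inputs no longer matches.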

I'm not sure what WebGPU will do exactly since the current specification doesn't go into detail on this.

This restriction is unfortunate - in principle, the difference between normalizing an int4 to a float4 and converting it to float4 without normalizing is negligible - but it is what it is.

Consequently, when loading a glTF model with KHR_mesh_quantization for use with APIs that don't support transparent type conversion, the decoder currently has a few possible options:

1. Keep the data in the format it is encoded in the glTF file, and synthesize the shader dynamically from the vertex input signature. This is what a WebGPU renderer is likely to do from what I understand, since that's the practice employed by all existing WebGL 3D frameworks, but it's generally frowned upon outside of the Web since shader compilation is a lengthy and fragile process.
2. Decode the integer data into a floating point array during loading. This maintains the transmission size benefits that quantized data provides, but negates the in-memory size benefits.
3. Keep the data in a normalized integer array, and add a scaling factor in the shader to correct for the division that happens in hardware. E.g. when a SHORT data stream is used, the renderer would use a normalized SHORT stream, and multiply the value by 32767 (which would come from a uniform in the shader, so could be set dynamically based on the data type). The multiplication can introduce 0.5ulp of error but that's likely to be negligible compared to the quantization error.
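A small numeric sketch of the third option, assuming the Vulkan/Direct3D-style rule where a signed normalized 16-bit value c is fetched as max(c / 32767, -1). The function names are invented for this example, and float32 rounding is only approximated by round-tripping through `struct`; real hardware may round slightly differently.

```python
import struct


def f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def snorm16_fetch(c):
    """Model of a normalized SHORT vertex fetch: c / 32767, clamped to -1."""
    return f32(max(c / 32767.0, -1.0))


def undo_normalization(v, scale=32767.0):
    """What the shader would do: multiply by a per-type scale uniform."""
    return f32(v * scale)


# Worst absolute error over every value that survives the clamp.
worst_error = max(
    abs(undo_normalization(snorm16_fetch(c)) - c) for c in range(-32767, 32768)
)
print(worst_error)

# -32768 is clamped to -1.0 by the fetch, so it comes back as -32767.
print(undo_normalization(snorm16_fetch(-32768)))
```

One caveat this makes visible: under this normalization rule the value -32768 cannot be recovered, so an encoder targeting this workaround would presumably want to avoid emitting it.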

This issue only affects POSITION and TEXCOORD as NORMAL and TANGENT are specified to only support normalized values.

For Web renderers it's actually more straightforward to handle non-normalized data because in JavaScript you can transparently refer to the elements of a packed array (e.g. Int8Array) without changing the rest of the code.

One option to address this would be to amend the extension specification to only allow normalized formats. This would force the encoders to pack the data into a normalized range instead of an integer range, and use dequantization matrix to unpack.
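As a rough illustration of what "pack into a normalized range and unpack with a dequantization matrix" could look like on the encoder side, here is a sketch for unsigned normalized 16-bit positions. The function names are hypothetical; the matrix is laid out as a column-major 16-element list, the way a glTF node `matrix` is stored.

```python
def quantize_positions_unorm16(positions):
    """positions: list of (x, y, z) floats. Returns (quantized, matrix)."""
    lo = [min(p[i] for p in positions) for i in range(3)]
    hi = [max(p[i] for p in positions) for i in range(3)]
    # Avoid a zero extent for flat meshes.
    extent = [(hi[i] - lo[i]) or 1.0 for i in range(3)]
    quantized = [
        tuple(round((p[i] - lo[i]) / extent[i] * 65535) for i in range(3))
        for p in positions
    ]
    # Column-major scale + translation that maps [0, 1] back to model space.
    matrix = [
        extent[0], 0.0, 0.0, 0.0,
        0.0, extent[1], 0.0, 0.0,
        0.0, 0.0, extent[2], 0.0,
        lo[0], lo[1], lo[2], 1.0,
    ]
    return quantized, matrix


def dequantize_position(q, matrix):
    """Apply the normalized fetch (c / 65535) and then the matrix."""
    n = [c / 65535 for c in q]
    return tuple(matrix[i * 5] * n[i] + matrix[12 + i] for i in range(3))


positions = [(0.0, 1.0, -2.0), (10.0, 3.0, 2.0), (2.5, 2.0, 0.0)]
quantized, matrix = quantize_positions_unorm16(positions)
print(quantized)
print([dequantize_position(q, matrix) for q in quantized])
```

With this layout the renderer always reads the attribute as a normalized format, and the scale and offset live entirely in the node transform.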

Another option would be to keep the extended format flexible - since there are options for native renderers, it might be okay.

Thoughts welcome.

rsahlin commented 4 years ago

I have a comment that I think is related to what you are discussing here:

I would like to add support for 8- and 16-bit floats as defined by the Vulkan / OpenGL ES APIs. The reason for this is to be able to avoid conversion/normalisation altogether. Also, at least on mobile devices, there is a huge performance gain in shaders when half float values are used. Modern (mobile) GPUs typically do twice the number of half float computations. For this reason I would like to add the formats 16 SFLOAT and 8 SFLOAT to the mesh quantization extension - meaning that if an attribute is either of these two, it's possible to use it directly and with greater performance.

Is this at all related to your question @zeux ?

Vulkan format spec: https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VkFormat.html
GLES 3.0 Attrib pointer spec: https://www.khronos.org/registry/OpenGL-Refpages/es3.0/html/glVertexAttribPointer.xhtml

zeux commented 4 years ago

This is unrelated to my question, which is "should we reduce the set of formats to make it easier for native renderers to deal with format mismatch".

We discussed alternate formats before; the half precision formats aren't supported on WebGL 1 so it's going to be hard to support for existing renderers.

Additionally, half-precision provides suboptimal precision for vertex component data (the normalized integers used here are generally superior), and I don't think there's an actual performance gain in this case by using halfs over normalized integers. You can declare shader inputs as mediump to make math faster regardless of the storage format.
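To put rough numbers on the precision point, here is a sketch comparing the round-trip error of half floats and 16-bit unsigned normalized integers for values in [0, 1]. It relies on the `"e"` (IEEE half precision) format code of Python's `struct` module; the helper names and the sampling grid are arbitrary choices for this example.

```python
import struct


def to_half_and_back(x):
    """Round x to the nearest 16-bit float and convert it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_unorm16_and_back(x):
    """Quantize x in [0, 1] to a 16-bit normalized integer and back."""
    return round(x * 65535) / 65535


samples = [i / 1000 for i in range(1001)]
max_half_error = max(abs(to_half_and_back(x) - x) for x in samples)
max_unorm16_error = max(abs(to_unorm16_and_back(x) - x) for x in samples)

print(max_half_error, max_unorm16_error)
```

Both formats take 16 bits per component, but half floats spend bits on the exponent, so over a bounded range such as [0, 1] the normalized integers give a noticeably smaller worst-case error.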

I'm not sure what you're referring to when you say "8 SFLOAT"? This format doesn't exist.

rsahlin commented 4 years ago

Hi @zeux and thanks for your reply!

Granted, the half float performance gain can be achieved without storing the attributes as 16 bits. However, there's a memory gain that is quite significant when using lesser precision - this is lost in this extension.

I am not very fond of skipping features just because there is no support in WebGL. I would hope that the future of glTF can be more platform/implementation agnostic than this. Another reason to include the formats as they can be read by current and future graphics hardware is to avoid the overhead of runtime conversion. I also think this is key when it comes to getting traction for glTF outside of the WebGL world.

Support for the formats could be written in such a way that if they are not supported on a platform, they would be converted to the most fitting one for the implementation - I don't see how this would be an issue.

The 8 SFLOAT is a typo on my side and should be 8 SNORM - if that is not already supported.

zeux commented 4 years ago

> However, there's a memory gain that is quite significant when using lesser precision - this is lost in this extension.

Sorry, I'm not sure I understand what you mean. This extension proposes using quantized types that result in memory gain, including 8 SNORM.

> I am not very fond of skipping features just because there is no support in WebGL.

The specification has to consider the practicality of implementation. If there are any extra formats that might be useful, they have to be part of a separate extension. If you want to propose it, feel free to do so, but KHR_mesh_quantization was designed around the feature set that is possible to support on existing implementations WITHOUT format conversion performed during load.

rsahlin commented 4 years ago

> Sorry, I'm not sure I understand what you mean. This extension proposes using quantized types that result in memory gain, including 8 SNORM.

I'm referring to option 2 in your post - I don't see that option 1 is realistic and I'm not that fond of option 3 either.

> The specification has to consider the practicality of implementation. If there are any extra formats that might be useful, they have to be part of a separate extension. If you want to propose it, feel free to do

Sorry you feel this way! What implementations are you referring to? Unreal? Unity? JMonkeyEngine? I still think there is a risk of leaning too heavily on WebGL as the implementation standard - I really feel glTF should be more agnostic and use common hardware capabilities. After all, the Khronos Group's tagline is 'Connecting software to silicon'.

zeux commented 4 years ago

> I'm referring to option 2 in your post - I don't see that option 1 is realistic and I'm not that fond of option 3 either.

I see - this is why I raised this topic. If we decide to remove integer formats from the extension, the encoders would be able to use normalized formats. But this doesn't seem related to half-precision support.

> I still think there is a risk of leaning too heavily on WebGL as the implementation standard - I really feel glTF should be more agnostic and use common hardware capabilities.

What you are proposing is not a common hardware capability, because WebGL 1 lacks it. (edit: and also GLES2 of course, WebGL 1 is based on GLES2)

zeux commented 4 years ago

Just to be clear, it's possible for us to introduce KHR_mesh_quantization2 or some such in the future that relies on support for additional formats. Supporting this extension will be harder because the decoders will have to decode the vertex streams in software (on load) for older APIs.

In addition to half-precision floats (which I would consider for completeness' sake more so than for performance, transmission size, etc., because integers are generally superior), we discussed adding 10_10_10_2 formats for higher precision normal/tangent storage.

However, this is outside of the scope of KHR_mesh_quantization because it requires hardware capabilities that are not universally supported by platforms that glTF needs to run on. So I would appreciate separating that discussion from this issue.

lexaknyazev commented 4 years ago

@rsahlin All vertex formats that could be reasonably used for mesh data are already supported:

float32
- glTF 2.0 core
uint8/sint8/unorm8/snorm8/uint16/sint16/unorm16/snorm16
- provided by KHR_mesh_quantization
float16/uint32/int32
- little to no gain, also require more capable platforms
unorm_10_10_10_2/snorm_10_10_10_2/uint_10_10_10_2/sint_10_10_10_2
- could be considered for a new extension although support is inconsistent

@zeux I'd keep the extension as it is. Depending on the platform, at least one of three options will work.

rsahlin commented 4 years ago

Hi @zeux

> What you are proposing is not a common hardware capability, because WebGL 1 lacks it. (edit: and also GLES2 of course, WebGL 1 is based on GLES2)

I disagree. It's a common hardware capability that is lacking in WebGL (based on GLES2), which is really old - i.e. it's an implementation problem, not a hardware one.

I don't like the glTF standard being held back because of an implementation using an outdated graphics API. I think the standard should be platform/implementation agnostic and aim for interaction with current Khronos APIs such as Vulkan and KTX2. This is imperative to keep glTF competitive.

With that said: my main concern is the 16-bit half float format - I think it would be very beneficial to have it in this extension. Plus, the range of formats (signed, unsigned and normalized) somewhat imposes an implementation effort. I think it would be better to have fewer but distinct formats with hardware (not implementation) support, e.g. half float, signed 16 bit, signed 8 bit and perhaps 8 bit normalized.

zeux commented 4 years ago

One additional data point in favor of keeping integer inputs as an option, which I forgot to mention, is relevant for morph targets.

For morph targets, a very common case is that the range of the position value is much higher than the range of the position delta. In many cases position can be encoded using 16-bit integers, and deltas can be encoded using 8-bit integers.

However, for this to work it's critical that the data is unnormalized since both the base values and the deltas are added together. Removing support for unnormalized inputs would effectively block this optimization and force the encoders to use 16-bit deltas for morph positions. Of course morph targets aren't as common of a use case, but still an important one I think.
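A tiny sketch of why the shared, unnormalized grid matters here. The base positions are 16-bit integers and the deltas are 8-bit integers in the same grid units, so they can be summed directly; if both streams were fetched as normalized values they would be divided by different constants (32767 vs. 127) and a plain sum would no longer line up. The sample numbers and function names are invented for illustration.

```python
from array import array

base_q = array("h", [1200, -300, 25000])  # int16 positions, grid units
delta_q = array("b", [5, -12, 100])       # int8 morph deltas, same grid units


def morph_unnormalized(base, delta, weight):
    """Base and delta share one grid, so they can be added directly."""
    return [b + weight * d for b, d in zip(base, delta)]


def morph_normalized_naive(base, delta, weight):
    """Same sum after a normalized fetch: each stream uses its own divisor."""
    return [b / 32767 + weight * (d / 127) for b, d in zip(base, delta)]


expected = morph_unnormalized(base_q, delta_q, 0.5)
naive = [v * 32767 for v in morph_normalized_naive(base_q, delta_q, 0.5)]

print(expected)
print(naive)
```

A renderer could compensate with a separate scale for the deltas, but that is extra state the unnormalized encoding does not need.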

Also, given that there's a general trend in high end renderers to avoid fixed function vertex fetch, it's worth mentioning that the problem isn't as relevant for renderers that adopt programmable vertex pulling (the vertex shader fetching data from buffers using the vertex id as an index) - Vulkan supports uniform texel buffer inputs with arbitrary format conversions. Of course there are stride restrictions which likely necessitate using non-interleaved accessors - although this is common in the glTF ecosystem as it stands, since there are very few models with interleaved storage around. So one extra possibility for native renderers is to use programmable vertex pulling, which is compatible with any format choice.
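For readers unfamiliar with the idea, here is a CPU-side sketch of what vertex pulling amounts to: the "shader" indexes into a raw buffer by vertex id and performs the integer-to-float conversion itself, so no fixed-function format rule is involved. The function name, the int16 vec3 layout and the 8-byte stride (3 components plus padding to a 4-byte boundary) are assumptions made for this example.

```python
import struct


def pull_position(buffer, vertex_id, stride=8, offset=0):
    """Fetch one unnormalized int16 vec3 by vertex index and convert to floats."""
    x, y, z = struct.unpack_from("<3h", buffer, offset + vertex_id * stride)
    return (float(x), float(y), float(z))


# Two vertices, each padded with 2 bytes so the stride is a multiple of 4.
vertex_buffer = struct.pack("<3h2x3h2x", 1, 2, 3, -4, 5, 6)

print(pull_position(vertex_buffer, 0))
print(pull_position(vertex_buffer, 1))
```

Because the conversion is ordinary code, normalized and unnormalized data differ only by a multiply, which is the sense in which the format choice stops mattering.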

Summarizing:

Reasons to keep unnormalized integer formats in the extension:

- Unnormalized formats do not add extra implementation work for GL/GLES/WebGL/Metal
- Efficient morph target delta encoding requires unnormalized integers
- It's a bit easier to handle unnormalized integers in JS since the conversions are more straightforward
- Programmable vertex pulling makes de-interleaved streams easy to support regardless of the format used

Reasons to remove unnormalized integer formats from the extension:

- For Vulkan/Direct3D the loaders will need to use workarounds; these may apply to WebGPU as well
- Most workarounds aren't particularly straightforward or nice
- Before this extension, the base data type (int/float) was statically known based on the attribute type, so this sets a new precedent

Additionally, due to the flexibility of glTF, native loaders already likely need to jump through many hoops to support the full combination of options today: for example, handling texture coordinate sets correctly, or handling the absence/presence of normal/tangent/color data. When shaders aren't being dynamically generated, there's bound to be some sort of load-time adjustment to handle all of this with compromises.

There are lots of variables here and I am leaning towards keeping the extension spec as is - either way it's a compromise and it's not clear to me that the current side of the compromise is worse than the other side.

KhronosGroup / glTF

KHR_mesh_quantization: compatibility with native graphics APIs #1739