KhronosGroup / Vulkan-Docs

The Vulkan API Specification and related tools

FS input `location`s with `flat` differing between `component`s incompatible with modern GPUs? #2170

Open Triang3l opened 1 year ago

Triang3l commented 1 year ago

SPIR-V and GLSL make it possible to declare multiple fragment shader input variables within the same location using the component layout qualifier.

The GLSL specification defines the following requirements for variables assigned to components of the same location:

Location aliasing is causing two variables or block members to have the same location number.

[…] when location aliasing, the aliases sharing the location must have the same underlying numerical type and bit width (floating-point or integer, 32-bit versus 64-bit, etc.) and the same auxiliary storage and interpolation qualification.

(Section 4.4.1, "Input Layout Qualifiers", of the GLSL 4.60 specification.)
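To make the quoted rule concrete, here is a minimal sketch of a checker for it. Everything in it (`FsInput`, `NumericType`, `Interpolation`, `aliasesAreCompatible`) is a hypothetical name made up for illustration, not part of glslang, SPIR-V tooling, or any driver. It only models numeric type, bit width, and interpolation qualifier; auxiliary storage qualifiers and component overlap are left out.

```cpp
#include <cstddef>

// Hypothetical description of one fragment shader input (illustrative only).
enum class NumericType { Float32, Int32, Float64 };
enum class Interpolation { Smooth, Flat, NoPerspective };

struct FsInput {
    unsigned location;            // Location decoration
    unsigned component;           // first Component occupied, 0..3
    NumericType type;             // underlying numeric type and bit width
    Interpolation interpolation;  // interpolation qualifier
};

// Returns true when every pair of inputs sharing a location also shares
// numeric type and interpolation qualifier. This covers only part of the
// GLSL rule quoted above (auxiliary storage qualifiers are not modeled).
constexpr bool aliasesAreCompatible(const FsInput* inputs, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        for (std::size_t j = i + 1; j < count; ++j) {
            if (inputs[i].location != inputs[j].location) continue;
            if (inputs[i].type != inputs[j].type) return false;
            if (inputs[i].interpolation != inputs[j].interpolation) return false;
        }
    }
    return true;
}

// Models these GLSL declarations, which mix qualifiers within location 0:
//   layout(location = 0, component = 0) in vec3 a;
//   layout(location = 0, component = 3) flat in float b;
constexpr FsInput kMixedAliases[] = {
    {0, 0, NumericType::Float32, Interpolation::Smooth},
    {0, 3, NumericType::Float32, Interpolation::Flat},
};

// Same layout, but both aliases are flat.
constexpr FsInput kUniformAliases[] = {
    {0, 0, NumericType::Float32, Interpolation::Flat},
    {0, 3, NumericType::Float32, Interpolation::Flat},
};
```

Under this model, `aliasesAreCompatible(kMixedAliases, 2)` is false and `aliasesAreCompatible(kUniformAliases, 2)` is true. The first case is the one GLSL rejects and, as far as I can tell, the Vulkan and SPIR-V specifications do not.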

However, I am unable to find any similar limitations in the Vulkan (both 1.0 and 1.3 with extensions) and SPIR-V specifications. In "Interpolation decorations", "Location Assignment", and "Component Assignment", there doesn't seem to be anything that prevents variables with aliasing Location decorations from having different interpolation decorations. glslang also seems to generate SPIR-V without complaint when such a case occurs.

This is, however, a problematic decision/oversight if it turns out to be true. If I understand correctly, some hardware, including desktop GPUs produced today, requires flat shading to be enabled for whole 4-component fragment input vectors (each corresponding to a Location in Vulkan basically) — so you can't mix Flat and non-Flat, and thus also floating-point and integer variables within the same Location. Specifically, there are at least two implementations where this seems to be true:

With Vulkan's original design built around monolithic pipelines, relaxing those requirements may have been an intentional decision, on the expectation that this would be resolved during VS–FS linkage. Note that interpolation decorations only need to be provided in the FS; they have no effect in the vertex stages. With monolithic pipelines, VS/TES/GS and FS are aware of each other's interfaces and may do remapping if needed.

However, the direction of the design has since changed towards separate compilation of stages and fast linkage. The graphics pipeline library extension contains the device property graphicsPipelineLibraryIndependentInterpolationDecoration. If it's VK_FALSE, the application must specify the needed interpolation decorations not only in the fragment shader but also in the last vertex stage, where they must match. It may be helpful in this situation, or it may not, I'm not sure. But the biggest user of graphics pipeline libraries, DXVK, requires that property to be true, because in Direct3D shader bytecode, interpolation modifiers are specified only in the pixel shader. (You can't mix interpolation modifiers within one vec4 in Direct3D shader bytecode either. In the HLSL source, you have to specify the interpolation modifiers in both VS and PS so the compiler doesn't compact variables with different interpolation modifiers into one vec4, but this info is not written to the VS bytecode; it only affects location assignment.) And the more modern VK_EXT_shader_object doesn't have any equivalent of that property, while letting applications freely mix different vertex and fragment shaders even without creating pipelines.

Doing any remapping on the GPU at runtime using something like creating subroutines in hardware shader machine code for remapping so that all smooth and all flat components are in different vectors (both in the end of the VS and in the beginning of the FS) doesn't seem to be a viable approach to me, at least for two reasons:

But even if you let some kind of linkage resolve this situation and remap all smooth and all flat varyings to different 4-component vectors, that still won't cover all of the cases. Specifically, if you have maxFragmentInputComponents and its vertex counterpart set to 128, if you declare 125 smooth components and 3 flat ones, even if you compact them you'll end up with 1 vector containing both smooth and flat variables — something not possible on the hardware. For that, you'd have to reduce maxFragmentInputComponents so you always have one free vec4 in hardware for this purpose — but this would make the Vulkan limits here inferior to the Direct3D 11 ones on existing modern hardware, and that would harm DXVK and VKD3D.
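The arithmetic behind that example can be sketched as follows. The helper names (`vec4sNeeded`, `fitsWhenSegregated`) are made up for illustration, and 128 is just the limit value used in the paragraph above, not a claim about any particular device.

```cpp
// Number of 4-component vectors needed when smooth and flat components
// are packed separately, so that no vector contains both kinds.
constexpr unsigned vec4sNeeded(unsigned smoothComponents, unsigned flatComponents) {
    return (smoothComponents + 3) / 4 + (flatComponents + 3) / 4;
}

// Limit value taken from the example above (an assumption, not a queried limit).
constexpr unsigned kMaxFragmentInputComponents = 128;
constexpr unsigned kAvailableVec4s = kMaxFragmentInputComponents / 4;

// True when the segregated packing fits in the available vectors.
constexpr bool fitsWhenSegregated(unsigned smoothComponents, unsigned flatComponents) {
    return vec4sNeeded(smoothComponents, flatComponents) <= kAvailableVec4s;
}
```

With 125 smooth and 3 flat components, the smooth ones need 32 vectors and the flat ones need 1 more, for 33 in total against 32 available, even though 125 + 3 = 128 is within the component limit. By contrast, 124 smooth and 4 flat components need 31 + 1 = 32 vectors and fit.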

Was this relaxation in Vulkan compared to OpenGL intentional? And would it be possible to retroactively modify the specification to reintroduce that limitation from OpenGL? There apparently are existing popular drivers where mixing interpolation decorations within a location produces an incorrect result, and that's basically not fully fixable on many GPUs still actively used.

gfxstrand commented 1 year ago

To answer the question in the subject line: yes, it's incompatible with modern GPUs. Intel, for instance, has a 32-bit field in 3DSTATE_SBE called "Constant Interpolation Enable" which is one bit per location, not per component. There is no way on Intel hardware to make one component flat and another component interpolated on the same location. Other hardware may have similar restrictions but I'm less familiar with those. If you want this behavior, you can get it via VK_KHR_fragment_shader_barycentric and managing provoking vertex yourself.
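As a rough illustration of why a one-bit-per-location field cannot express a mixed location, here is a simplified model. It is not actual driver code or the real 3DSTATE_SBE programming; `buildPerLocationFlatMask`, `LocationMaskResult`, and the per-location 4-bit component masks are all hypothetical.

```cpp
#include <cstdint>

// Result of trying to fold per-component flat flags into one bit per location.
struct LocationMaskResult {
    bool representable;   // false if any location mixes flat and non-flat
    std::uint32_t mask;   // bit N set: every used component of location N is flat
};

// usedComponentMask[loc]: low 4 bits mark the components declared at loc.
// flatComponentMask[loc]: low 4 bits mark the components declared flat at loc.
// Only the first 32 locations are considered, matching a 32-bit field.
constexpr LocationMaskResult buildPerLocationFlatMask(
        const std::uint8_t* usedComponentMask,
        const std::uint8_t* flatComponentMask,
        unsigned locationCount) {
    std::uint32_t mask = 0;
    for (unsigned loc = 0; loc < locationCount && loc < 32; ++loc) {
        const std::uint8_t used = usedComponentMask[loc] & 0xF;
        const std::uint8_t flat = flatComponentMask[loc] & used;
        if (flat == 0) continue;               // nothing flat here (or unused)
        if (flat != used) return {false, 0};   // mixed: one bit cannot describe it
        mask |= std::uint32_t{1} << loc;
    }
    return {true, mask};
}

// Location 0 fully interpolated, location 1 fully flat: representable.
constexpr std::uint8_t kUsedSeparate[] = {0xF, 0xF};
constexpr std::uint8_t kFlatSeparate[] = {0x0, 0xF};

// Location 0 with components 0..2 interpolated and component 3 flat: not representable.
constexpr std::uint8_t kUsedMixed[] = {0xF};
constexpr std::uint8_t kFlatMixed[] = {0x8};
```

In this model the separate case yields a mask with only bit 1 set, while the mixed case has no valid encoding at all.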

As for the spec itself, there may be a bug where we missed the bit in the GLSL spec when we tried to translate all that to SPIR-V. The intention in Vulkan was never to lift this restriction.