Open Triang3l opened 1 year ago
To answer the question in the subject line: yes, it's incompatible with modern GPUs. Intel, for instance, has a 32-bit field in 3DSTATE_SBE called "Constant Interpolation Enable" which is one bit per location, not per component. There is no way on Intel hardware to make one component flat and another component interpolated on the same location. Other hardware may have similar restrictions but I'm less familiar with those. If you want this behavior, you can get it via VK_KHR_fragment_shader_barycentric and managing the provoking vertex yourself.
As for the spec itself, there may be a bug where we missed the bit in the GLSL spec when we tried to translate all that to SPIR-V. The intention in Vulkan was never to lift this restriction.
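
As a rough illustration of the workaround mentioned above, the following Python sketch models on the CPU what a fragment shader would compute if it received the three per-vertex values of a component directly: a flat component takes the provoking vertex's value, while a smooth one is the barycentric-weighted sum. The function name, the argument layout, and passing the provoking vertex as an explicit index are assumptions made for illustration; they are not part of VK_KHR_fragment_shader_barycentric.

```python
def shade_component(per_vertex, bary, flat, provoking_index=0):
    """Hypothetical CPU model of per-component interpolation.

    per_vertex      -- the component's value at each of the 3 vertices
    bary            -- the 3 barycentric weights of the fragment
    flat            -- True to take the provoking vertex's value unchanged
    provoking_index -- which of the 3 vertices is the provoking one
                       (assumed to be known to the caller)
    """
    if len(per_vertex) != 3 or len(bary) != 3:
        raise ValueError("expected three vertices and three weights")
    if flat:
        return per_vertex[provoking_index]
    return sum(w * v for w, v in zip(bary, per_vertex))


# One location, two components with different interpolation modes.
values_x = (1.0, 2.0, 4.0)   # component 0, smooth
values_y = (7, 8, 9)         # component 1, flat (integer)
bary = (0.25, 0.25, 0.5)

smooth_x = shade_component(values_x, bary, flat=False)
flat_y = shade_component(values_y, bary, flat=True, provoking_index=0)
```

The point of the sketch is that the flat/smooth choice becomes a per-component decision made in shader code, which only works when the shader can tell which vertex is the provoking one.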

SPIR-V and GLSL make it possible to declare multiple fragment shader input variables within the same `location` using the `component` layout qualifier.

The GLSL specification defines requirements for variables assigned to `component`s of the same `location` (Section 4.4.1, "Input Layout Qualifiers", of the GLSL 4.60 specification).

However, I am unable to find any similar limitations in the Vulkan (in both 1.0 and 1.3-extensions) and SPIR-V specifications. In "Interpolation decorations", "Location Assignment", and "Component Assignment", there doesn't seem to be anything that prevents variables with aliasing `Location` decorations from having different interpolation decorations. `glslang` also seems to generate SPIR-V fine if such a case occurs.

This is, however, a problematic decision/oversight if it turns out to be true. If I understand correctly, some hardware, including desktop GPUs produced today, requires flat shading to be enabled for whole 4-component fragment input vectors (each basically corresponding to a `Location` in Vulkan), so you can't mix `Flat` and non-`Flat` variables, and thus also floating-point and integer variables, within the same `Location`. Specifically, there are at least two implementations where this seems to be true:

- AMD: the `FLAT_SHADE` field of the `SPI_PS_INPUT_CNTL_[0-31]` registers. Starting with RDNA 2 it may be possible to emulate per-component flat shading in software since `VK_KHR_fragment_shader_barycentric` is supported (as RDNA 2 is able to make the shader aware of which vertex is the provoking one). However, on earlier AMD hardware, without `FLAT_SHADE`, the per-vertex values come to the shader in an undefined order, so it's not possible to load the one for the provoking vertex.
- Intel: the `ConstantInterpolationEnable` field of the `3DSTATE_SBE` structure is a 32-bit mask, and just like on AMD, each bit also seems to control flat shading for the entire 4-component vector as opposed to individual scalars.

With Vulkan's original design built around monolithic pipelines, it may be possible that relaxing those requirements was an intentional decision, as it might have been expected that this would be resolved during VS–FS linkage (note that interpolation decorations only need to be provided in the FS; they have no effect in the vertex stages). With monolithic pipelines, VS/TES/GS and FS are aware of each other's interfaces, and may do remapping if needed.

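
To make the granularity problem concrete, here is a small Python model of a per-location flat-shading mask like the ones described in the list above: one bit per location, so a location whose components disagree has no valid encoding. The tuple layout and the function name are hypothetical; the only thing taken from the text is that the mask has 32 bits, one per 4-component vector.

```python
def build_flat_mask(inputs):
    """Build a 32-bit per-location flat-shading mask.

    `inputs` is an iterable of (location, component, is_flat) tuples,
    a hypothetical description of the fragment shader inputs.
    Returns (mask, conflicts), where `conflicts` is the sorted list of
    locations that mix flat and non-flat components.
    """
    per_location = {}
    for location, component, is_flat in inputs:
        if not 0 <= location < 32 or not 0 <= component < 4:
            raise ValueError("location or component out of range")
        per_location.setdefault(location, set()).add(bool(is_flat))

    mask = 0
    conflicts = []
    for location in sorted(per_location):
        modes = per_location[location]
        if len(modes) > 1:
            conflicts.append(location)   # no single bit can express this
        elif True in modes:
            mask |= 1 << location
    return mask, conflicts


inputs = [
    (0, 0, False), (0, 1, False),   # location 0: all smooth
    (1, 0, True),                   # location 1: all flat
    (2, 0, False), (2, 3, True),    # location 2: mixed
]
mask, conflicts = build_flat_mask(inputs)
```

Here only location 1 gets its bit set, and location 2 is reported as a conflict because neither bit value is correct for both of its components.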
However, the direction of the design has since changed towards separate compilation of stages and fast linkage. The graphics pipeline library extension contains the device property `graphicsPipelineLibraryIndependentInterpolationDecoration`; if it's `VK_FALSE`, the application must specify the needed interpolation decorations not only in the fragment shader, but also in the last vertex stage, where they must match. It may be helpful in this situation, or it may not, I'm not sure. But the biggest user of graphics pipeline libraries, DXVK, requires that property to be true, as in Direct3D shader bytecode, interpolation modifiers are specified only in the pixel shader. (You can't mix interpolation modifiers within one vec4 in Direct3D shader bytecode either, and in the HLSL source, you have to specify the interpolation modifiers in both VS and PS so the compiler doesn't compact variables with different interpolation modifiers into one vec4. This info is not written to the VS bytecode, though; it only affects location assignment.) And the more modern `VK_EXT_shader_object` doesn't have any equivalent of that, while letting applications freely mix different vertex and fragment shaders even without creating pipelines.

Doing any remapping on the GPU at runtime, for example by creating subroutines in hardware shader machine code that move all smooth and all flat components into different vectors (both at the end of the VS and at the beginning of the FS), doesn't seem to be a viable approach to me, for at least two reasons:

- […] `VK_EXT_shader_object` translates to hardware concepts on TeraScale even more nicely and transparently than on the more modern GCN and RDNA.
- […] `flat` outputs), you'd basically either have to have one subroutine, but then all outputs will have to be in precious general-purpose registers at the moment of the call (thus you won't be able to export them early), or you'd have to make lots of per-location subroutine calls, and the same will apply to the FS, but with the result general-purpose registers.

But even if you let some kind of linkage resolve this situation and remap all smooth and all flat varyings to different 4-component vectors, that still won't cover all of the cases. Specifically, if you have `maxFragmentInputComponents` and its vertex counterpart set to 128, and you declare 125 smooth components and 3 flat ones, then even if you compact them you'll end up with 1 vector containing both smooth and flat variables, which is not possible on the hardware. To avoid that, you'd have to reduce `maxFragmentInputComponents` so you always have one free vec4 in hardware for this purpose, but this would make the Vulkan limits here inferior to the Direct3D 11 ones on existing modern hardware, and that would harm DXVK and VKD3D.

Was this relaxation in Vulkan compared to OpenGL intentional, and would it be possible to retroactively modify the specification to reintroduce that limitation from OpenGL? There apparently are existing popular drivers where mixing interpolation decorations within a location produces an incorrect result, and that's basically not fully fixable on many GPUs still actively used.
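
The earlier example with 125 smooth and 3 flat components comes down to a ceiling division, which this short Python sketch works through. It assumes the best case, where a linker packs smooth and flat components into separate 4-component vectors with no other constraints; the function name is made up for illustration.

```python
def vectors_needed(smooth_components, flat_components, width=4):
    """Vectors required if smooth and flat components may not share one."""
    def ceil_div(a, b):
        return -(-a // b)
    return ceil_div(smooth_components, width) + ceil_div(flat_components, width)


max_components = 128                 # maxFragmentInputComponents from the example
available = max_components // 4      # 32 four-component vectors
needed = vectors_needed(125, 3)      # 32 smooth vectors + 1 flat vector
print(needed, available)             # prints "33 32"
```

With 124 smooth and 4 flat components the same call returns 32, which fits; the overflow only appears when neither count is a multiple of 4.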