Open Rua opened 1 year ago
The main issue is a driver is fully legal to do what ever it wants with a shader at vkCreateShaderModule
time, regardless of which entrypoints you may use later in the pipeline. There have been real driver crashes from not having some global limits validated at vkCreateShaderModule
time
My goal was to split the logic into "things at vkCreateShaderModule
" vs "things at pipeline time" (as we do everything at pipeline time currently)... this got tricky with VK_EXT_shader_object
and GPU-AV needing to instrument shaders
As for your maxMeshSharedMemorySize
/maxTaskSharedMemorySize
example, it says
is the maximum total storage size, in bytes, available for variables declared with the Workgroup storage class in shader modules in the mesh shader stage.
so this says nothing about the entrypoint
of the mesh shader, just that the VkShaderModule
for VK_SHADER_STAGE_MESH_BIT_EXT
has this limit, so from that, this is behaving as expected.
sometimes generates false positives.
I disagree this is a false positive, and not really willing to change this unless there are proper CTS test (and spec clarification) for making sure all drivers would correctly parse the entrypoint and do full static analysis to know how much workgroup size is being used
created https://gitlab.khronos.org/vulkan/vulkan/-/issues/3696 to discuss in WG
There have been real driver crashes from not having some global limits validated at vkCreateShaderModule time
Was this ever confirmed? vkCreateShaderModule on RADV is nothing more than a memcpy
(plus hashing) of the SPIR-V, and I'm pretty sure that was always the case. vkCreateShaderModule
crashing the GPU sounds highly unlikely to me.
Was this ever confirmed?
I know I saw it crash and then it was updated and doesn't crash now (I guess could grab an old commit and see). But I also was surprised it crashed in vkCreateShaderModule
... if this is something we can confirm is not a VVL issue, happy to just close that issue then
created https://gitlab.khronos.org/vulkan/vulkan/-/issues/3696 to discuss in WG
so we said we thinks this is indeed to be intended for per-Entrypoint and really this is just an ambiguity in the spec. I am tasked with making the spec change, confirming with everyone it is what we wanted and also get a CTS test for it.
I will try to find time soon(ish) to write a test and update this to be at an entrypoint level
The current code that validates against VUID-RuntimeSpirv-Workgroup-06530, VUID-RuntimeSpirv-maxMeshSharedMemorySize-08754 and VUID-RuntimeSpirv-maxTaskSharedMemorySize-08759 is too broad and sometimes generates false positives. The
CalculateWorkgroupSharedMemory
function uses a list of all global variable instructions to calculate the memory size, but it doesn't take the selected entry point and its interface into account.For example, a shader module could contain both a task and a mesh shader entry point. This function will sum the sizes of the variables used for both entry points, and then compare that grand total to the
maxMeshSharedMemorySize
ormaxTaskSharedMemorySize
limit. It should really only be counting the variables that are actually used by the selected entry point (for a SPIR-V 1.4+ shader, variables that appear in the entry point's interface).This is a problem that exists more generally with the RuntimeSpirv VUIDs, and I opened https://github.com/KhronosGroup/Vulkan-Docs/issues/2272 to bring attention to it. In the validation layer, the problem exists with at least these VUIDs but there's probably more.