Closed Firestar99 closed 1 month ago
I don't think a non-struct payload is intended to work. The payload is always a struct in HLSL/D3D12, where the feature originated, and there is a VU requiring only one variable per entry point, so a non-struct is a somewhat degenerate case. I'll see about getting clarification. The language you quoted is in the proposal document, not the VUs, and I expect it's intended just to say that payloads are not required.
It looks like your radv version, 23.2.1, is very old. Have you tried testing on an up-to-date driver?
Updated my radv version to 24.0.7 (device), but still the very same error: capture_struct_payload capture_uint_payload
kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring comp_1.1.0 timeout, signaled seq=30249, emitted seq=30250
kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process qrenderdoc pid 26623 thread ReplayManager pid 26660
Seems like the comp rings can differ: previously I've only seen comp_1.2.0, but now I'm seeing comp_1.1.0 and comp_1.0.1. I don't know if that's significant though.
Anyhow, I also completed the table with more testing results on Windows. Notably encountered a RenderDoc crash on AMD with payload structs when selecting the mesh shader draw cmd iirc, see capture and dump in table.
The struct capture works fine for me on radv 24.0.7:
I'm running quite a different GPU to you so this might be something GPU-specific, but I don't see any validation warnings on the mesh output fetch and since it works for me I think you would need to report this to mesa. As far as I'm aware RenderDoc's mesh shader support is working OK for other people on mesa so this may be a device-specific bug. The mesa folks will be better placed to diagnose the problem and report back to me if it's a RenderDoc bug after all.
I get a similarly successful result on amdvlk, so it may be something in common between the two.
The windows AMD bug looks like a crash I have previously reported to them, though it may be different so it may be worth reporting to them yourself.
Mesa bug report: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11156
When debugging with RADV_DEBUG=hang it interestingly states it's this pipeline, so most likely not even a RenderDoc bug [...]
For further investigation I got myself a clean system and managed to reproduce the bug there as well. However, some special conditions seem to be required for it to trigger. Could you please try to reproduce it again with these new repro instructions?
/etc/vulkan/implicit_layer.d/amd_icd64.json to remove the VK_LAYER_AMD_switchable_graphics_64 implicit layer, which forces you to always use the amdvlk driver
vulkanCapsViewer can see both drivers, RADV with AMD Radeon Graphics (RADV REMBRANDT) and amdvlk with AMD Radeon Graphics (what a stupid naming)

My current conclusion is that an amdvlk device being available, even though it is unused, is enough to cause RenderDoc to freeze. I wanted to run that issue by you, in case it's something to do with RenderDoc's device selection, before going back to asking the RADV team.
Here's a log of "Open Capture with Options" with the RADV device explicitly selected and API validation turned on: RenderDoc_2024.05.16_16.07.32.log
There's no way I can see for just having a driver installed to cause a crash because of a RenderDoc bug. RenderDoc by default selects the closest matching physical device by hardware and driver. Overriding it is not recommended, but in either event the presence of other drivers won't cause a problem there either. Unless you can specifically find evidence of a RenderDoc bug, I don't think it seems possible.
I see the mesa issue has been closed so I will close this one as well.
Description
Using the payload of vulkan mesh shaders has been quite troublesome, both with RenderDoc and some drivers. There seems to be quite a disagreement on how the payload should be declared in the spirv. Specifically, on whether it must be a struct (similarly to a buffer block) or can also just be a plain uint.

In the spirv spec I can't find anything specified about it.
The vulkan spec states:
Note the can, it is never stated as a firm requirement. Also, this particular code does not actually compile with glslc due to a missing struct.

glslc happily compiles the payload as a struct or a plain uint, and generates spirv that matches the source code:
payload as struct:
payload as uint:
=> Thus I assume it's fine to declare payloads both as a struct or as a plain uint
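For reference, the two declarations being compared can be sketched as a single task shader, assuming the GL_EXT_mesh_shader extension. This is only an illustration and not the shader from the captures: the names Payload and payload and the PAYLOAD_AS_STRUCT macro are hypothetical, and the value written to the payload is arbitrary.

```glsl
#version 460
#extension GL_EXT_mesh_shader : require

layout(local_size_x = 1) in;

#ifdef PAYLOAD_AS_STRUCT
// Variant 1: payload wrapped in a struct, as in `struct { uint id; }`.
struct Payload {
    uint id;
};
taskPayloadSharedEXT Payload payload;
#else
// Variant 2: payload declared as a plain uint.
taskPayloadSharedEXT uint payload;
#endif

void main() {
    // Write an arbitrary value so the payload is actually used.
#ifdef PAYLOAD_AS_STRUCT
    payload.id = gl_WorkGroupID.x;
#else
    payload = gl_WorkGroupID.x;
#endif
    // Launch a single mesh shader workgroup that receives the payload.
    EmitMeshTasksEXT(1, 1, 1);
}
```

Assuming a glslc build with mesh shader support, something like `glslc -fshader-stage=task --target-env=vulkan1.3 -DPAYLOAD_AS_STRUCT payload.glsl -o payload_struct.spv` should build the struct variant, and dropping the `-D` flag should build the uint variant.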
However, RenderDoc and a few drivers do not seem to comply with this statement:
Explanations:
struct { uint id; }, see above for code
uint, see above for code

Could you also help me with where I should report the Windows AMD driver bug? Thanks :D
Steps to reproduce
Environment