Open Firestar99 opened 1 month ago
Hello @Firestar99, thanks for the bug report.
I tried to reproduce the issue with new_amdvlk_freeze.rdc
and new_radv_with_amdvlk_installed_gpu_reset.rdc
on NAVI21, but observed neither GPU resets nor system freezes.
=> My current conclusion is that opening a RADV capture while an amdvlk device is available, even though it is unused, is enough to cause RenderDoc to freeze and a GPU reset to follow.
Can you please confirm the above conclusion by exporting VK_DRIVER_FILES to force the loader to use either radv or amdvlk? For example, if the variable is not set, both the radv and amdvlk devices are visible:
INFO | DRIVER: linux_read_sorted_physical_devices:
INFO | DRIVER: Original order:
INFO | DRIVER: [0] AMD Radeon RX 6800 (RADV NAVI21)
INFO | DRIVER: [1] llvmpipe (LLVM 15.0.7, 256 bits)
INFO | DRIVER: [2] AMD Radeon RX 6800
INFO | DRIVER: Sorted order:
INFO | DRIVER: [0] AMD Radeon RX 6800 (RADV NAVI21)
INFO | DRIVER: [1] AMD Radeon RX 6800
INFO | DRIVER: [2] llvmpipe (LLVM 15.0.7, 256 bits)
Once you specify the radv JSON with export VK_DRIVER_FILES=/usr/share/vulkan/icd.d/radeon_icd.x86_64.json, only the radv device is visible:
INFO | DRIVER: linux_read_sorted_physical_devices:
INFO | DRIVER: Original order:
INFO | DRIVER: [0] AMD Radeon RX 6800 (RADV NAVI21)
INFO | DRIVER: Sorted order:
INFO | DRIVER: [0] AMD Radeon RX 6800 (RADV NAVI21)
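To make the comparison above less error-prone, here is a minimal Python sketch that reads loader output in the format shown and lists which devices ended up in the "Sorted order" block. The helper names (`sorted_devices`, `classify`) are made up for illustration, and the classification is only a heuristic: it assumes RADV devices carry "RADV" in their name and that an AMD device without such a tag is amdvlk, which matches the logs here but is not guaranteed elsewhere.

```python
import re
from typing import List

# Matches device lines such as "INFO | DRIVER: [0] AMD Radeon RX 6800 (RADV NAVI21)".
DEVICE_LINE = re.compile(r"DRIVER:\s*\[(\d+)\]\s*(.+?)\s*$")


def sorted_devices(loader_log: str) -> List[str]:
    """Return the device names listed under the last 'Sorted order:' heading."""
    devices: List[str] = []
    in_sorted = False
    for line in loader_log.splitlines():
        if "Sorted order:" in line:
            # Start a fresh block so only the most recent listing is kept.
            in_sorted, devices = True, []
            continue
        match = DEVICE_LINE.search(line)
        if match is None:
            in_sorted = False  # any non-device line ends the block
        elif in_sorted:
            devices.append(match.group(2))
    return devices


def classify(name: str) -> str:
    """Heuristic guess at the driver behind a device name (assumption, see above)."""
    if "RADV" in name:
        return "radv"
    if "llvmpipe" in name:
        return "llvmpipe"
    return "other"  # includes amdvlk, whose device name carries no driver tag


LOG = """\
INFO | DRIVER: linux_read_sorted_physical_devices:
INFO | DRIVER: Original order:
INFO | DRIVER: [0] AMD Radeon RX 6800 (RADV NAVI21)
INFO | DRIVER: [1] llvmpipe (LLVM 15.0.7, 256 bits)
INFO | DRIVER: [2] AMD Radeon RX 6800
INFO | DRIVER: Sorted order:
INFO | DRIVER: [0] AMD Radeon RX 6800 (RADV NAVI21)
INFO | DRIVER: [1] AMD Radeon RX 6800
INFO | DRIVER: [2] llvmpipe (LLVM 15.0.7, 256 bits)
"""

if __name__ == "__main__":
    for device in sorted_devices(LOG):
        print(classify(device), "|", device)
```

If VK_DRIVER_FILES is set correctly, the same script run on the new loader output should report a single device; any remaining "other" entry would suggest amdvlk is still being enumerated.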
Opening a RenderDoc capture that has task and mesh shaders which use a payload to send data between them causes GPU resets and system freezes. I have found two very different ways of triggering it, both of them somehow related to amdvlk. These repro instructions assume a clean Ubuntu 24.04 system to start, with only RADV and no AMDVLK installed.
Standard AMDVLK Capture
vkCmdDrawMeshTasksEXT call

Opening a RADV Capture while AMDVLK is just present but unused
/etc/vulkan/implicit_layer.d/amd_icd64.json to remove the VK_LAYER_AMD_switchable_graphics_64 implicit layer, which forces you to always use the amdvlk driver
vulkanCapsViewer can see both drivers, RADV with "AMD Radeon Graphics (RADV REMBRANDT)" and amdvlk with "AMD Radeon Graphics" (I wish amdvlk had a more identifiable name)

=> My current conclusion is that opening a RADV capture while an amdvlk device is available, even though it is unused, is enough to cause RenderDoc to freeze and a GPU reset to follow.
RenderDoc log, in case you want to confirm that RenderDoc indeed uses RADV as the replay device and that AMDVLK is merely present.
Related issues
https://github.com/baldurk/renderdoc/issues/3309
https://gitlab.freedesktop.org/mesa/mesa/-/issues/11156