baldurk / renderdoc

RenderDoc is a stand-alone graphics debugging tool.
https://renderdoc.org
MIT License

VK_ERROR_DEVICE_LOST for vulkanExamples shaderobjects #3371

Closed: eusapia closed this issue 1 week ago

eusapia commented 3 weeks ago

Description

Captures for programs using VK_EXT_shader_object are flaky.

I've seen a fair amount of instability with RenderDoc and the shader objects extension (including a number of recovered driver crashes, the traced program hanging, lost device errors, etc.). I know this is a very new feature and I'm glad to see it being developed.

Steps to reproduce

I've been trying to root-cause this in my own code, but it turns out it also happens in the "shaderobjects" example from https://github.com/SaschaWillems/Vulkan.git

  1. Run shaderobjects.exe under RenderDoc.
  2. Capture a frame.
  3. Select the second vkQueueSubmit() event.

The bug reporter dialog pops up. The log ends with:

RDOC 014036: [17:05:22]          vk_core.cpp(4748) - Error   - Logging device lost fatal error for VK_ERROR_DEVICE_LOST
RDOC 014036: [17:05:22] replay_controller.cpp(1969) - Log     - Fatal error detected: Encountered a GPU device lost error (Logging device lost fatal error for VK_ERROR_DEVICE_LOST) at event 119
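
For anyone triaging a longer log, lines like the two above can be filtered with a short script. This is only a minimal sketch: the line layout (`RDOC <pid>: [time] file(line) - Level - message`) is inferred from these two lines alone, and the helper names `parse_log_line` and `find_device_lost` are hypothetical, not part of RenderDoc.

```python
import re

# Assumed line shape, inferred only from the two log lines quoted above:
#   RDOC <pid>: [HH:MM:SS] <file>(<line>) - <level> - <message>
LOG_LINE = re.compile(
    r"^RDOC\s+(?P<pid>\d+):\s+\[(?P<time>\d{2}:\d{2}:\d{2})\]\s+"
    r"(?P<file>\S+)\((?P<line>\d+)\)\s+-\s+(?P<level>\w+)\s+-\s+(?P<message>.*)$"
)


def parse_log_line(text):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(text.strip())
    return match.groupdict() if match else None


def find_device_lost(lines):
    """Collect parsed entries whose message mentions VK_ERROR_DEVICE_LOST."""
    hits = []
    for raw in lines:
        entry = parse_log_line(raw)
        if entry and "VK_ERROR_DEVICE_LOST" in entry["message"]:
            hits.append(entry)
    return hits
```

Calling `find_device_lost(open("error.log", encoding="utf-8"))` (the file name is a placeholder) would return one dict per matching line, each with `pid`, `time`, `file`, `line`, `level` and `message` keys as strings.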

To reduce confounding factors, I made sure to delete the .bin files the example program creates before running the capture (these are afaik cached shader code).

Environment

RenderDoc version: RenderDoc_2024_07_02_0406d376_64
Operating System: Microsoft Windows [Version 10.0.19045.4529]
Graphics API: Vulkan 1.3.275
Graphics Adapter: GeForce GTX 1080
Driver Version: 566.12

eusapia commented 3 weeks ago

Related to #3355

Zorro666 commented 2 weeks ago

Hi

Are you able to share the RenderDoc capture file that is not working?

I have tried Sascha Willems' Vulkan example shaderobjects.exe and the Vulkan Samples shader_object sample, and both capture and replay without any problems.

Perhaps the difference is in the driver and hardware. I am capturing and replaying on "NVIDIA GeForce RTX 4080 Laptop GPU (ver 556.12 patch 0x0) - 10de:27e0"

Here is a screenshot showing the Sascha Willems Vulkan example:

image

eusapia commented 2 weeks ago

Just ran this capture from absolute scratch (still shaderobjects from Sascha Willems):

error.log shaderobjects_2024.07.04_17.43.18_frame7149.zip

RenderDoc consistently shows the lost device error dialog when I select that second vkQueueSubmit event.

eusapia commented 2 weeks ago

I don't get the error when I capture on an RTX 4090 (on Win11; otherwise I just updated the Vulkan SDK and the NVIDIA drivers to 556.12.0). The capture from the GTX 1080 does not open on the 4090 ("Tried to allocate memory from index 7, but on replay we only have 5 memory types.")

eusapia commented 2 weeks ago

For completeness I tried on a GTX 1050 (mobile), but it looks like that doesn't support the extension.

eusapia commented 2 weeks ago

shader_object from Vulkan Samples also pops the device lost error (RenderDoc_2024_07_08_9ebd796c_64; select "Colour Pass #2", EID 216-230). This is on the long-suffering GTX 1080.

RenderDoc_2024.07.09_18.07.19.log vulkan_samples_2024.07.09_22.07.23_frame0.zip

Zorro666 commented 2 weeks ago

This looks to be a problem specific to NVIDIA 10-series cards.

Zorro666 commented 1 week ago

Thanks to the original contributors and maintainers of the VK_EXT_shader_object extension, commit 8986329d1e6769a3379bd4dbe2a17f034a35040e implements a workaround to support the feature on NVIDIA 10-series cards.

Zorro666 commented 1 week ago

The fix will be available in the next nightly build.

eusapia commented 1 week ago

That fixed it, thanks!