HansKristian-Work / vkd3d-proton

Fork of VKD3D. Development branches for Proton's Direct3D 12 implementation.
GNU Lesser General Public License v2.1

Mark all collection state objects created with DEPENDENCIES_ON_EXTERNAL_DEFINITIONS as deferred #1840

Closed: esullivan-nvidia closed this pull request 7 months ago

esullivan-nvidia commented 7 months ago

This fixes a GPU hang in Warhammer 40,000: Darktide on NVIDIA GPUs. The title currently creates state object collections with the D3D12_STATE_OBJECT_FLAG_ALLOW_LOCAL_DEPENDENCIES_ON_EXTERNAL_DEFINITIONS flag set, and these collections can be compiled successfully to a Vulkan RT pipeline. Later, the game creates another PSO with additional state that needs to be included in the resulting Vulkan pipeline.

This doesn't work with the current vkd3d-proton implementation of DEPENDENCIES_ON_EXTERNAL_DEFINITIONS, which expects every pipeline marked with this flag to fail translation into a Vulkan pipeline. This change updates d3d12_state_object_init to always mark state object collections created with DEPENDENCIES_ON_EXTERNAL_DEFINITIONS as deferred, even when compilation produced a valid pipeline object.
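To make the intended change easier to picture, here is a minimal standalone sketch of the decision described above. It is not the actual vkd3d-proton code: the `sketch_` struct, its fields, and both helper functions are hypothetical, and the "old" predicate is only one reading of the behaviour described in this PR. The enum and flag values mirror the public D3D12 headers and are redefined so the snippet builds on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirror the public D3D12 enums; redefined so the sketch is standalone. */
enum sketch_state_object_type
{
    SKETCH_TYPE_COLLECTION = 0,
    SKETCH_TYPE_RAYTRACING_PIPELINE = 3
};
#define SKETCH_FLAG_ALLOW_LOCAL_DEPENDENCIES_ON_EXTERNAL_DEFINITIONS 0x1u

/* Hypothetical stand-in for the real state object; field names are assumptions. */
struct sketch_state_object
{
    enum sketch_state_object_type type;
    uint32_t flags;
    bool pipeline_compiled; /* true if translation to a Vulkan pipeline succeeded */
};

/* Assumed old behaviour: a flagged collection is deferred only if compilation failed. */
static bool sketch_is_deferred_old(const struct sketch_state_object *object)
{
    assert(object != NULL);
    return object->type == SKETCH_TYPE_COLLECTION
            && (object->flags & SKETCH_FLAG_ALLOW_LOCAL_DEPENDENCIES_ON_EXTERNAL_DEFINITIONS)
            && !object->pipeline_compiled;
}

/* Behaviour proposed in this PR: a flagged collection is always deferred,
 * whether or not a valid pipeline object was produced. */
static bool sketch_is_deferred_new(const struct sketch_state_object *object)
{
    assert(object != NULL);
    return object->type == SKETCH_TYPE_COLLECTION
            && (object->flags & SKETCH_FLAG_ALLOW_LOCAL_DEPENDENCIES_ON_EXTERNAL_DEFINITIONS);
}
```

The two predicates differ only for a flagged collection that happened to compile, which is the Darktide case described above.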

As far as I can tell, the game only uses this path if it detects that nvapi is available, so keep that in mind if you try to reproduce the crash.

Here is a link to the original bug report for Darktide on the NVIDIA forums: https://forums.developer.nvidia.com/t/multiple-cuda-rtx-vulkan-application-crashing-with-xid-13-109-errors/235459/291

HansKristian-Work commented 7 months ago

I don't quite understand what this is actually solving and why, so I'll have to investigate locally. Thanks for root-causing.

HansKristian-Work commented 7 months ago

I'm still getting a DEVICE_LOST though. There are more errors in the log when setting:

diff --git a/libs/vkd3d/raytracing_pipeline.c b/libs/vkd3d/raytracing_pipeline.c
index 44cf9fb1..c670fbb0 100644
--- a/libs/vkd3d/raytracing_pipeline.c
+++ b/libs/vkd3d/raytracing_pipeline.c
@@ -20,7 +20,7 @@
 #include "vkd3d_private.h"
 #include "vkd3d_string.h"

-#define RT_TRACE TRACE
+#define RT_TRACE INFO

02f0:err:d3d12_state_object_properties_GetShaderIdentifier: Could not find entry point.
02f0:info:d3d12_state_object_properties_GetShaderIdentifier: iface 0000000068b3bf28, export_name "hit_group_cc0946d8".
02f0:err:d3d12_state_object_properties_GetShaderIdentifier: Could not find entry point.
02f0:info:d3d12_state_object_properties_GetShaderIdentifier: iface 0000000068b3bf28, export_name "hit_group_a653d41d".
02f0:err:d3d12_state_object_properties_GetShaderIdentifier: Could not find entry point.
02f0:info:d3d12_state_object_properties_GetShaderIdentifier: iface 0000000068b3bf28, export_name "hit_group_54a24236".
02f0:err:d3d12_state_object_properties_GetShaderIdentifier: Could not find entry point.
02f0:info:d3d12_state_object_properties_GetShaderIdentifier: iface 0000000068b3bf28, export_name "hit_group_af3f5b8c".
02f0:err:d3d12_state_object_properties_GetShaderIdentifier: Could not find entry point.
02f0:info:d3d12_state_object_properties_GetShaderIdentifier: iface 0000000068b3bf28, export_name "raygen_main".
02f0:info:d3d12_state_object_properties_GetShaderIdentifier:   chaining into collection 0.
02f0:info:d3d12_state_object_properties_GetShaderIdentifier:   identifier { 037c000000000016, 000000000000ffff, 0000000000000000, 0000000000000000 }
02f0:info:d3d12_state_object_properties_GetShaderIdentifier: iface 0000000068b3bf28, export_name "miss_main".
02f0:info:d3d12_state_object_properties_GetShaderIdentifier:   chaining into collection 0.
02f0:info:d3d12_state_object_properties_GetShaderIdentifier:   identifier { 03bc000000000015, 000000000000ffff, 0000000000000000, 0000000000000000 }
02f0:info:d3d12_state_object_properties_GetShaderIdentifier: iface 0000000068b3bf28, export_name "shadow_miss_main".
02f0:info:d3d12_state_object_properties_GetShaderIdentifier:   chaining into collection 0.
02f0:info:d3d12_state_object_properties_GetShaderIdentifier:   identifier { 03bc000000000020, 000000000000ffff, 0000000000000000, 0000000000000000 }
0380:info:vkd3d_pipeline_library_disk_thread_main: Flushing disk cache (wakeup counter since last flush = 80). It seems like application has stopped creating new PSOs for the time being.
03a4:err:d3d12_command_queue_bind_sparse: Failed to perform sparse binding, vr -4.
03a4:err:d3d12_command_queue_bind_sparse: Failed to submit wait, vr -4.
03a4:err:d3d12_command_queue_bind_sparse: Failed to submit signal, vr -4.
03a4:err:d3d12_command_queue_bind_sparse: Failed to perform sparse binding, vr -4.
03a4:err:d3d12_command_queue_bind_sparse: Failed to submit wait, vr -4.
03a4:err:d3d12_command_queue_signal: Failed to submit signal operation, vr -4.
03a4:err:d3d12_command_queue_execute: Failed to submit queue(s), vr -4.
03a4:err:d3d12_command_queue_execute: Failed to submit queue(s), vr -4.
03a4:err:d3d12_command_queue_execute: Failed to submit queue(s), vr -4.
03a4:err:dxgi_vk_swap_chain_present_signal_blit_semaphore: Failed to submit present discard, vr = -4.
03a4:err:d3d12_command_queue_signal: Failed to submit signal operation, vr -4.

On master it's:

0294:info:d3d12_state_object_properties_GetShaderIdentifier:   identifier { 03dc000000000096, 000000000000ffff, 0000000000000000, 0000000000000000 }
0294:info:d3d12_state_object_properties_GetShaderIdentifier: iface 0000000101e685d8, export_name "hit_group_4ca1c3cc".
0294:info:d3d12_state_object_properties_GetShaderIdentifier:   chaining into collection 201.
0294:info:d3d12_state_object_properties_GetShaderIdentifier:   identifier { 03dc00000000009a, 000000000000ffff, 0000000000000000, 0000000000000000 }
0294:info:d3d12_state_object_properties_GetShaderIdentifier: iface 0000000101e685d8, export_name "raygen_main".
0294:info:d3d12_state_object_properties_GetShaderIdentifier:   chaining into collection 0.
0294:info:d3d12_state_object_properties_GetShaderIdentifier:   identifier { 037c000000000016, 000000000000ffff, 0000000000000000, 0000000000000000 }
0294:info:d3d12_state_object_properties_GetShaderIdentifier: iface 0000000101e685d8, export_name "miss_main".
0294:info:d3d12_state_object_properties_GetShaderIdentifier:   chaining into collection 0.
0294:info:d3d12_state_object_properties_GetShaderIdentifier:   identifier { 03bc000000000015, 000000000000ffff, 0000000000000000, 0000000000000000 }
0294:info:d3d12_state_object_properties_GetShaderIdentifier: iface 0000000101e685d8, export_name "shadow_miss_main".
0294:info:d3d12_state_object_properties_GetShaderIdentifier:   chaining into collection 0.
0294:info:d3d12_state_object_properties_GetShaderIdentifier:   identifier { 03bc000000000020, 000000000000ffff, 0000000000000000, 0000000000000000 }
0330:info:vkd3d_pipeline_library_disk_thread_main: Pipeline cache marked dirty. Flush is scheduled.
0330:info:vkd3d_pipeline_library_disk_thread_main: Flushing disk cache (wakeup counter since last flush = 22). It seems like application has stopped creating new PSOs for the time being.
0354:err:d3d12_command_queue_bind_sparse: Failed to perform sparse binding, vr -4.
0354:err:d3d12_command_queue_bind_sparse: Failed to submit wait, vr -4.
0354:err:d3d12_command_queue_signal: Failed to submit signal operation, vr -4.
0354:err:d3d12_command_queue_execute: Failed to submit queue(s), vr -4.
0354:err:d3d12_command_queue_execute: Failed to submit queue(s), vr -4.
0354:err:d3d12_command_queue_execute: Failed to submit queue(s), vr -4.
0354:err:dxgi_vk_swap_chain_present_signal_blit_semaphore: Failed to submit present discard, vr = -4.
0354:err:d3d12_command_queue_signal: Failed to submit signal operation, vr -4.

HansKristian-Work commented 7 months ago

With PROTON_ENABLE_NVAPI=0, I can still reproduce a device lost, but it happens in a different place. With PROTON_ENABLE_NVAPI=0, starting a new game in the tutorial section, I reliably get a device lost within a few seconds of taking control of the character. With PROTON_ENABLE_NVAPI=1, I seem to get a hang before reaching the main menu.

HansKristian-Work commented 7 months ago

This smells like a game bug to me.

Without NVAPI, it seems like the global root signatures used are compatible:

023c:info:d3d12_state_object_get_group_handles: Queried export 0, variant 1, group handle 0 -> { 03bc000000000015, 000000000000ffff, 0000000000000000, 0000000000000000 }
023c:info:d3d12_state_object_get_group_handles: Queried export 1, variant 1, group handle 1 -> { 037c000000000016, 000000000000ffff, 0000000000000000, 0000000000000000 }
023c:info:d3d12_state_object_get_group_handles: Queried export 2, variant 1, group handle 2 -> { 037c000000000018, 000000000000ffff, 0000000000000000, 0000000000000000 }
023c:info:d3d12_state_object_get_group_handles: Queried export 3, variant 1, group handle 3 -> { 037c00000000001a, 000000000000ffff, 0000000000000000, 0000000000000000 }
023c:info:d3d12_state_object_get_group_handles: Queried export 4, variant 1, group handle 4 -> { 03bc000000000020, 000000000000ffff, 0000000000000000, 0000000000000000 }
023c:info:d3d12_state_object_get_group_handles: Queried export 5, variant 1, group handle 5 -> { 03dc000000430043, 000000000000ffff, 0000000000000000, 0000000000000000 }
023c:info:d3d12_state_object_get_group_handles: Queried export 6, variant 1, group handle 6 -> { 03dc000000450045, 000000000000ffff, 0000000000000000, 0000000000000000 }
023c:info:d3d12_state_object_get_group_handles: Queried export 7, variant 1, group handle 7 -> { 03dc000000470047, 000000000000ffff, 0000000000000000, 0000000000000000 }
023c:info:d3d12_state_object_get_group_handles: Queried export 8, variant 1, group handle 8 -> { 03dc000000000049, 000000000000ffff, 0000000000000000, 0000000000000000 }
023c:info:d3d12_state_object_get_group_handles: Queried export 9, variant 1, group handle 9 -> { 03dc0000004a004a, 000000000000ffff, 0000000000000000, 0000000000000000 }
023c:info:d3d12_state_object_get_group_handles: Queried export 10, variant 1, group handle 10 -> { 03dc0000004c004c, 000000000000ffff, 0000000000000000, 0000000000000000 }

However, with NVAPI, it seems like the global root signatures for hit groups and raygen shaders are not compatible. This causes two separate VkPipelines to be linked:

0270:info:d3d12_state_object_get_group_handles: Queried export 0, variant 1, group handle 0 -> { 03bc000000000015, 000000000000ffff, 0000000000000000, 0000000000000000 }
0270:info:d3d12_state_object_get_group_handles: Queried export 1, variant 1, group handle 1 -> { 037c000000000016, 000000000000ffff, 0000000000000000, 0000000000000000 }
0270:info:d3d12_state_object_get_group_handles: Queried export 2, variant 1, group handle 2 -> { 037c000000000018, 000000000000ffff, 0000000000000000, 0000000000000000 }
0270:info:d3d12_state_object_get_group_handles: Queried export 3, variant 1, group handle 3 -> { 037c00000000001a, 000000000000ffff, 0000000000000000, 0000000000000000 }
0270:info:d3d12_state_object_get_group_handles: Queried export 4, variant 1, group handle 4 -> { 03bc000000000020, 000000000000ffff, 0000000000000000, 0000000000000000 }
0270:info:d3d12_state_object_get_group_handles: Queried export 5, variant 2, group handle 0 -> { 03dc000000430043, 000000000000ffff, 0000000000000000, 0000000000000000 }
0270:info:d3d12_state_object_get_group_handles: Queried export 6, variant 2, group handle 1 -> { 03dc000000450045, 000000000000ffff, 0000000000000000, 0000000000000000 }
0270:info:d3d12_state_object_get_group_handles: Queried export 7, variant 2, group handle 2 -> { 03dc000000470047, 000000000000ffff, 0000000000000000, 0000000000000000 }
0270:info:d3d12_state_object_get_group_handles: Queried export 8, variant 2, group handle 3 -> { 03dc000000000049, 000000000000ffff, 0000000000000000, 0000000000000000 }
0270:info:d3d12_state_object_get_group_handles: Queried export 9, variant 2, group handle 4 -> { 03dc0000004a004a, 000000000000ffff, 0000000000000000, 0000000000000000 }
0270:info:d3d12_state_object_get_group_handles: Queried export 10, variant 2, group handle 5 -> { 03dc0000004c004c, 000000000000ffff, 0000000000000000, 0000000000000000 }
0270:info:d3d12_state_object_get_group_handles: Queried export 11, variant 2, group handle 6 -> { 03dc0000004e004e, 000000000000ffff, 0000000000000000, 0000000000000000 }
0270:info:d3d12_state_object_get_group_handles: Queried export 12, variant 2, group handle 7 -> { 03dc000000500050, 000000000000ffff, 0000000000000000, 0000000000000000 }
0270:info:d3d12_state_object_get_group_handles: Queried export 13, variant 2, group handle 8 -> { 03dc000000520052, 000000000000ffff, 0000000000000000, 0000000000000000 }

This smells like an app bug to me. D3D12 does not allow tracing rays across global root signature boundaries, but it might be a case of "just werks" on native drivers. I'll have to test in more detail what happens on a native driver.
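The split visible in the NVAPI log, where the group handle index restarts at 0 once the variant changes, can be pictured with a small sketch. This is only an illustration under stated assumptions, not the vkd3d-proton implementation: the `sketch_` names are hypothetical, global root signature compatibility is reduced to comparing an opaque integer key, and variants are numbered from 0 here rather than following the numbering in the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_VARIANTS 8

/* Hypothetical export record; the key stands in for "compatible global root signature". */
struct sketch_export
{
    uint64_t global_root_sig_key; /* input: equal keys are treated as compatible */
    unsigned int variant;         /* output: which pipeline variant owns this export */
    unsigned int group_handle;    /* output: group index inside that variant */
};

/* Assigns each export to a variant and returns the number of variants.
 * More than one variant means more than one pipeline would have to be linked. */
static unsigned int sketch_assign_variants(struct sketch_export *exports, size_t count)
{
    uint64_t keys[SKETCH_MAX_VARIANTS];
    unsigned int groups_in_variant[SKETCH_MAX_VARIANTS] = { 0 };
    unsigned int variant_count = 0;
    unsigned int v;
    size_t i;

    assert(exports != NULL || count == 0);

    for (i = 0; i < count; i++)
    {
        /* Look for an existing variant with a matching key. */
        for (v = 0; v < variant_count; v++)
        {
            if (keys[v] == exports[i].global_root_sig_key)
                break;
        }

        /* No match: open a new variant for this key. */
        if (v == variant_count)
        {
            assert(variant_count < SKETCH_MAX_VARIANTS);
            keys[variant_count++] = exports[i].global_root_sig_key;
        }

        exports[i].variant = v;
        /* Group handles count up independently inside each variant. */
        exports[i].group_handle = groups_in_variant[v]++;
    }

    return variant_count;
}
```

When every export shares one key, the function returns 1 and the group handles run contiguously, as in the log without NVAPI. With two distinct keys it returns 2 and the second variant's handles start again from 0.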

esullivan-nvidia commented 7 months ago

Thanks for the quick investigation and feedback. Sorry that my initial attempts at a fix ended up being a red herring. The root signature discrepancy between the hit group and ray gen shaders certainly does seem like an app bug. I will talk to our dev tech team about reaching out to Fatshark. If you would like, I can go ahead and close this PR.

HansKristian-Work commented 7 months ago

Fixed in https://github.com/HansKristian-Work/vkd3d-proton/pull/1843. D3D12 drivers are somewhat robust against these bugs, so we have to be as well.