Open simonask opened 1 year ago
Thank you for filing! We really need to cover this with more tests.
> All of this makes sense and is not a bug in wgpu.

Disagree: validation should catch this and raise an error on the CPU side instead of just deadlocking. While deadlocking is allowed, it is stupidly hard to debug, and we should improve this experience as much as possible.
I was trying to benchmark some compute shaders (Vulkan) and ran into a deadlock in Bevy. The deadlock turned out to be a call to `vkWaitForFences` that never returned while holding the exclusive device lock. The Vulkan validation layer indicated some buffer-mapping errors that weren't obviously related (but are).
The root cause of the issue turned out to be my combination of `write_timestamp` and `resolve_query_set`. You see, the `query_range` parameter to `resolve_query_set` really matters. Providing a range that covers parts of the query set that weren't actually written to by `write_timestamp` (and, I assume, other types of queries as well) implies that a memory fence is created for each location, which then never resolves. In other words, the hardware is waiting for an event that never happens.

All of this makes sense and is not a bug in wgpu. I would even say that it is expected once you see how it works. I would just have saved a lot of time if there had been a comment in the documentation for `CommandEncoder::resolve_query_set()` mentioning that you really need to keep track of which indices in the query set are actually being written to, and provide exactly the corresponding range(s) to `resolve_query_set`, and nothing more.

Pseudo-repro:
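(The original repro snippet didn't survive here; the following is a minimal Rust sketch of the misuse, assuming a wgpu `device`, a timestamp `query_set` created with a capacity of 8, and a `resolve_buffer` sized for the resolved data — all hypothetical names, not a compiling example.)

```rust
// Sketch only: `device`, `query_set`, and `resolve_buffer` are assumed
// to already exist (hypothetical names).
let mut encoder =
    device.create_command_encoder(&wgpu::CommandEncoderDescriptor::default());

// Only two timestamps are ever written: indices 0 and 1.
encoder.write_timestamp(&query_set, 0);
// ... dispatch the compute work being measured ...
encoder.write_timestamp(&query_set, 1);

// BUG: the query set was created with a capacity of 8, and this resolves
// all 8 slots. Indices 2..8 were never written, so the GPU waits on them
// forever and a later vkWaitForFences never returns.
encoder.resolve_query_set(&query_set, 0..8, &resolve_buffer, 0);

// FIX: resolve only the range that was actually written:
// encoder.resolve_query_set(&query_set, 0..2, &resolve_buffer, 0);
```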
In my case, the hang was observed when Bevy tried to acquire a surface texture for rendering. Something in that code path calls `vkWaitForFences()`, which makes sense, but it took me a while to narrow the hang down to the call to `resolve_query_set`.