gfx-rs / wgpu

A cross-platform, safe, pure-Rust graphics API.
https://wgpu.rs
Apache License 2.0
12.16k stars 893 forks source link

wgpu-core might not submit commands to hal queue with strictly increasing submission indices #5978

Open jimblandy opened 1 month ago

jimblandy commented 1 month ago

For correctness, submissions to wgpu_hal::Queue::submit must have strictly increasing fence values. As explained in the docs for wgpu_hal::Api::Fence:

Waiting on a fence returns as soon as the fence reaches or passes the requested value. This implies that, in order to reliably determine when an operation has completed, operations must finish in order of increasing fence values: if a higher-valued operation were to finish before a lower-valued operation, then waiting for the fence to reach the lower value could return before the lower-valued operation has actually finished.

However, wgpu_core::Global::queue_submit doesn't correctly enforce this rule. It allocates a submission index early in the function, then does a bunch of validation, and then finally submits the validated command buffers to the hal queue. If two threads were to call queue_submit simultaneously, the second thread to allocate a submission index (and thus, the one with the higher submission index) might nonetheless be the first thread to submit its commands to hal, meaning that the other thread would submit its commands later, under the lower submission index. Any attempt to wait for the second thread's commands' completion would be unreliable, since that fence value would be considered "signaled" as soon as the first thread's commands were done.

To fix this, it would be sufficient to make allocating a submission index and submitting the commands both parts of the same critical section. This is stricter than necessary - all we need is for submissions to proceed in the same order as index allocations - but it's not clear that the additional flexibility is valuable, and it's not clear to me what synchronization primitives would enforce it.

It may be possible to remove wgpu_core::resource::TrackingData::submission_index. This would save time spent marking resources at submission time (deletion could search the active submission list for submissions using the resource instead), and let us move submission index allocation immediately before hal queue submission, narrowing the critical region.

teoxoy commented 1 month ago

It may be possible to remove wgpu_core::resource::TrackingData::submission_index. This would save time spent marking resources at submission time (deletion could search the active submission list for submissions using the resource instead), and let us move submission index allocation immediately before hal queue submission, narrowing the critical region.

https://github.com/gfx-rs/wgpu/pull/5976 takes care of this.

teoxoy commented 1 month ago

https://github.com/gfx-rs/wgpu/pull/5976 was just an improvement to get us closer to this but didn't solve it.