Closed Kangz closed 1 month ago
Discussed in the working group - we're concerned this would be potentially dangerous on various implementations, but we'd be interested to hear more about the usage scenarios you envisage for this change.
The scenario is https://github.com/gpuweb/gpuweb/issues/4632
WebGPU prevents data races (as best it can) and right now doesn't allow a buffer to be used as both an INDIRECT
and STORAGE
(read-write) usage in the same dispatchWorkgroupsIndirect
. Of course if the ranges of buffers used are disjoint, the usage could be data-race-free but WebGPU only does per-buffer tracking (and I assume most engines would just bind the whole buffer as storage anyway, so even range tracking wouldn't help).
Being able to modify a buffer while using it for indirect dispatch parameters is useful when chaining multiple indirect dispatches one after the other. @brendan-duncan ran into this in some workloads.
Yes, Unity runs into this situation in a few places, where a buffer is used for both the indirect buffer, and is updated by the compute kernel that is dispatched indirectly. Due to the validation checking WebGPU has to do to try and prevent race conditions on the buffer data, we currently have to make a duplicate copy of the buffer for the indirect usage. Given this scenario is inherently race free, having Vulkan (and subsequently WebGPU) designate it as guaranteed would help us improve performance of the Unity WebGPU driver. The other drivers in Unity don't have to do anything special for this case as Vulkan and other APIs don't guard against this behavior.
Do you have to actually overwrite the indirect arguments directly at the exact same byte offset, or is it just that the VkBuffer itself is used for both indirect and storage? In D3D12 before enhanced barriers, this would not be allowed for example, since a buffer cannot be in both a writable and read-only state at the same time. In Vulkan, there is no problem using a VkBuffer like this, but overwriting the exact inputs while executing from it is very dicey, and very likely a race condition on real hardware. There is no requirement that command processors "consume" the indirect buffer as a copy before any threads begin executing.
It is just that the VkBuffer is used for both indirect and storage at the same time. The application isn't necessarily trying to do a race on the indirect parameters but WebGPU want to prevent "bad" races and wouldn't allow the race if it is an issue on real hardware. Which you seemed to hint at?
Thank you for pointing that usage transitions in D3D12 wouldn't support that by default anyway, we had forgotten to look at that other API.
Can this issue be closed? Seems like it cannot be supported portably anyway.
If it's something we can't do anything about, then sure it can be closed. Thank you for looking into it.
We're interested in having a guarantee that the indirect parameters to commands such as
vkCmdDispatchWorkgroupsIndirect
happens-before any workgroup is launched. This would make it possible to modify these parameters in the compute shader being dispatched itself, without data races.However looking at the vulkan specification it seems that right now the only dependency is
VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT
's end happening-beforeVK_PIPELINE_STAGE_COMPUTE_SHADER_BIT
's end (we would like this to be the start instead).Am I missing something? Would it be possible to add this dependency to the spec?