KhronosGroup / Vulkan-Docs

The Vulkan API Specification and related tools
Other
2.7k stars 452 forks source link

Consider adding happens-before for indirect argument consumption. #2368

Closed Kangz closed 1 month ago

Kangz commented 1 month ago

We're interested in having a guarantee that the indirect parameters to commands such as vkCmdDispatchWorkgroupsIndirect happens-before any workgroup is launched. This would make it possible to modify these parameters in the compute shader being dispatched itself, without data races.

However looking at the vulkan specification it seems that right now the only dependency is VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT's end happening-before VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT's end (we would like this to be the start instead).

Am I missing something? Would it be possible to add this dependency to the spec?

awalters-vk-img commented 1 month ago

Discussed in the working group - we're concerned this would be potentially dangerous on various implementations, but we'd be interested to hear more about the usage scenarios you envisage for this change.

Kangz commented 1 month ago

The scenario is https://github.com/gpuweb/gpuweb/issues/4632

WebGPU prevents data races (as best it can) and right now doesn't allow a buffer to be used as both an INDIRECT and STORAGE (read-write) usage in the same dispatchWorkgroupsIndirect. Of course if the ranges of buffers used are disjoint, the usage could be data-race-free but WebGPU only does per-buffer tracking (and I assume most engines would just bind the whole buffer as storage anyway, so even range tracking wouldn't help).

Being able to modify a buffer while using it for indirect dispatch parameters is useful when chaining multiple indirect dispatches one after the other. @brendan-duncan ran into this in some workloads.

brendan-duncan commented 1 month ago

Yes, Unity runs into this situation in a few places, where a buffer is used for both the indirect buffer, and is updated by the compute kernel that is dispatched indirectly. Due to the validation checking WebGPU has to do to try and prevent race conditions on the buffer data, we currently have to make a duplicate copy of the buffer for the indirect usage. Given this scenario is inherently race free, having Vulkan (and subsequently WebGPU) designate it as guaranteed would help us improve performance of the Unity WebGPU driver. The other drivers in Unity don't have to do anything special for this case as Vulkan and other APIs don't guard against this behavior.

HansKristian-Work commented 1 month ago

Do you have to actually overwrite the indirect arguments directly at the exact same byte offset, or is it just that the VkBuffer itself is used for both indirect and storage? In D3D12 before enhanced barriers, this would not be allowed for example, since a buffer cannot be in both a writable and read-only state at the same time. In Vulkan, there is no problem using a VkBuffer like this, but overwriting the exact inputs while executing from it is very dicey, and very likely a race condition on real hardware. There is no requirement that command processors "consume" the indirect buffer as a copy before any threads begin executing.

Kangz commented 1 month ago

It is just that the VkBuffer is used for both indirect and storage at the same time. The application isn't necessarily trying to do a race on the indirect parameters but WebGPU want to prevent "bad" races and wouldn't allow the race if it is an issue on real hardware. Which you seemed to hint at?

Thank you for pointing that usage transitions in D3D12 wouldn't support that by default anyway, we had forgotten to look at that other API.

HansKristian-Work commented 1 month ago

Can this issue be closed? Seems like it cannot be supported portably anyway.

brendan-duncan commented 1 month ago

If it's something we can't do anything about, then sure it can be closed. Thank you for looking into it.