KhronosGroup / Vulkan-Docs

The Vulkan API Specification and related tools

Spec seems to indicate it's impossible to synchronize an upload to the GPU #2386

Closed by litherum 3 months ago

litherum commented 3 months ago

For the purpose of uploading data to a resource, the CPU writes data before the device reads it. Given the definitions in the Execution and Memory Dependencies section, this kind of operation would presumably require a memory dependency. However, it appears that none of the mechanisms in Chapter 7. Synchronization and Cache Control are suitable to synchronize this use case.

Either the spec should be clarified describing how to synchronize a memory upload correctly, or a note should be added indicating that memory uploads do not need to be synchronized.

Fences

The functions that interact with fences (without extensions) are vkCreateFence(), vkDestroyFence(), vkGetFenceStatus(), vkResetFences(), and vkWaitForFences() (plus vkQueueSubmit() and vkQueueSubmit2(), which can signal a fence).

None of these allows the device to wait on a fence from within a command buffer.

Also, judging from the types accepted by vkQueueSubmit() and vkQueueSubmit2(), it doesn't appear possible for the device to wait on a fence from outside a command buffer either.

Semaphores

The spec text for a semaphore signal operation says (emphasis mine):

The first access scope includes all memory access performed by the device.

When uploading data to the card, it is the CPU, not the device, that performs the write. Since that write is therefore not included in the first access scope, semaphores cannot form the required memory dependency here.

Events

The spec text for events says:

Events must not be used to insert a dependency between commands submitted to different queues.

So it seems unlikely that an event could synchronize between the host and the device.

The spec text for vkSetEvent() doesn't list its synchronization scope or access scope, so it's unclear whether it forms a memory or execution dependency.

Pipeline Barriers

The spec text for vkCmdPipelineBarrier2() says (emphasis mine):

When vkCmdPipelineBarrier2 is submitted to a queue, it defines memory dependencies between commands that were submitted to the same queue before it, and those submitted to the same queue after it.

When uploading data to the card, the CPU's writes are not performed by a device queue, so pipeline barriers cannot form a memory dependency here.

(Confusingly, VK_PIPELINE_STAGE_2_HOST_BIT is a member of VkPipelineStageFlagBits2 and VK_ACCESS_2_HOST_READ_BIT/VK_ACCESS_2_HOST_WRITE_BIT are members of VkAccessFlagBits2 which makes it seem like pipeline barriers can synchronize with the host.)
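One context in which those host flags do have a defined meaning is the opposite direction: making a device write available to the host, e.g. before reading back a mapped buffer after a fence wait. A minimal sketch of such a readback barrier, assuming `cmd` is a command buffer in the recording state:

```c
// Hypothetical sketch: make a device-side transfer write visible to the
// host. The host flags appear in the *second* (destination) scope here,
// which is where they are meaningful inside a command buffer.
VkMemoryBarrier2 barrier = {
    .sType         = VK_STRUCTURE_TYPE_MEMORY_BARRIER_2,
    .srcStageMask  = VK_PIPELINE_STAGE_2_COPY_BIT,
    .srcAccessMask = VK_ACCESS_2_TRANSFER_WRITE_BIT,
    .dstStageMask  = VK_PIPELINE_STAGE_2_HOST_BIT,
    .dstAccessMask = VK_ACCESS_2_HOST_READ_BIT,
};
VkDependencyInfo dep = {
    .sType              = VK_STRUCTURE_TYPE_DEPENDENCY_INFO,
    .memoryBarrierCount = 1,
    .pMemoryBarriers    = &barrier,
};
vkCmdPipelineBarrier2(cmd, &dep);
// After the submission's fence signals, the host may read the mapped
// memory (invalidating the range first if it is not HOST_COHERENT).
```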

Comparison with vkcube

Assuming the demo takes the codepath where the device samples from the linear texture, after demo_prepare_texture_image() writes the data to the mapped resource,

  1. It calls vkUnmapMemory(). However, neither the spec text for vkUnmapMemory() nor vkUnmapMemory2KHR() say anything about synchronization.
  2. It then runs a pipeline barrier, where the source stage mask is VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT and the source access mask is 0.

The spec text for VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT says:

VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT ... specifies no stage of execution when specified in the first scope.

This indicates the pipeline barrier serves only to transition the image layout from VK_IMAGE_LAYOUT_PREINITIALIZED to VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL. So the vkcube demo does not appear to create any memory dependency for the texture upload.
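For concreteness, an approximation of the barrier described above (not the actual vkcube source; `cmd` and `image` are assumed):

```c
// With srcAccessMask = 0 and TOP_OF_PIPE as the source stage, this
// barrier makes nothing available -- it only transitions the layout.
VkImageMemoryBarrier barrier = {
    .sType               = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
    .srcAccessMask       = 0,
    .dstAccessMask       = VK_ACCESS_SHADER_READ_BIT,
    .oldLayout           = VK_IMAGE_LAYOUT_PREINITIALIZED,
    .newLayout           = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL,
    .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
    .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
    .image               = image,
    .subresourceRange    = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 },
};
vkCmdPipelineBarrier(cmd,
                     VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
                     VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
                     0, 0, NULL, 0, NULL, 1, &barrier);
```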

Hugobros3 commented 3 months ago

The vkcube source code mentions VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, and I believe you missed https://registry.khronos.org/vulkan/specs/1.3-khr-extensions/html/chap11.html#vkFlushMappedMemoryRanges , as well as other types of device memory barriers. The spec has this to say:

vkFlushMappedMemoryRanges guarantees that host writes to the memory ranges described by pMemoryRanges are made available to the host memory domain, such that they can be made available to the device memory domain via memory domain operations using the VK_ACCESS_HOST_WRITE_BIT access type.

Additionally:

Queue submission commands automatically perform a domain operation from host to device for all writes performed before the command executes, so in most cases an explicit memory barrier is not needed for this case. In the few circumstances where a submit does not occur between the host write and the device read access, writes can be made available by using an explicit memory barrier.
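Putting the two quotes together, the upload path can be sketched as follows, assuming `device`, `queue`, a mapped allocation `memory` of at least `size` bytes, and a recorded command buffer `cmd` that reads the data:

```c
// Write the data through the mapping.
void *ptr = NULL;
vkMapMemory(device, memory, 0, size, 0, &ptr);
memcpy(ptr, data, size);

// Needed only if the memory type lacks
// VK_MEMORY_PROPERTY_HOST_COHERENT_BIT: make the host writes
// available to the host memory domain.
VkMappedMemoryRange range = {
    .sType  = VK_STRUCTURE_TYPE_MAPPED_MEMORY_RANGE,
    .memory = memory,
    .offset = 0,
    .size   = VK_WHOLE_SIZE,
};
vkFlushMappedMemoryRanges(device, 1, &range);

// The queue submission itself performs the host-to-device domain
// operation for all prior host writes; no pipeline barrier with
// VK_PIPELINE_STAGE_HOST_BIT is needed on this path.
VkSubmitInfo submit = {
    .sType              = VK_STRUCTURE_TYPE_SUBMIT_INFO,
    .commandBufferCount = 1,
    .pCommandBuffers    = &cmd,
};
vkQueueSubmit(queue, 1, &submit, VK_NULL_HANDLE);
```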

devshgraphicsprogramming commented 3 months ago

Maybe you've accidentally missed the implicit synchronization guarantees section, which explicitly states that queue submits automatically form a memory dependency on any host write to coherent mapped memory; if the memory is not coherent, the relevant ranges only need to be flushed first.

devshgraphicsprogramming commented 3 months ago

I recommend reading the memory model appendix.

To form a memory dependency, one also needs an execution dependency (which can itself be transitive).

A pipeline barrier with host flags in its src stage or access mask makes no sense, because the submit already provides them implicitly: a pipeline barrier that depends on the host will always depend on it via a submit.

It's different for an event, because an event can be set and awaited by the host. But you should never submit a command buffer containing an event wait whose corresponding set has not already been either submitted or performed by the host, so it only makes sense to use host flags on the dst side.
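For completeness, the spec does define one event pattern that places the host in the first scope: the host sets the event with vkSetEvent() before the submission, and the command buffer waits on it with VK_PIPELINE_STAGE_HOST_BIT as the source stage. A sketch, assuming `device`, `event`, and a recording `cmd` exist:

```c
// Host sets the event before the command buffer is submitted.
vkSetEvent(device, event);

// The wait's srcStageMask must include HOST_BIT when the event was
// set by vkSetEvent(); the barrier makes host writes visible to
// fragment-shader reads.
VkMemoryBarrier barrier = {
    .sType         = VK_STRUCTURE_TYPE_MEMORY_BARRIER,
    .srcAccessMask = VK_ACCESS_HOST_WRITE_BIT,
    .dstAccessMask = VK_ACCESS_SHADER_READ_BIT,
};
vkCmdWaitEvents(cmd, 1, &event,
                VK_PIPELINE_STAGE_HOST_BIT,            // src: the host
                VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT, // dst: device reads
                1, &barrier, 0, NULL, 0, NULL);
```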

litherum commented 3 months ago

Ah, I missed Section 7.9 Host Write Ordering Guarantees. Thanks!