KhronosGroup / Vulkan-Docs

The Vulkan API Specification and related tools
Other
2.79k stars 466 forks source link

WAR hazard question #1386

Open konstantinegorov opened 3 years ago

konstantinegorov commented 3 years ago

The specification states:

Write-after-read hazards can be solved with just an execution dependency

Suppose there is a sequence of memory accesses:

color_attachment_write -> fragment_shader_read -> compute_shader_read_write

Would it be sufficient to set a memory dependency between color_attachment_write -> fragment_shader_read and then just an execution dependency between fragment_shader_read -> compute_shader_read_write to ensure that the memory writes made by the color attachment output stage are visible to the compute shader?

// in the example, the image is always in VK_IMAGE_LAYOUT_GENERAL, so a global memory barrier will be used
to set the image memory dependency.

... // write to the image in a render pass

VkMemoryBarrier memoryBarrier = {
  ...
  .srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,
  .dstAccessMask = VK_ACCESS_SHADER_READ_BIT };

vkCmdPipelineBarrier(
    ...
    VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, // srcStageMask
    VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT, // dstStageMask
    1,                                    // memoryBarrierCount
    &memoryBarrier,                       // pMemoryBarriers
    ...);

... // read the image in a fragment shader

vkCmdPipelineBarrier(
    ...
    VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT, // srcStageMask
    VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT, // dstStageMask
    ...);

... // write to the image in a compute shader
jzulauf-lunarg commented 3 years ago

Assuming this is on a single queue with both GRAPHICS and COMPUTE...

With the first barrier, we know that the memory is available (has completed write retirement and cache) and visible before the fragment shader reads are allowed to execute, avoiding the RAW hazard.

The second barrier waits for the last fragment shader read to complete, and thus avoiding the WAR in the compute shader.

This means that it would be completely safe to write to the resource from the compute shader.

This next part I'm less sure about...

While we know the memory is available and visible to the fragment shader and that the fragment shader is done reading it, we don't know the visibility or availability of the memory to the compute shader. The two barriers do form a dependency chain, the but second access scope of the second barrier is empty. That would imply it would not be safe to perform read or read-modify-write operations in the compute shader, as we don't know that the writes from COLOR_ATTACHMENT are visible to be read from the compute shader.

@sfricke-samsung @jeffbolznv @Tobski

Tobski commented 3 years ago

Would it be sufficient to set a memory dependency between color_attachment_write -> fragment_shader_read and then just an execution dependency between fragment_shader_read -> compute_shader_read_write to ensure that the memory writes made by the color attachment output stage are visible to the compute shader?

Yes.

While we know the memory is available and visible to the fragment shader and that the fragment shader is done reading it, we don't know the visibility or availability of the memory to the compute shader. The two barriers do form a dependency chain, the but second access scope of the second barrier is empty. That would imply it would not be safe to perform read or read-modify-write operations in the compute shader, as we don't know that the writes from COLOR_ATTACHMENT are visible to be read from the compute shader.

Availability is "device wide" with a memory barrier - it's either available to all stages on the device or it's not. The first memory barrier here made it available to all stages, then visible to reads in the FRAGMENT stage. No av/vis ops are then needed to make the second write legal.

jzulauf-lunarg commented 3 years ago

@Tobski

re: device domain availability... there may be some spec language that needs to be clarified. Domain for availability seems as important as scopes.

Availability operations cause the values generated by specified memory write accesses to become available to a memory domain for future access. Any available value remains available until a subsequent write to the same memory location occurs (whether it is made available or not) or the memory is freed. Memory domain operations cause writes that are available to a source memory domain to become available to a destination memory domain (an example of this is making writes available to the host domain available to the device domain). Visibility operations cause values available to a memory domain to become visible to specified memory accesses.

Doesn't imply a specific domain. The descriptions in 6.7 don't specify the domain for availability for memory barriers.

What I'm more concern about here (and also unsure about) is visibility.

Two bits concern me:

Once written values are made visible to a particular type of memory access, they can be read or written by that type of memory access. Most synchronization commands in Vulkan define a memory dependency.

So for either read or write, memory must be visible (though I'm not sure why memory must be visible for a write). However the description of "visible to whom" seems more restrictive than you're implying, as just a bit further down we have:

As the compute shader read/write would not be in b', this would imply that the writes from VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT would not be within the guarantee.

Note also that "made available" doesn't qualify the domain at this point either.

Right now, my sync model is assuming what you've said above, so I'd prefer this all to be a documentation issue. (less work for me :) )

Tobski commented 3 years ago

TL;DR - what I've said above still applies, regardless of the spec warts.

Doesn't imply a specific domain. The descriptions in 6.7 don't specify the domain for availability for memory barriers.

That's fair, though it is spelled out in the memory model appendix in the description of memory domains - it states that these are in the device domain (except for host flags). We should update the language in the sync chapter to match, but that's a separate issue.

So for either read or write, memory must be visible (though I'm not sure why memory must be visible for a write).

Hmm, they don't need to be visible for a write, that's a mistake... and it's also directly in conflict with other parts of the spec. We should fix that.

As the compute shader read/write would not be in b', this would imply that the writes from VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT would not be within the guarantee.

They don't need to be in b', because there's no read in the second compute shader, and writes don't need to be visible.

I suspect we're starting to simply confuse the original question at this point - can you perhaps raise a separate issue about these points in the spec?

konstantinegorov commented 3 years ago

Do I understand correctly that flags for writing are pointless in dstAccessMask and will be ignored by implementation?

krOoze commented 2 years ago

@konstantinegorov I think what is meant here is that write-after-read hazards are not hazards. Write-after-write hazards still need memory dependency. They do not need the READ_ACCESS flag in the src. They would not need the WRITE_ACCESS in dst but only if there was no write ever made to the memory before (even before any intermittent reads). Otherwise WRITE_ACCESS is needed to prevent the write-after-write hazard. @Tobski care to clarify your previous comment?

Anyway this Issue seems somewhat resolved, but to recapitulate:

Would it be sufficient to set a memory dependency between color_attachment_write -> fragment_shader_read and then just an execution dependency between fragment_shader_read -> compute_shader_read_write to ensure that the memory writes made by the color attachment output stage are visible to the compute shader?

Memory dependency between color_attachment_write -> fragment_shader_read makes memory available-from color_attachment_write and then it makes that available memory visible-to fragment_shader_read.

Execution dependency between fragment_shader_read -> compute_shader_read_write forms an dependency chain that ensure the availability operation happens before compute_shader. As always all available memory is made visible-to, in this case to compute_shader_read_write.