SaschaWillems / Vulkan

C++ examples for the Vulkan graphics API
MIT License

computecloth: Optimize barriers and add compute queue double buffering #1128

Open SRSaunders opened 4 months ago

SRSaunders commented 4 months ago

This PR addresses the concerns raised in #1097:

  1. Optimizes compute-to-compute barriers so that the write-to-read memory barrier is applied only to the buffer being written. The buffer that is only read needs no memory barrier, so this PR removes that unnecessary barrier.
  2. Changes the graphics-to-compute barrier to use VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT instead of VK_PIPELINE_STAGE_VERTEX_INPUT_BIT, with the access mask now set to VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT. The previous settings appeared to work, but I believe the new ones are more correct.
  3. Adds double buffering for the compute queue, which significantly increases performance (~50% higher on Windows, more than 100% higher on Linux, no change on macOS). The sample has always created 2 compute command buffers, but until now only 1 was used. This PR adds semaphore synchronization so that both compute command buffers are used, allowing the compute and graphics queues to execute in parallel. A new COMPUTE_CMD_BUFFERS define switches between double- and single-buffered operation. The default, COMPUTE_CMD_BUFFERS = 2, enables double buffering; it can be set to 1 for experimentation. See the AMD GPU Profiler traces below for details.

Changes tested on Windows 10, Manjaro Linux, and macOS Ventura 13.6.6 with no validation errors or warnings.

Single buffering / serial queue operation (COMPUTE_CMD_BUFFERS = 1): [AMD GPU Profiler traces: compute-queue-1cb-summary, compute-queue-1cb-wavefront]

Double buffering / parallel queue operation (COMPUTE_CMD_BUFFERS = 2): [AMD GPU Profiler traces: compute-queue-2cb-summary, compute-queue-2cb-wavefront]