Optimizes compute-to-compute barriers so that write-to-read memory barriers are only applied to the buffer being written to. The read buffer does not require a memory barrier - this PR eliminates that unnecessary barrier.
Adjusts the graphics-to-compute barrier to use VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT instead of VK_PIPELINE_STAGE_VERTEX_INPUT_BIT, with access mask now set to VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT. The previous settings seemed to work, but I believe these are more correct.
Adds double-buffering capability for the compute queue with significantly increased performance (~50% greater on Windows, >100% greater on Linux, no change on macOS). Two compute command buffers have always been created, but only one has been used up to this point. This adds semaphore synchronization for using both compute command buffers, which results in parallel execution of the compute and graphics queues. A new COMPUTE_CMD_BUFFERS define is available to switch between double-buffered and single-buffered operation. The default setting is COMPUTE_CMD_BUFFERS = 2 for double buffering, but it can be set to 1 for experimentation. See the following AMD GPU Profiler traces for insight.
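As a rough sketch of how a define like COMPUTE_CMD_BUFFERS could switch between the two modes, the snippet below rotates through the available compute command buffers per frame. The function names are hypothetical and not taken from this PR, and the semaphore handling itself is not shown; this only illustrates the index selection under the assumption that buffers are used round-robin.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the PR's define: 2 = double buffering, 1 = single buffering.
#ifndef COMPUTE_CMD_BUFFERS
#define COMPUTE_CMD_BUFFERS 2
#endif

// Hypothetical helper: index of the compute command buffer submitted for a
// given frame. With COMPUTE_CMD_BUFFERS == 1 this is always 0, which
// corresponds to serial queue operation.
constexpr uint32_t computeBufferIndex(uint64_t frame)
{
    return static_cast<uint32_t>(frame % COMPUTE_CMD_BUFFERS);
}

// Hypothetical helper: index of the compute command buffer submitted one
// frame earlier, i.e. the one whose completion the graphics queue would
// wait on while the compute queue works on computeBufferIndex(frame).
constexpr uint32_t previousComputeBufferIndex(uint64_t frame)
{
    return static_cast<uint32_t>((frame + COMPUTE_CMD_BUFFERS - 1) % COMPUTE_CMD_BUFFERS);
}
```

With the default of 2, consecutive frames alternate between buffer 0 and buffer 1, so the graphics queue can consume one buffer's results while the compute queue executes the other. Compiling with -DCOMPUTE_CMD_BUFFERS=1 collapses both helpers to index 0.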
Changes tested on Windows 10, Manjaro Linux, macOS Ventura 13.6.6 with no validation errors or warnings.
Single buffering / Serial queue operation (COMPUTE_CMD_BUFFERS = 1):
Double buffering / Parallel queue operation (COMPUTE_CMD_BUFFERS = 2):
This PR addresses the concerns raised in #1097.