gfx-rs / wgpu

A cross-platform, safe, pure-Rust graphics API.
https://wgpu.rs
Apache License 2.0

SYNC-HAZARD-READ-AFTER-WRITE validation error (multithreaded, MoltenVK) #6344

Open simonask opened 3 days ago

simonask commented 3 days ago

Description

Scenario:

  1. Multiple threads are submitting commands to the same device/queue pair (e.g. through test cases running with cargo test).
  2. Each thread has a complicated setup of buffers and textures with various combinations of buffer/texture usage flags.
  3. Each thread is submitting both render passes and compute passes.
  4. Occasionally, the Vulkan validation layers produce the error below, but it doesn't seem to be reproducible with --test-threads=1.
  5. The error also seems sensitive to the size of the involved buffers. For example, some of the involved vertex buffers occasionally have a size of 0.
  6. I'm using some unsafe APIs to create shader modules from raw precompiled SPIR-V, and to enable some Vulkan-specific shader extensions, but my understanding is that this validation error is on the host/driver side, and should not be something that an invalid shader can trigger. Could be wrong though? The SPIR-V is generated by slangc and should be correct and matching the environment (flavor profile glsl_460).

Theories:

  1. It feels likely that this is caused by a missing barrier in wgpu. Maybe related to #4732 (though my case doesn't explicitly involve an overlap in writable resources between different command encoders), or #5373 (though this is a different sync hazard kind, so potentially a different barrier type).
  2. Maybe a buffer memory alignment error, or a buffer barrier "provenance" error with zero-sized buffers?

Validation error:

2024-09-30T11:06:49.048183Z ERROR wgpu_hal::vulkan::instance: VALIDATION [SYNC-HAZARD-READ-AFTER-WRITE (0xe4d96472)]
        Validation Error: [ SYNC-HAZARD-READ-AFTER-WRITE ] Object 0: handle = 0x111751c18, type = VK_OBJECT_TYPE_QUEUE; | MessageID = 0xe4d96472 | vkQueueSubmit():  Hazard READ_AFTER_WRITE for entry 3, VkCommandBuffer 0x13202d138[], Submitted access info (submitted_usage: SYNC_VERTEX_ATTRIBUTE_INPUT_VERTEX_ATTRIBUTE_READ, command: vkCmdDraw, seq_no: 2, reset_no: 1). Access info (prior_usage: SYNC_COPY_TRANSFER_WRITE, write_barriers: SYNC_FRAGMENT_SHADER_COLOR_ATTACHMENT_READ|SYNC_FRAGMENT_SHADER_DEPTH_STENCIL_ATTACHMENT_READ|SYNC_FRAGMENT_SHADER_INPUT_ATTACHMENT_READ|SYNC_EARLY_FRAGMENT_TESTS_DEPTH_STENCIL_ATTACHMENT_READ|SYNC_EARLY_FRAGMENT_TESTS_DEPTH_STENCIL_ATTACHMENT_WRITE|SYNC_LATE_FRAGMENT_TESTS_DEPTH_STENCIL_ATTACHMENT_READ|SYNC_LATE_FRAGMENT_TESTS_DEPTH_STENCIL_ATTACHMENT_WRITE|SYNC_COLOR_ATTACHMENT_OUTPUT_COLOR_ATTACHMENT_READ|SYNC_COLOR_ATTACHMENT_OUTPUT_COLOR_ATTACHMENT_WRITE|SYNC_SUBPASS_SHADER_HUAWEI_INPUT_ATTACHMENT_READ, queue: VkQueue 0x111751c18[], submit: 0, batch: 0, batch_tag: 1, command: vkCmdCopyBuffer, command_buffer: VkCommandBuffer 0x111641e78[(wgpu internal) PendingWrites], seq_no: 7, reset_no: 1).    
2024-09-30T11:06:49.048666Z ERROR wgpu_hal::vulkan::instance:   objects: (type: QUEUE, hndl: 0x111751c18, name: ?)  
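
One way to read the message above, sketched as a small helper. This assumes that `write_barriers` lists the accesses the prior write has already been synchronized against, which is my understanding of the sync validation output rather than something confirmed here. The helper names (`field`, `barrier_covers_access`) are hypothetical, and the `MSG` string is abbreviated from the log (shortened barrier list, some fields dropped).

```rust
// Abbreviated from the validation message in this issue; the write_barriers
// list is shortened and several fields are omitted for readability.
const MSG: &str = "Hazard READ_AFTER_WRITE for entry 3, Submitted access info \
(submitted_usage: SYNC_VERTEX_ATTRIBUTE_INPUT_VERTEX_ATTRIBUTE_READ, command: vkCmdDraw). \
Access info (prior_usage: SYNC_COPY_TRANSFER_WRITE, \
write_barriers: SYNC_FRAGMENT_SHADER_COLOR_ATTACHMENT_READ|SYNC_COLOR_ATTACHMENT_OUTPUT_COLOR_ATTACHMENT_WRITE, \
command: vkCmdCopyBuffer)";

/// Returns the text following the first occurrence of `key`, up to the next
/// `,` or `)` (hypothetical helper, plain string matching only).
fn field<'a>(msg: &'a str, key: &str) -> Option<&'a str> {
    let start = msg.find(key)? + key.len();
    let rest = &msg[start..];
    let end = rest
        .find(|c: char| c == ',' || c == ')')
        .unwrap_or(rest.len());
    Some(rest[..end].trim())
}

/// Returns Some(true) if the access that triggered the hazard appears in the
/// barrier list recorded for the prior write, Some(false) if it does not,
/// and None if either field is missing from the message.
fn barrier_covers_access(msg: &str) -> Option<bool> {
    let submitted = field(msg, "submitted_usage: ")?;
    let barriers = field(msg, "write_barriers: ")?;
    Some(barriers.split('|').any(|b| b.trim() == submitted))
}

fn main() {
    println!("prior write:   {:?}", field(MSG, "prior_usage: "));
    println!("hazard access: {:?}", field(MSG, "submitted_usage: "));
    println!("covered by barriers: {:?}", barrier_covers_access(MSG));
}
```

For this message the check reports that the vertex attribute read is not in the barrier list: the barriers recorded after the `vkCmdCopyBuffer` in PendingWrites only name fragment and attachment accesses. Under the assumption above, that would be consistent with the missing-barrier theory.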

Repro steps

Unfortunately, I have found it extremely difficult to reproduce outside of my rather complicated code base. My approach has been to take the RUST_LOG=trace output and write code that produces the same log output using raw wgpu APIs, but no luck so far. My attempt to reproduce was also incomplete, in that I did not create all of the same resources (shader modules, pipeline layouts, pipelines, etc.).

Expected vs observed behavior

I expected wgpu to automatically insert all sync barriers, but I see sync validation errors from Vulkan. :-)

Extra materials

Since the bug only seems to appear with multithreaded use, it's extremely difficult to get a useful trace output. Let me know if a trace would be helpful, though.

Platform

MacBook Pro M1 (macOS 15.0 Sequoia), latest MoltenVK on Vulkan apiVersion 1.2.283, custom rendering engine built on top of wgpu.

simonask commented 3 days ago

Alright, I'm actually able to reproduce this on Windows 11 as well (NVIDIA, latest Vulkan SDK).