gfx-rs / wgpu

A cross-platform, safe, pure-Rust graphics API.
https://wgpu.rs
Apache License 2.0
12.72k stars 933 forks source link

Regression: wgpu 23 adds extra dispatch on Vulkan for indirect buffer copy, breaks binding reuse #6567

Open djeedai opened 3 days ago

djeedai commented 3 days ago

Description

On Vulkan (win11) with wgpu 23, every single of my indirect dispatch is now preceded by a vkCmdDispatch that I never dispatched myself. The dispatch seems to copy the indirect args from my buffer into a wgpu-owned buffer. The dispatch occurs just-in-time, after all bindings have been set for my own indirect dispatch, which means I get:

  1. set_pipeline (my pipeline)
  2. bunch of set_bind_group (my bind groups)
  3. set_pipeline (wgpu)
  4. bunch of set_bind_group (wgpu)
  5. vkCmdDispatch
  6. same as 1.
  7. same as 2.
  8. vkCmdIndirectDispatch

Step 3 to 5 didn't exist before, and not only look useless (why does wgpu copy my indirect params, I already took care of maintaining a buffer for them, it's not to have that work ignored) but also adds a lot of unnecessary state changes (steps 1. and 2., overwritten by 3. and 4.) which are slow.

Repro steps

git clone https://github.com/djeedai/bevy_hanabi.git -b u/bevy15 cargo r --example firework --no-default-features --features="bevy/bevy_winit bevy/bevy_pbr 3d "

Expected vs observed behavior

Same as previously, no step 3. and 4. above, which do not correspond to any API call made from Rust.

Extra materials

RenderDoc capture: wgpu-extra-dispatch-vulkan-win11.zip

The follow results from a single indirect dispatch (I never called dispatch_workgroup() (non-indirect)): image

Pipeline state of the injected (non-indirect) dispatch, showing it copies from my buffer to a wgpu one: image

Platform

wgpu 23, win11, bevy 0.15-rc.3

ErichDonGubler commented 3 days ago

@teoxoy: Isn't this a dupe of our intent to allow configuration to avoid the injected indirect dispatch pass? I'm not sure what issue that is, though.

cwfitzgerald commented 3 days ago

Yeah we want to have this optional somehow.

Step 3 to 5 didn't exist before, and not only look useless (why does wgpu copy my indirect params, I already took care of maintaining a buffer for them, it's not to have that work ignored).

For this part, we are doing it to validate that your indirect args are within the limits of the adapter. This is an unfortunate requirement for running untrusted programs on the web, which is why we want a way to disable it on native.

The unnecessary state changes are a bug in the ordering of the validation code.