bevyengine / bevy

A refreshingly simple data-driven game engine built in Rust
https://bevyengine.org
Apache License 2.0

Compute World: a new world for GPGPU #8440

Open Kjolnyr opened 1 year ago

Kjolnyr commented 1 year ago

Rethinking the way we handle Compute Shaders and GPGPU

I understand that this is a very controversial topic. I think it might be a good idea to have an issue where we can discuss it and its design.

The problem to solve

The use of compute shaders for massively parallel GPGPU (general-purpose computing on graphics processing units) algorithms, or to speed up parts of rendering, is increasingly requested by game developers and graphics programmers in general.

Bevy's current API for compute shaders is very much rendering-centered, which makes it quite hard and boilerplate-heavy to work with. In addition, it's also very hard to get data back from the GPU once your compute shader has finished running.

What solution would you like?

I think there are quite a few ways to improve the API for compute shaders. Some of them are the following:

1. Having a new World, namely GPGPU World or Compute World, running alongside App World and Render World and in charge of GPGPU algorithms. This would allow us to keep a clear boundary between Render World and the two others. To show this, we can think of going from this:

Current Worlds

to this:

New Worlds

The Inject phase shown above is mainly there to emphasize that data is flowing from Compute World to App World. In practice, I think we can integrate this logic into Extract.

App World in frame N would have read access to data from Compute World N - 1 and can write for Compute World N + 1.

Data Flow

2. Being able to handle the compute shader's logic from App World (like I did in bevy_app_compute). The main issue I see with this approach is that it's really hard to know where in the underlying wgpu queue we are injecting new submit() calls.

3. Keep the current Compute Shader implementation, but improve the API by using derive macros like AsBindGroup, for example, and functions to handle it from App World. The main issue with this method is getting data back from the GPU, as we can only talk to Render World in the Extract phase.

For the last two options, we could imagine defining our Compute structs like this:

/// Extending the current `AsBindGroup` macro to support COPY_SRC + COPY_DST buffers and staging buffers
#[derive(AsBindGroup)]
struct MyCompute {
    #[uniform(0)]
    foo: f32,

    /// We have CPU read access to `bar`
    #[storage(1, staging)]
    bar: Vec<Position>,

    /// We can copy from `rw` to other storage buffers
    #[storage(2, copy_src)]
    rw: Velocity,
}

/// Having a helper trait working like `Material`
impl Compute for MyCompute {
    fn compute_shader() -> ShaderRef {
        // ...
    }
    // Some more functions to configure our shader
}

Maybe we can also exploit wgpu's reflection system in some way to be more flexible and dynamic about our compute shader logic since we don't have to make a layout necessarily, but I think that's to be discussed in another issue.

I'm leaning toward option no. 1.
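To make the frame offsets of option 1 concrete, here is a minimal sketch in plain Rust (no Bevy or wgpu involved). It models the rule that App World in frame N reads what Compute World produced in frame N - 1 and writes a request for Compute World N + 1. All names (`Pipeline`, `ComputeRequest`, `ComputeOutput`, `run_frame`) are hypothetical, the "compute" step is just a doubling of the input, and the worlds are stepped sequentially here, whereas the real ones would run in parallel.

```rust
/// Hypothetical request written by App World for a later Compute World frame.
#[derive(Debug, Clone, PartialEq)]
struct ComputeRequest {
    written_in: u64,
    input: Vec<f32>,
}

/// Hypothetical result produced by Compute World.
#[derive(Debug, Clone, PartialEq)]
struct ComputeOutput {
    requested_in: u64,
    computed_in: u64,
    values: Vec<f32>,
}

/// The two hand-off slots between App World and Compute World.
#[derive(Default)]
struct Pipeline {
    /// Written by App World in frame N - 1, consumed by Compute World in frame N.
    queued: Option<ComputeRequest>,
    /// Produced by Compute World in frame N - 1, readable by App World in frame N.
    ready: Option<ComputeOutput>,
}

impl Pipeline {
    /// Steps one frame and returns what App World can read during that frame.
    fn run_frame(&mut self, frame: u64, input: Vec<f32>) -> Option<ComputeOutput> {
        // App World, frame N: read the output of Compute World frame N - 1.
        let visible = self.ready.take();

        // Compute World, frame N: run the request written in frame N - 1.
        // Doubling the values stands in for the actual compute shader.
        let produced = self.queued.take().map(|req| ComputeOutput {
            requested_in: req.written_in,
            computed_in: frame,
            values: req.input.iter().map(|x| x * 2.0).collect(),
        });

        // App World, frame N: write a request for Compute World frame N + 1.
        self.queued = Some(ComputeRequest { written_in: frame, input });

        // Frame boundary: the fresh output becomes readable next frame.
        self.ready = produced;
        visible
    }
}
```

In this model, a request written in frame 0 is computed in frame 1 and first becomes visible to App World in frame 2, so a full round trip costs two frames of latency.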

Known limitation

The current implementation of wgpu only has one internal queue for submit() calls. Therefore, we still have to know where and when to put our compute submit() calls, as they will run sequentially with the render submissions. I know that Vulkan has a separate compute queue, and I'm hoping that one day wgpu will too. I just don't know how feasible or mature this multi-queue system is for the other backends wgpu implements.
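As a rough illustration of why the position of a compute submit() matters on a single queue, here is a toy model in plain Rust. It is not the wgpu API: `SingleQueue`, `Kind`, and the cost numbers are hypothetical, and the only assumption taken from the paragraph above is that submissions execute one after another in submission order.

```rust
/// Hypothetical tag for what a submission contains.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Render,
    Compute,
}

/// One submitted batch of work with a made-up execution cost.
struct Submission {
    kind: Kind,
    cost: u32,
}

/// Toy model of a single queue: work runs strictly in submission order.
#[derive(Default)]
struct SingleQueue {
    submissions: Vec<Submission>,
}

impl SingleQueue {
    fn submit(&mut self, kind: Kind, cost: u32) {
        self.submissions.push(Submission { kind, cost });
    }

    /// Time at which the last render submission finishes, if there is one.
    fn last_render_finish(&self) -> Option<u32> {
        let mut clock = 0;
        let mut last = None;
        for s in &self.submissions {
            // Each submission starts only after the previous one is done.
            clock += s.cost;
            if s.kind == Kind::Render {
                last = Some(clock);
            }
        }
        last
    }
}
```

With render costs of 2 and 3 and a compute cost of 5, submitting the compute work between the two render passes pushes the end of rendering from 5 to 10 in this model, while submitting it last leaves rendering untouched but delays the compute result.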

What alternative(s) have you considered?

As written before, options 2 and 3 are the current alternatives.

Kjolnyr commented 1 year ago

I think this might be related to #7893. Other relevant issues / discussions: #5024, #3904, #4796. Also, I think I should have opened a Discussion for this topic, not an actual Issue :)

slyedoc commented 4 months ago

Are there any updates on this?

I went to procedurally generate textures, and to use a texture for a mesh and collider. That was 2 weeks ago; since then I have been working on workarounds. bevy_app_compute is by far the closest solution I have seen, but the lack of texture support is not ideal. I am currently porting an AsBindGroup "Compute" and recreating things like RenderAssets in App World, still WIP.

slyedoc commented 3 months ago

Ended up writing https://github.com/slyedoc/bevy_sly_compute for this.