bevyengine / bevy

Redundant copies of MeshUniform #11770

Open JMS55 opened 9 months ago

JMS55 commented 9 months ago

Since https://github.com/bevyengine/bevy/pull/9685, each instance of batch_and_prepare_render_phase appends a copy of MeshUniform to the GpuArrayBuffer<MeshUniform> resource for every entity in every phase. For 2 cameras, each with 3 phases (shadow, prepass, main pass), that's 6 total copies per entity. Additionally, each batch_and_prepare_render_phase cannot run in parallel with each other due to ResMut<GpuArrayBuffer>.

This was kind of intended as it makes it easy for each draw to find the correct MeshUniform in the shader, but it does have the above downsides.
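To make the copy count concrete, here is a minimal sketch in plain Rust, not Bevy code. The `MeshUniform` struct below is a simplified, hypothetical stand-in, and `build_per_phase` / `build_shared` are illustrative names that do not exist in Bevy. The first function models the current append-per-phase behavior; the second models one possible deduplicated layout where each phase keeps only indices into a single shared buffer.

```rust
/// Simplified, hypothetical stand-in for Bevy's `MeshUniform`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct MeshUniform {
    transform: [f32; 16],
}

/// Models the behavior described above: every (camera, phase) pair appends
/// its own copy of each entity's uniform to one shared buffer.
fn build_per_phase(entities: &[MeshUniform], cameras: usize, phases: usize) -> Vec<MeshUniform> {
    let mut buffer = Vec::new();
    for _camera in 0..cameras {
        for _phase in 0..phases {
            buffer.extend_from_slice(entities);
        }
    }
    buffer
}

/// Models a deduplicated layout (an assumption, not a proposal from the
/// issue): one uniform per entity, plus a per-phase list of indices.
fn build_shared(
    entities: &[MeshUniform],
    cameras: usize,
    phases: usize,
) -> (Vec<MeshUniform>, Vec<Vec<u32>>) {
    let buffer = entities.to_vec();
    let indices: Vec<u32> = (0..entities.len() as u32).collect();
    let per_phase = vec![indices; cameras * phases];
    (buffer, per_phase)
}
```

With 4 entities, 2 cameras, and 3 phases, `build_per_phase` produces 24 uniforms while `build_shared` stores 4 uniforms plus 6 small index lists. The trade-off is the one noted above: with a shared buffer, each draw needs some other way to find its MeshUniform in the shader.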

james7132 commented 9 months ago

> Additionally, each batch_and_prepare_render_phase cannot run in parallel with each other due to ResMut<GpuArrayBuffer>.

This is potentially avoidable by splitting the resource by phase, though I'm not sure how that would play out on the GPU side. It also doesn't address the issue of the duplicate copies.
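As a rough illustration of why splitting by phase removes the write conflict, here is a sketch in plain Rust using scoped threads rather than Bevy's scheduler. `PhaseBuffers` and `fill_in_parallel` are hypothetical names, and the `[f32; 16]` arrays stand in for MeshUniform. Each phase owns its own buffer, so the three writers hold disjoint mutable borrows and can run concurrently.

```rust
use std::thread;

/// Hypothetical per-phase storage: one CPU-side buffer per render phase.
#[derive(Default)]
struct PhaseBuffers {
    shadow: Vec<[f32; 16]>,
    prepass: Vec<[f32; 16]>,
    main_pass: Vec<[f32; 16]>,
}

/// Fills all three phase buffers at once. Destructuring yields three
/// disjoint `&mut Vec` borrows, so no writer blocks another.
fn fill_in_parallel(buffers: &mut PhaseBuffers, transforms: &[[f32; 16]]) {
    let PhaseBuffers { shadow, prepass, main_pass } = buffers;
    thread::scope(|s| {
        s.spawn(|| shadow.extend_from_slice(transforms));
        s.spawn(|| prepass.extend_from_slice(transforms));
        s.spawn(|| main_pass.extend_from_slice(transforms));
    });
}
```

Each phase still receives its own copy of every transform here, so this only models the parallelism benefit and leaves the duplication in place.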

james7132 commented 9 months ago

When trying to accelerate encase encoding, I found that the memory copy and encoding costs aren't the bottleneck in the batching systems. It may come down to the fact that we're computing the matrix inverses, potentially redundantly, across multiple render phases.
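If the inverses really are recomputed per phase, one way to avoid that is to compute each entity's inverse once and reuse it. The sketch below is plain Rust under loud assumptions: it uses a 2x2 matrix as a stand-in for the real transform, and `InverseCache`, `EntityId`, and `get_or_compute` are hypothetical names, not Bevy APIs.

```rust
use std::collections::HashMap;

type EntityId = u32;
type Mat2 = [[f32; 2]; 2];

/// Closed-form inverse of a 2x2 matrix (assumes a nonzero determinant).
fn inverse(m: Mat2) -> Mat2 {
    let det = m[0][0] * m[1][1] - m[0][1] * m[1][0];
    [
        [m[1][1] / det, -m[0][1] / det],
        [-m[1][0] / det, m[0][0] / det],
    ]
}

/// Hypothetical per-frame cache keyed by entity.
#[derive(Default)]
struct InverseCache {
    entries: HashMap<EntityId, Mat2>,
    /// Number of times `inverse` actually ran, for illustration.
    computed: usize,
}

impl InverseCache {
    /// Returns the cached inverse for `entity`, computing it on first use.
    fn get_or_compute(&mut self, entity: EntityId, m: Mat2) -> Mat2 {
        if let Some(inv) = self.entries.get(&entity) {
            return *inv;
        }
        self.computed += 1;
        let inv = inverse(m);
        self.entries.insert(entity, inv);
        inv
    }
}
```

Calling `get_or_compute` for the same entity from six phases runs `inverse` only once. A real implementation would also need to clear or invalidate the cache when transforms change between frames.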

JMS55 commented 1 month ago

The other bottleneck is that we're uploading lots of redundant transforms to the GPU, which uses a lot of bandwidth.