Open JMS55 opened 9 months ago
> Additionally, the instances of `batch_and_prepare_render_phase` cannot run in parallel with each other due to `ResMut<GpuArrayBuffer>`.
This is potentially avoidable by splitting the resource based on phase, though I'm not sure how that would play out on the GPU side. It also doesn't address the issue of the duplicate copies.
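To make the "split the resource based on phase" idea concrete, here is a minimal CPU-side sketch in plain Rust, without Bevy. `PhaseBuffer`, the phase marker types, and `fill_two_phases` are hypothetical names for illustration, not Bevy APIs. The assumption is that giving each phase its own buffer type removes the shared mutable borrow, so two phases can be filled at the same time. It does not model the GPU side, and it does not remove the duplicate copies.

```rust
use std::marker::PhantomData;

// Hypothetical phase markers; in an ECS these would distinguish the resources.
pub struct Shadow;
pub struct Prepass;
pub struct MainPass;

/// A buffer owned by exactly one phase `P`, holding items of type `T`.
pub struct PhaseBuffer<P, T> {
    items: Vec<T>,
    _phase: PhantomData<P>,
}

impl<P, T> PhaseBuffer<P, T> {
    pub fn new() -> Self {
        Self { items: Vec::new(), _phase: PhantomData }
    }

    /// Appends an item and returns the index a draw would use to look it up.
    pub fn push(&mut self, item: T) -> u32 {
        self.items.push(item);
        (self.items.len() - 1) as u32
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }
}

/// Fills two per-phase buffers concurrently. This compiles only because the
/// two threads mutably borrow different buffers, not one shared buffer.
pub fn fill_two_phases(entities: &[u32]) -> (usize, usize) {
    let mut shadow = PhaseBuffer::<Shadow, u32>::new();
    let mut main_pass = PhaseBuffer::<MainPass, u32>::new();
    std::thread::scope(|s| {
        s.spawn(|| {
            for &e in entities {
                shadow.push(e);
            }
        });
        s.spawn(|| {
            for &e in entities {
                main_pass.push(e);
            }
        });
    });
    (shadow.len(), main_pass.len())
}
```

Each entity is still written once per phase here, so the total upload size is unchanged; only the contention on a single mutable resource goes away.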
When trying to accelerate `encase` encoding, I found that the memory copy and encoding costs aren't the bottleneck in the batching systems. It may come down to the fact that we're computing the matrix inverses, potentially redundantly, across multiple render phases.
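If the inverses really are recomputed per phase, one option is to compute each entity's inverse once per frame and reuse it in every phase. The sketch below is plain Rust under stated assumptions: `InverseCache`, `EntityId`, and the 2x2 `Mat2` stand-in are hypothetical and much simpler than the real transform types, and the cache is assumed to be rebuilt each frame.

```rust
use std::collections::HashMap;

type EntityId = u32;
// Stand-in for a real transform matrix; 2x2 keeps the example short.
type Mat2 = [[f32; 2]; 2];

/// Inverse of a 2x2 matrix, or `None` if it is singular.
fn inverse(m: Mat2) -> Option<Mat2> {
    let det = m[0][0] * m[1][1] - m[0][1] * m[1][0];
    if det == 0.0 {
        return None;
    }
    let inv = 1.0 / det;
    Some([
        [m[1][1] * inv, -m[0][1] * inv],
        [-m[1][0] * inv, m[0][0] * inv],
    ])
}

/// Per-frame cache: the inverse is computed on first request for an entity
/// and returned from the map on every later request.
#[derive(Default)]
pub struct InverseCache {
    cached: HashMap<EntityId, Option<Mat2>>,
    /// Number of times `inverse` actually ran, for illustration.
    pub computed: usize,
}

impl InverseCache {
    pub fn get(&mut self, entity: EntityId, transform: Mat2) -> Option<Mat2> {
        if let Some(hit) = self.cached.get(&entity) {
            return *hit;
        }
        self.computed += 1;
        let inv = inverse(transform);
        self.cached.insert(entity, inv);
        inv
    }
}
```

With 2 cameras and 3 phases, six lookups for the same entity would run the inversion once instead of six times. Whether that is a net win depends on the cost of the lookup relative to the inversion, which this sketch does not measure.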
The other bottleneck is that we're uploading lots of redundant transforms to the GPU, which uses a lot of bandwidth.
Since https://github.com/bevyengine/bevy/pull/9685, each instance of `batch_and_prepare_render_phase` will append a copy of `MeshUniform` to the `GpuArrayBuffer<MeshUniform>` resource for each entity x phase. For 2 cameras, each with 3 phases (shadow, prepass, main pass), that's 6 total copies per entity... Additionally, the instances of `batch_and_prepare_render_phase` cannot run in parallel with each other due to `ResMut<GpuArrayBuffer>`.

This was kind of intended, as it makes it easy for each draw to find the correct `MeshUniform` in the shader, but it does have the above downsides.
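The copy count above is plain multiplication, and a small sketch makes the scaling easy to play with. The function names are hypothetical, and the per-uniform size is left as a parameter because the actual size of `MeshUniform` is not stated here.

```rust
/// Number of `MeshUniform` copies appended per frame, assuming every view
/// runs every phase over every entity (the worst case described above).
pub fn mesh_uniform_copies(entities: usize, views: usize, phases_per_view: usize) -> usize {
    entities * views * phases_per_view
}

/// Bytes uploaded for those copies, given an assumed per-uniform size.
pub fn upload_bytes(copies: usize, uniform_size_bytes: usize) -> usize {
    copies * uniform_size_bytes
}
```

For example, `mesh_uniform_copies(1, 2, 3)` gives the 6 copies per entity mentioned above, and the total grows linearly with the entity count.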