bevyengine / bevy

A refreshingly simple data-driven game engine built in Rust
https://bevyengine.org
Apache License 2.0

Spawning or modifying many different 2d or 3d materials hangs for minutes or crashes #15893

Open DGriffin91 opened 5 days ago

DGriffin91 commented 5 days ago

Bevy version 89e19aaff0082a9d353d3130f3d79e85ebb122c0

The many_cubes example with cargo run --example many_cubes --release -- --vary-material-data-per-instance hangs indefinitely (Update: Tried just letting this run and after a little over 2 minutes the example started working).

This regression also affects modifying materials at run time. See example: https://github.com/bevyengine/bevy/issues/15893#issuecomment-2421224174

Windows 10 / RTX3060 / Vulkan

The issue was introduced at https://github.com/bevyengine/bevy/commit/7b81ae7e406e61b108accb30352d67304ff6f044 ("Update WGPU to version 22").

Apple M1 / Metal: Hangs for 4 minutes
Win10 / GTX1060 / Vulkan / i7 6700k: Hangs for 12 minutes
Win10 / RTX3060 / Vulkan / 7950x: Hangs for 2 minutes
Win10 / RTX3060 / Dx12 / 7950x: Crashes (Note Dx12 also crashes in 0.14)

2024-10-14T19:44:43.028155Z ERROR wgpu_hal::dx12::descriptor: Unable to allocate descriptors: RangeAllocationError { fragmented_free_length: 1 }
2024-10-14T19:44:43.028302Z ERROR wgpu::backend::wgpu_core: Handling wgpu errors as fatal by default
thread 'main' panicked at \.cargo\registry\src\index.crates.io-6f17d22bba15001f\wgpu-0.20.1\src\backend\wgpu_core.rs:2996:5:
wgpu error: Validation Error
Caused by:
    In Device::create_bind_group
      note: label = `StandardMaterial`
    Not enough memory left.

Minimal-ish 3d example:

use bevy::{diagnostic::*, prelude::*};
fn main() {
    App::new()
        .add_plugins((
            DefaultPlugins,
            FrameTimeDiagnosticsPlugin,
            LogDiagnosticsPlugin::default(),
        ))
        .add_systems(Startup, setup)
        .run();
}
fn setup(
    mut commands: Commands,
    mut meshes: ResMut<Assets<Mesh>>,
    mut materials: ResMut<Assets<StandardMaterial>>,
) {
    let mesh = Mesh3d(meshes.add(Cuboid::new(1.0, 1.0, 1.0)));
    for i in 0..50000 {
        commands.spawn((
            mesh.clone(),
            MeshMaterial3d(materials.add(Color::WHITE)),
            Transform::from_xyz(4.0, 0.0, -i as f32 * 2.0),
        ));
    }
    commands.spawn(Camera3d::default());
}

Minimal-ish 2d example:

use bevy::{diagnostic::*, prelude::*};
fn main() {
    App::new()
        .add_plugins((
            DefaultPlugins,
            FrameTimeDiagnosticsPlugin,
            LogDiagnosticsPlugin::default(),
        ))
        .add_systems(Startup, setup)
        .run();
}
fn setup(
    mut commands: Commands,
    mut meshes: ResMut<Assets<Mesh>>,
    mut materials: ResMut<Assets<ColorMaterial>>,
) {
    let mesh = Mesh2d(meshes.add(Rectangle::default()));
    for i in 0..200000 {
        commands.spawn((
            mesh.clone(),
            MeshMaterial2d(materials.add(Color::WHITE)),
            Transform::from_xyz(i as f32, 0.0, 0.0),
        ));
    }
    commands.spawn(Camera2d);
}

Here's vtune filtered to just the portion of time where it's hanging on the minimal 3d example: Image

https://github.com/gfx-rs/wgpu/blob/c746c90ac0f34e19d975668e022b5e8c367201c3/wgpu-core/src/device/resource.rs#L2299 Image

vtune tested using release with debug symbols: --profile release-with-debug

[profile.release-with-debug]
inherits = "release"
debug = true

teoxoy commented 2 days ago

That call to retain is not great as it will move all elements if the bind groups have been dropped in the same order that they have been created in. Since v22 our ownership model is closer to what it should be; I think we were previously dropping bind groups later in bulk.

Edit: The cause of this is https://github.com/gfx-rs/wgpu/pull/5874 which fixed a leak so we are now behaving properly but should find a way to minimize the scanning of those weak refs (opened: https://github.com/gfx-rs/wgpu/pull/6419).
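To make the cost of that retain call concrete, here is a minimal sketch of the pattern, not the actual wgpu code. It assumes one shared resource that tracks its bind groups through a list of weak refs and rescans that list every time a bind group is dropped. `BindGroup`, `SharedTexture` and `scan_cost` are hypothetical names used only for illustration.

```rust
use std::rc::{Rc, Weak};

/// Hypothetical stand-in for a bind group; this is not a wgpu type.
struct BindGroup;

/// Hypothetical stand-in for a texture shared by every bind group.
/// It tracks its dependents through weak refs.
struct SharedTexture {
    bind_groups: Vec<Weak<BindGroup>>,
}

/// Creates `n` bind groups that all register with one shared texture,
/// then drops them in creation order, running a `retain` scan after
/// each drop. Returns how many list entries the scans visited in total.
fn scan_cost(n: usize) -> usize {
    let mut texture = SharedTexture { bind_groups: Vec::new() };
    let mut owners: Vec<Rc<BindGroup>> = (0..n).map(|_| Rc::new(BindGroup)).collect();
    for bind_group in &owners {
        texture.bind_groups.push(Rc::downgrade(bind_group));
    }

    let mut visited = 0usize;
    for bind_group in owners.drain(..) {
        // Dropping the only strong ref makes the matching weak ref dead.
        drop(bind_group);
        // The dead entry is always at the front of the list, so `retain`
        // visits every remaining entry and shifts all survivors down by one.
        texture.bind_groups.retain(|weak| {
            visited += 1;
            weak.strong_count() > 0
        });
    }
    visited
}
```

Under these assumptions the scans visit n + (n - 1) + ... + 1 = n(n + 1) / 2 entries, so `scan_cost(4)` is 10 and `scan_cost(1000)` is 500500. The total work grows quadratically with the number of bind groups that share the resource.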

tychedelia commented 1 day ago

The issue is that we're using the same texture for every material, which means the retain loop is effectively quadratic as we continue to try to retain 1..N materials that all share the same texture handle. While the behavior is obviously not great, I think this is somewhat of an edge case that we're hitting because of our stress test, and it is unlikely to affect users as long as they don't try to spawn huge numbers of materials with shared resource bindings in a similar manner.

DGriffin91 commented 1 day ago

I think it's very common to share resources like textures across a significant number of different materials. For example, I've seen lots of actual games in production use the same grunge texture across a ton of different materials by mixing different channels of that same texture at different scales, tinting, blending, etc...

tychedelia commented 1 day ago

I think it's very common to share resources like textures across a significant number of different materials

Totally! Testing on my mbp I can spawn ~20k materials using --material-texture-count before I start to get a beachball. Correct me if I'm wrong relative to production use, but that still seems like a ton of unique materials to all share the same texture. Definitely still a major performance regression.

DGriffin91 commented 1 day ago

Totally! Testing on my mbp I can spawn ~20k materials using --material-texture-count before I start to get a beachball. Correct me if I'm wrong relative to production use, but that still seems like a ton of unique materials to all share the same texture.

I don't think 20k materials sharing the same texture is at all out of the question. That texture might be something related to the environment, a LUT of some kind, or something else that is widely shared etc... The many cubes example spawns 160k cubes with varying materials. Loading an actual large scene with a count like that would take 12 minutes on the Core i7 6700k / GTX1060 system for just this portion. If this regression was half the performance of the previous version of wgpu that would be one thing. But it appears to be around 500x slower than it was in bevy 0.14 on the Core i7 6700k.
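As a rough illustration of how this scales between the counts mentioned in this thread, here is a back-of-the-envelope sketch. It assumes a simplified model in which every material's bind group sits in one shared weak-ref list and each drop rescans the whole remaining list. `retain_visits` is a hypothetical helper, not a wgpu or bevy function, and the model ignores everything else the renderer does.

```rust
/// Entries visited when `n` bind groups sharing one resource are dropped
/// in creation order and every drop rescans the remaining list.
/// Assumed model: n + (n - 1) + ... + 1 = n * (n + 1) / 2.
fn retain_visits(n: u64) -> u64 {
    n * (n + 1) / 2
}

/// Visit counts for the material counts that come up in this thread.
fn thread_counts() -> Vec<(u64, u64)> {
    [5_000u64, 20_000, 50_000, 160_000]
        .iter()
        .map(|&n| (n, retain_visits(n)))
        .collect()
}
```

In this model 20k materials cost about 200 million visits and 160k materials cost about 12.8 billion, so 8x more materials means roughly 64x more scanning.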

This also affects updating materials. The example below runs at 43ms/frame on bevy 0.14 with the 7950x and 3060. This is already very slow (idk if the performance issue with it in 0.14 is because of bevy, wgpu or both). In 0.15 it runs at 950ms/frame.

use bevy::{diagnostic::*, prelude::*};
fn main() {
    App::new()
        .add_plugins((
            DefaultPlugins,
            FrameTimeDiagnosticsPlugin,
            LogDiagnosticsPlugin::default(),
        ))
        .add_systems(Startup, setup)
        .add_systems(Update, update_materials)
        .run();
}
fn setup(
    mut commands: Commands,
    mut meshes: ResMut<Assets<Mesh>>,
    mut materials: ResMut<Assets<StandardMaterial>>,
) {
    let mesh = Mesh3d(meshes.add(Cuboid::new(1.0, 1.0, 1.0)));
    for i in 0..5000 {
        commands.spawn((
            mesh.clone(),
            MeshMaterial3d(materials.add(StandardMaterial {
                base_color: Color::linear_rgb(1.0, 0.0, 0.0),
                unlit: true,
                ..default()
            })),
            Transform::from_xyz(4.0, 0.0, -i as f32 * 2.0),
        ));
    }
    commands.spawn(Camera3d::default());
}
fn update_materials(mut materials: ResMut<Assets<StandardMaterial>>, time: Res<Time>) {
    for (i, (_, m)) in materials.iter_mut().enumerate() {
        m.base_color = Color::hsv(
            (time.elapsed_secs() * 100.0 + i as f32).rem_euclid(360.0),
            1.0,
            1.0,
        );
    }
}

DGriffin91 commented 1 day ago

@tychedelia One ubiquitous example of a shared resource would, at least in bevy, be the placeholder texture. That might be what makes these minimal examples so slow if your guess is correct about the issue being related to sharing the same texture.

tychedelia commented 1 day ago

This also affects updating materials.

Okay, this actually feels like a much bigger deal since it's not possible to hide behind loading. You've fully convinced me! Thanks.

That might be what makes these minimal examples so slow if your guess is correct about the issue being related to sharing the same texture.

Parking on a breakpoint

Image