bevyengine / bevy

A refreshingly simple data-driven game engine built in Rust
https://bevyengine.org
Apache License 2.0

Raspberry Pi 4 Performance Regression #14253

Open s-mayrh opened 2 months ago

s-mayrh commented 2 months ago

This follows up on what I previously wrote in Q&A discussion https://github.com/bevyengine/bevy/discussions/9667. I am now filing it as an issue because the latest update made performance significantly worse. For reference, Q&A discussion https://github.com/bevyengine/bevy/discussions/3821 also covers bad performance of older Bevy versions on the Raspberry Pi 4.

Bevy version

Relevant system information

Raspberry Pi 4 Model B Rev 1.4
CPU: 4 × ARM Cortex-A72 @ 2.0 GHz
GPU: Broadcom VideoCore VI (VC6/V3D 4.2.14) @ 600 MHz
RAM: 8 GiB
Video memory: 7807 MB, Unified memory: yes (from glxinfo)

cargo 1.79.0 (ffa9cf99a 2024-06-03)
rustc 1.79.0 (129f3b996 2024-06-10)
stable-aarch64-unknown-linux-gnu
OS: Manjaro ARM

All package versions are the latest from my distribution:

kernel-release: 6.6.33-2-MANJARO-RPI4
mesa 24.0.2-1
mesa-utils 9.0.0-3
vulkan-broadcom 24.0.2-1
vulkan-mesa-layers 24.0.2-1
vulkan-headers 1:1.3.276-1
vulkan-icd-loader 1:1.3.276-1
vulkan-tools 1:1.3.269-1

attached: vulkaninfo.txt

What's performing poorly?

Up to 0.13.2, general performance was less than optimal and glitchy, but at least playable in a simple 2D context with sprites. In Bevy 0.14 performance is worse: lots of stutters and glitches like in the screenshot below. It's noticeably better on Wayland (wayland feature flag enabled): the frame rate is higher, it stutters less and the visual glitches are less pronounced, but it's still not pleasant. Only the combination of Wayland and Bevy 0.13.2 in release mode gives acceptable results.

[Screenshot: grafik]

Before and After Traces

All combinations of debug(dev)/release – 0.13.2/0.14 – X11/Wayland: traces-debug.zip traces-release.zip

Additional information

Though bevy says

INFO bevy_render::renderer: AdapterInfo { name: "V3D 4.2.14", vendor: 5348, device: 3192414163, device_type: IntegratedGpu, driver: "V3DV Mesa", driver_info: "Mesa 24.0.2-arch.1", backend: Vulkan }

it seems rendering only uses the CPU, because the process consumes more than one full 2 GHz core. I don't think any version of Bevy, previous or current, on either X11 or Wayland has ever been able to use the GPU on the Pi 4. The VideoCore VI GPU should be moving 2D sprites around with ease… It plays SuperTuxKart at 720p quite well, and Vulkan benchmarks are OK (motion is fluid on Wayland, acceptable on X11):

vkmark results on X11:

=======================================================
    vkmark 2017.08
=======================================================
    Vendor ID:      0x14E4
    Device ID:      0xBE485FD3
    Device Name:    V3D 4.2.14
    Driver Version: 100663298
    Device UUID:    f5fbade5afcf4857249c60b38bba43da
=======================================================
[vertex] device-local=true: FPS: 869 FrameTime: 1.151 ms
[vertex] device-local=false: FPS: 882 FrameTime: 1.134 ms
[texture] anisotropy=0: FPS: 746 FrameTime: 1.340 ms
[texture] anisotropy=16: FPS: 714 FrameTime: 1.401 ms
[shading] shading=gouraud: FPS: 698 FrameTime: 1.433 ms
[shading] shading=blinn-phong-inf: FPS: 586 FrameTime: 1.706 ms
[shading] shading=phong: FPS: 493 FrameTime: 2.028 ms
[shading] shading=cel: FPS: 482 FrameTime: 2.075 ms
[effect2d] kernel=edge: FPS: 293 FrameTime: 3.413 ms
[effect2d] kernel=blur: FPS: 147 FrameTime: 6.803 ms
[desktop] <default>: FPS: 327 FrameTime: 3.058 ms
[cube] <default>: FPS: 1057 FrameTime: 0.946 ms
[clear] <default>: FPS: 1126 FrameTime: 0.888 ms
=======================================================
                                   vkmark Score: 647
=======================================================

vkmark results on Wayland:

=======================================================
    vkmark 2017.08
=======================================================
    Vendor ID:      0x14E4
    Device ID:      0xBE485FD3
    Device Name:    V3D 4.2.14
    Driver Version: 100663298
    Device UUID:    f5fbade5afcf4857249c60b38bba43da
=======================================================
[vertex] device-local=true: FPS: 934 FrameTime: 1.071 ms
[vertex] device-local=false: FPS: 935 FrameTime: 1.070 ms
[texture] anisotropy=0: FPS: 776 FrameTime: 1.289 ms
[texture] anisotropy=16: FPS: 736 FrameTime: 1.359 ms
[shading] shading=gouraud: FPS: 732 FrameTime: 1.366 ms
[shading] shading=blinn-phong-inf: FPS: 602 FrameTime: 1.661 ms
[shading] shading=phong: FPS: 490 FrameTime: 2.041 ms
[shading] shading=cel: FPS: 486 FrameTime: 2.058 ms
[effect2d] kernel=edge: FPS: 301 FrameTime: 3.322 ms
[effect2d] kernel=blur: FPS: 194 FrameTime: 5.155 ms
[desktop] <default>: FPS: 325 FrameTime: 3.077 ms
[cube] <default>: FPS: 1177 FrameTime: 0.850 ms
[clear] <default>: FPS: 1406 FrameTime: 0.711 ms
=======================================================
                                   vkmark Score: 699
=======================================================
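
As a rough sanity check on the vkmark numbers above: the reported FPS is just the reciprocal of the frame time, and every scene stays well inside a 60 Hz frame budget. The sketch below reproduces that arithmetic. The function names and the 60 Hz target are assumptions for illustration; they are not part of vkmark or Bevy.

```rust
/// Convert a frame time in milliseconds to frames per second.
pub fn fps_from_frame_time_ms(frame_time_ms: f64) -> f64 {
    1000.0 / frame_time_ms
}

/// Fraction of the per-frame time budget used at a given refresh rate.
/// `target_hz` is an assumed display rate (e.g. 60.0), not a measured value.
pub fn budget_used(frame_time_ms: f64, target_hz: f64) -> f64 {
    let budget_ms = 1000.0 / target_hz;
    frame_time_ms / budget_ms
}
```

For the slowest X11 scene (`[effect2d] kernel=blur`, 6.803 ms), `budget_used(6.803, 60.0)` comes out at roughly 0.41, so the GPU is not close to its limit in that benchmark.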

I think this is the part of my game that could have the most impact on performance – it's the camera centering on the player:

use bevy::prelude::*;

// `Player` is the marker component on the player entity (defined elsewhere in the game).
pub fn update_camera(
    gamepads: Res<Gamepads>,
    axes: Res<Axis<GamepadAxis>>,
    player_query: Query<&Transform, With<Player>>,
    mut camera_query: Query<&mut Transform, (With<Camera>, Without<Player>)>,
) {
    let mut look_around = Vec3::ZERO;

    // Right analog stick: shift the view by up to 320 × 180 units per gamepad.
    // An axis that has no value yet counts as 0.0 instead of panicking.
    for gamepad in gamepads.iter() {
        look_around.x += 320.0
            * axes
                .get(GamepadAxis::new(gamepad, GamepadAxisType::RightStickX))
                .unwrap_or(0.0);
        look_around.y += 180.0
            * axes
                .get(GamepadAxis::new(gamepad, GamepadAxisType::RightStickY))
                .unwrap_or(0.0);
    }

    // Center the camera on the player, then apply the look-around offset.
    if let Ok(player_transform) = player_query.get_single() {
        if let Ok(mut camera_transform) = camera_query.get_single_mut() {
            camera_transform.translation.x = player_transform.translation.x;
            camera_transform.translation.y = player_transform.translation.y;
            camera_transform.translation += look_around;
        }
    }
}
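
The camera math in `update_camera` can also be pulled out into a plain function, which makes it possible to check the positioning logic without running Bevy at all. This is only a minimal sketch: `camera_target` and its tuple-based signature are hypothetical, and only the 320/180 scale factors are taken from the system above.

```rust
/// Maximum look-around offset in world units (values taken from `update_camera`).
const LOOK_X: f32 = 320.0;
const LOOK_Y: f32 = 180.0;

/// Hypothetical helper: the camera position for a given player position and
/// summed right-stick deflection (each axis nominally in -1.0..=1.0 per gamepad).
pub fn camera_target(player: (f32, f32), stick: (f32, f32)) -> (f32, f32) {
    (player.0 + LOOK_X * stick.0, player.1 + LOOK_Y * stick.1)
}
```

With a centered stick the camera sits exactly on the player; with the stick fully to the right it is shifted by 320 units on the x axis.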

Before I introduced looking around with the gamepad, I had an Update filter on the player Transform query and simply assigned the player's Transform to the camera's Transform, but I didn't notice any change in performance.

For comparison, my old Intel Atom abacus

CPU: 4 × x5-Z8350 @ 1.44 GHz
GPU: Intel Cherry Trail @ 500 MHz
RAM: 4 GiB
Video RAM (dxdiag): total 2094 MB, VRAM 114 MB
Display adapter RAM (msinfo32): 1 GiB
OS: Windows 10

doesn't sweat at all running this same game with Bevy 0.13.2 or 0.14, even in debug mode.

alice-i-cecile commented 2 months ago

#14227 reported this more broadly; it seems to be due to https://github.com/gfx-rs/wgpu/issues/5756

Wumpf commented 2 months ago

FYI: I just closed https://github.com/gfx-rs/wgpu/issues/5756 because, for all we know so far, the slowdown comes only from debug assertions, which can easily be turned off for wgpu.
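
One way to turn off debug assertions for a single dependency in dev builds is a Cargo profile override in the project's `Cargo.toml`. The fragment below is a sketch under assumptions: which wgpu crate actually carries the expensive assertions is not stated in this thread, so `wgpu-types` is a guess that may need to be replaced with (or joined by) other wgpu crates.

```toml
# Keep the dev profile as-is, but disable debug assertions for one dependency.
# The package name is an assumption; adjust it to the crate that is actually slow.
[profile.dev.package.wgpu-types]
debug-assertions = false
```

Release builds disable debug assertions by default, so this override only matters for debug (dev) builds.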

s-mayrh commented 2 months ago

https://github.com/gfx-rs/wgpu/issues/5756#issuecomment-2233157360 says their slowdown is caused entirely by debug assertions in wgpu. If a Bevy project is built in release mode, these should be disabled, I guess, so the release traces should not show much of a difference between Bevy 0.13.2 and 0.14. I haven't dug into how to read these trace files yet, but based on similar execution times and the file sizes

trace-0.13.2-release-wayland.json (140.4 MiB)
trace-0.14-release-wayland.json (69.3 MiB)

trace-0.13.2-release-x11.json (94.2 MiB)
trace-0.14-release-x11.json (62.8 MiB)

… I am naively assuming the trace files for 0.13.2 contain 1½–2 times as many frame cycles as those for 0.14. This could point to a different issue. Can someone more specialised check this?
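
The 1½–2× estimate follows directly from the file sizes listed above. Here is a small sketch of that arithmetic; the function name is made up for illustration, and file size is only a crude proxy for the number of recorded frames, since the two Bevy versions may emit different spans per frame.

```rust
/// Ratio of two trace file sizes (any unit, as long as both values use the same one).
pub fn size_ratio(old_size: f64, new_size: f64) -> f64 {
    old_size / new_size
}
```

For the Wayland traces, `size_ratio(140.4, 69.3)` is about 2.03; for the X11 traces, `size_ratio(94.2, 62.8)` is 1.5.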