bevyengine / bevy

A refreshingly simple data-driven game engine built in Rust
https://bevyengine.org
Apache License 2.0

Android example very slow running on device following light transmission changes #10338

Open fudini opened 6 months ago

fudini commented 6 months ago

The commit 44928e0df49a202c201a6962775e6883cafebb7e makes mobile example run on android device at ~1 FPS (~20 before).

To reproduce:

cargo apk run -p bevy_mobile_example
alice-i-cecile commented 6 months ago

@coreh @superdump any ideas?

cart commented 6 months ago

Hmm, my money is on the shader code, given that we don't run the transmission pass without transmissive materials. Maybe try commenting out the relevant branches/code blocks in pbr_functions.wgsl?

ex: if diffuse_transmission > 0.0 {
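A rough sketch of what commenting out those branches in pbr_functions.wgsl could look like (the exact surrounding code differs by version; this is illustrative, not the actual file contents):

```wgsl
// In apply_pbr_lighting() in pbr_functions.wgsl, stub out the
// transmission branches to test whether they cause the slowdown:

// if diffuse_transmission > 0.0 {
//     // ... diffuse transmission lighting ...
// }

// if specular_transmission > 0.0 {
//     // ... specular transmission lighting ...
// }
```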

fudini commented 6 months ago

Commenting out the shader code suggested by @cart makes it run faster, but not as fast as before. I measured with the diagnostics plugins and got this:

Before: 14 FPS
After: 2 FPS
Without shader code: 10 FPS

cart commented 6 months ago

> but not as fast as before.

For clarity: did you test the commit right before the Transmission commit?

fudini commented 6 months ago

> but not as fast as before.
>
> For clarity: did you test the commit right before the Transmission commit?

Yes, d67fbd5e9

cart commented 6 months ago

So one quick (partial) fix would be to put all of that code behind a "shader def", which we set whenever diffuse transmission is above 0.0.
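A sketch of what that guard could look like in pbr_functions.wgsl (the shader def name here is hypothetical, not Bevy's actual def; Bevy's WGSL preprocessing supports `#ifdef`-style defs set from the material pipeline):

```wgsl
// Hypothetical shader def, set by the material pipeline only when
// diffuse_transmission > 0.0, so scenes that never use transmission
// never compile or execute the branch at all:
#ifdef MATERIAL_DIFFUSE_TRANSMISSION
    if diffuse_transmission > 0.0 {
        // ... diffuse transmission lighting path ...
    }
#endif
```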

mtsr commented 6 months ago

It might be worth looking at a renderdoc capture of pre and post merge. Most significant changes (gpu-side) should be easy to spot there.

cart commented 6 months ago

I'm not able to reproduce a significant drop on my Pixel 6.

Average FPS across 10-ish seconds each

https://github.com/bevyengine/bevy/commit/d67fbd5e90a1eb307dc5493abd475ac172698e2e: 24.85
main (no changes to transmission code): 25.59
main (transmission branches commented out): 25.77
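For reference, an average like this is just frame count divided by total elapsed time; a standalone sketch of the computation (not Bevy's diagnostics code):

```rust
/// Average FPS over a slice of frame durations given in seconds.
fn average_fps(frame_times_s: &[f64]) -> f64 {
    if frame_times_s.is_empty() {
        return 0.0;
    }
    let total: f64 = frame_times_s.iter().sum();
    frame_times_s.len() as f64 / total
}

fn main() {
    // Ten frames at ~40 ms each is ~25 FPS.
    let frames = vec![0.04_f64; 10];
    println!("{:.2}", average_fps(&frames)); // prints 25.00
}
```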

@fudini what phone are you testing on?

fudini commented 6 months ago

> @fudini what phone are you testing on?

Huawei P30 Lite

superdump commented 6 months ago

According to GSMArena:

OS Android 9.0 (Pie), upgradable to Android 10, EMUI 10.0
Chipset Kirin 710 (12 nm)
CPU Octa-core (4x2.2 GHz Cortex-A73 & 4x1.7 GHz Cortex-A53)
GPU Mali-G51 MP4


cart commented 6 months ago

I'm bumping this to 0.13. While this regression isn't ideal, it clearly doesn't apply to every device. We can iterate here (and consider doing a patch release once we identify the right fix).

mockersf commented 6 months ago

I don't have an Android device available for now, but I can't reproduce on the emulator.

ramirezmike commented 6 months ago

I've been testing compiling to wasm and Android on my phone, a OnePlus 7T Pro, and suspect I am running into this. However, I noticed it performs better if I zoom out, which, if related, may be helpful information.

I took the mobile example, removed everything but the cube and the camera (no lights) and compiled it to android but also made a wasm build loaded in a cordova app and put that on my phone too.

The android build hovers around 50fps, hits 60fps if I zoom out and drops to 40fps if I zoom in. The wasm build hovers in the single digits and hits 90fps if I zoom out.

Setting the camera transform to something like Transform::from_xyz(-22.0, 2.5, 25.0) gave me good performance, but even just a few units back was enough to notice a difference. I'll try out d67fbd5 with it over the weekend when I get a chance to see if it makes a difference.

ramirezmike commented 6 months ago

d67fbd5 didn't fix the "zoom issue" I noticed, although it did perform slightly better.

Here are some shots with the camera positioned further back and then closer to the cube. Top is d67fbd5 and Bottom is main.

(screenshot: mobile_zoom)

Should I make a separate issue for this?

irate-devil commented 6 months ago

Zooming in makes the material cover more pixels, so the degraded performance confirms that the fragment shader is the bottleneck.

MalekiRe commented 6 months ago

I've noticed an extreme performance regression on Quest 2. Not 100% sure if it's related, but it did not occur prior to the 0.12 release (at least a few weeks beforehand), and now FPS with a very minimal example is very bad.

MalekiRe commented 4 months ago

I can confirm now that having a different shader on an object completely fixes the performance issue.

dror-g commented 4 months ago

Just to note that the issue affects PC as well. It's less noticeable since PCs can handle the load, but with VSync off I see a drop in FPS from 300+ to ~100 on laptops with Intel HD graphics and Nvidia discrete GPUs. See #11213. Thanks!
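With VSync on, the compositor caps the frame rate and can mask this kind of regression. In Bevy, VSync is controlled via the window's present mode; a configuration sketch against the 0.12-era API:

```rust
use bevy::prelude::*;
use bevy::window::PresentMode;

fn main() {
    App::new()
        .add_plugins(DefaultPlugins.set(WindowPlugin {
            primary_window: Some(Window {
                // Uncap the frame rate so shader-cost regressions
                // show up directly in the FPS numbers.
                present_mode: PresentMode::AutoNoVsync,
                ..default()
            }),
            ..default()
        }))
        .run();
}
```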

ramirezmike commented 4 months ago

> Just to note that the issue affects PC as well. Less noticeable as they can handle the load, but with VSync off I see a drop in FPS from 300+ to ~100 on laptops with Intel HD graphics and Nvidia discrete. See #11213. Thanks!

I noticed this too on my AMD laptop. I can hit 120 FPS, but if I get close to a cube so that it fills the screen, the frame rate drops 10-20 frames. I can't really tell if the refresh rate is locked at 60.

dror-g commented 3 months ago

> I can confirm now that having a different shader on an object completely fixes the performance issue.

@MalekiRe , can you share an example of said shader? Thanks!

To be precise - which shader is the offending one?
If I were to override the fragment shader, I'd like to preserve as much of the functionality of the original shader as possible.

dror-g commented 3 months ago

Ok, so I tried to isolate the issue while preserving original PBR/material features.
I found that the issue exists when using ExtendedMaterial as well (with StandardMaterial base).

I created a "blank" fragment shader (based on extended_material.wgsl) and applied it to my GLTF models as an extended material (great guide & code on how to patch a SceneBundle here, many thanks @nicopap!!!!):

@fragment
fn fragment(
    in: VertexOutput,
    @builtin(front_facing) is_front: bool,
) -> FragmentOutput {
    var pbr_input = pbr_input_from_standard_material(in, is_front);
    var out: FragmentOutput;

    // Slow: from the original file.
    //out.color = apply_pbr_lighting(pbr_input);

    // Fast
    out.color = pbr_input.material.base_color;
    return out;
}

With this frag shader the issue is non-existent on Android. High fps with many entities / zoomed in.

It was clear that the offending function was apply_pbr_lighting(pbr_input). Hard to believe, right? :stuck_out_tongue_winking_eye:

So I went about trying to isolate the problem within this function. I copied apply_pbr_lighting() from pbr_functions.wgsl into my extended material shader.

It applies all light sources and sums them here: https://github.com/bevyengine/bevy/blob/22e39c4abf6e2fdf99ba0820b3c35db73be71347/crates/bevy_pbr/src/render/pbr_functions.wgsl#L375-L379

I attempted to use only one light source at a time. Sadly, I could not find a single culprit source or function that causes the massive fps drop. I can only say that direct_light has the biggest effect on fps (another shocker, I'm sure :rofl: ), but even it caused only a 30% drop. With all of them combined and enabled, the issue exists and fps tanks.

For now, I stripped the apply_pbr_lighting function down to an absolute minimum: no shadows (they cause a crash on Android anyway...), no ambient, no point lights, no spot lights; only directional light. That works for my needs at the moment with good fps.

Sorry I couldn't point out the root cause of the zoomed-in fps drop. I really tried removing code line by line from pbr_functions.wgsl, but that didn't turn up anything; it's just many calculations together that lead to the drop.

Anyway, if anyone needs a simple working shader, here's my stripped down custom_shader.wgsl:

#import bevy_pbr::{
    pbr_fragment::pbr_input_from_standard_material,
    forward_io::{VertexOutput, FragmentOutput},
    pbr_types,
    pbr_bindings,
    mesh_view_bindings as view_bindings,
    lighting,
    utils::E,
}

fn apply_pbr_lighting(
    in: pbr_types::PbrInput,
) -> vec4<f32> {
    var output_color: vec4<f32> = in.material.base_color;

    // calculate non-linear roughness from linear perceptualRoughness
    let metallic = in.material.metallic;
    let perceptual_roughness = in.material.perceptual_roughness;
    let roughness = lighting::perceptualRoughnessToRoughness(perceptual_roughness);
    let ior = in.material.ior;
    let thickness = in.material.thickness;
    let diffuse_transmission = in.material.diffuse_transmission;
    let specular_transmission = in.material.specular_transmission;

    // Neubelt and Pettineo 2013, "Crafting a Next-gen Material Pipeline for The Order: 1886"
    let NdotV = max(dot(in.N, in.V), 0.0001);

    // Remapping [0,1] reflectance to F0
    // See https://google.github.io/filament/Filament.html#materialsystem/parameterization/remapping
    let reflectance = in.material.reflectance;
    let F0 = 0.16 * reflectance * reflectance * (1.0 - metallic) + output_color.rgb * metallic;

    // Diffuse strength is inversely related to metallicity, specular and diffuse transmission
    let diffuse_color = output_color.rgb * (1.0 - metallic) * (1.0 - specular_transmission) * (1.0 - diffuse_transmission);

    let R = reflect(-in.V, in.N);

    let f_ab = lighting::F_AB(perceptual_roughness, NdotV);

    var direct_light: vec3<f32> = vec3<f32>(0.0);

    // Transmitted Light (Specular and Diffuse)
    var transmitted_light: vec3<f32> = vec3<f32>(0.0);

    // directional lights (direct)
    let n_directional_lights = view_bindings::lights.n_directional_lights;
    for (var i: u32 = 0u; i < n_directional_lights; i = i + 1u) {
        var light_contrib = lighting::directional_light(i, roughness, NdotV, in.N, in.V, R, F0, f_ab, diffuse_color);

        direct_light += light_contrib;
    }

    // Total light
    output_color = vec4<f32>(
        //transmitted_light + direct_light + indirect_light + emissive_light, // original
        direct_light,
        output_color.a
    );

    return output_color;
}

@fragment
fn fragment(
    in: VertexOutput,
    @builtin(front_facing) is_front: bool,
) -> FragmentOutput {
    // generate a PbrInput struct from the StandardMaterial bindings
    var pbr_input = pbr_input_from_standard_material(in, is_front);

    var out: FragmentOutput;
    // apply lighting
    out.color = apply_pbr_lighting(pbr_input);

    return out;
}

ramirezmike commented 3 months ago

I wanted to try this out after #11627 was merged, but I happened to break the phone I was using :( This issue unfortunately doesn't happen on my replacement, a Pixel. I checked out the commits before and after #11627 and it behaved identically, roughly staying at 49-60 FPS regardless of how close or far I was to objects.

JMS55 commented 3 months ago

This issue should be fixed. Can anyone who ran into the original issue reproduce it on 0.13?

blaind commented 2 weeks ago

There was a lengthy debug and discussion in Bevy #xr channel.

At least on the Oculus Quest, performance with a point light & shadows is still remarkably low (~60 fps with a simple cube scene).

Performance improves a lot (120 fps) when either:

a) disabling shadows from the point light

commands.spawn(PointLightBundle {
    point_light: PointLight {
        intensity: 1500.0,
        shadows_enabled: false,
        ..default()
    },
    transform: Transform::from_xyz(4.0, 8.0, 4.0),
    ..default()
});

or

b) using directional light

commands.spawn(DirectionalLightBundle {
    transform: Transform::from_rotation(Quat::from_rotation_z(PI / 2.0)),
    ..default()
});

Might be related to GPU memory write bandwidth (4.4 TB/s of writes at 120 fps with shadows disabled, versus 6.8 TB/s of writes at 60 fps with shadows enabled).