bevyengine / bevy

A refreshingly simple data-driven game engine built in Rust
https://bevyengine.org
Apache License 2.0

Android example very slow running on device following light transmission changes #10338

Open fudini opened 1 year ago

fudini commented 1 year ago

The commit 44928e0df49a202c201a6962775e6883cafebb7e makes the mobile example run on an Android device at ~1 FPS (~20 FPS before).

To reproduce:

cargo apk run -p bevy_mobile_example
alice-i-cecile commented 1 year ago

@coreh @superdump any ideas?

cart commented 1 year ago

Hmm, my money is on the shader code, given that we don't run the transmission pass without transmissive materials. Maybe try commenting out the relevant branches/code blocks in pbr_functions.wgsl?

ex: if diffuse_transmission > 0.0 {

fudini commented 1 year ago

Commenting out the shader code @cart suggested makes it run faster, but not as fast as before. I measured with the diagnostic plugins and got this:

Before: 14 FPS
After: 2 FPS
Without shader code: 10 FPS
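For reference, a minimal sketch of the kind of setup used to get FPS numbers like these, assuming Bevy's built-in diagnostics plugins from bevy::diagnostic (requires the bevy crate; on Android the output shows up in adb logcat):

```rust
use bevy::diagnostic::{FrameTimeDiagnosticsPlugin, LogDiagnosticsPlugin};
use bevy::prelude::*;

fn main() {
    App::new()
        .add_plugins(DefaultPlugins)
        // FrameTimeDiagnosticsPlugin collects FPS/frame-time samples;
        // LogDiagnosticsPlugin periodically prints them to the log.
        .add_plugins((FrameTimeDiagnosticsPlugin, LogDiagnosticsPlugin::default()))
        .run();
}
```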

cart commented 1 year ago

> but not as fast as before.

For clarity: did you test the commit right before the Transmission commit?

fudini commented 1 year ago

> but not as fast as before.
>
> For clarity: did you test the commit right before the Transmission commit?

Yes, d67fbd5e9

cart commented 1 year ago

So one quick (partial) fix would be to put all of that code behind a "shader def", which we set whenever diffuse transmission is above 0.0.
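A rough sketch of what that could look like inside pbr_functions.wgsl, using Bevy's shader-def preprocessor; the def name DIFFUSE_TRANSMISSION here is hypothetical, and the material/pipeline code would have to add it to the shader defs whenever the material's diffuse_transmission is above 0.0:

```wgsl
#ifdef DIFFUSE_TRANSMISSION
    // This branch is only compiled into the shader when the def is set,
    // so materials without diffuse transmission pay no fragment cost for it.
    if diffuse_transmission > 0.0 {
        // ... diffuse transmission lighting code ...
    }
#endif
```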

mtsr commented 1 year ago

It might be worth looking at a RenderDoc capture of pre- and post-merge. The most significant changes (GPU-side) should be easy to spot there.

cart commented 1 year ago

I'm not able to reproduce a significant drop on my Pixel 6.

Average FPS across 10-ish seconds each

https://github.com/bevyengine/bevy/commit/d67fbd5e90a1eb307dc5493abd475ac172698e2e: 24.85
main (no changes to transmission code): 25.59
main (transmission branches commented out): 25.77

@fudini what phone are you testing on?

fudini commented 1 year ago

> @fudini what phone are you testing on?

Huawei P30 Lite

superdump commented 1 year ago

According to GSMArena:

OS Android 9.0 (Pie), upgradable to Android 10, EMUI 10.0
Chipset Kirin 710 (12 nm)
CPU Octa-core (4x2.2 GHz Cortex-A73 & 4x1.7 GHz Cortex-A53)
GPU Mali-G51 MP4


cart commented 1 year ago

I'm bumping this to 0.13. While this regression isn't ideal, it clearly doesn't apply to every device. We can iterate here (and consider doing a patch release once we identify the right fix).

mockersf commented 1 year ago

I don't have an Android device available for now, but I can't reproduce on the emulator.

ramirezmike commented 12 months ago

I've been testing wasm and Android builds on my phone, a OnePlus 7T Pro, and suspect I am running into this. However, I noticed it performs better if I zoom out, which, if related, may be helpful information.

I took the mobile example, removed everything but the cube and the camera (no lights) and compiled it to android but also made a wasm build loaded in a cordova app and put that on my phone too.

The Android build hovers around 50 FPS, hits 60 FPS if I zoom out, and drops to 40 FPS if I zoom in. The wasm build hovers in the single digits and hits 90 FPS if I zoom out.

Setting the camera transform to something like Transform::from_xyz(-22.0, 2.5, 25.0) gave me good performance, but even just a few units back was enough to notice a difference. I'll try out d67fbd5 with it over the weekend when I get a chance to see if it makes a difference.

ramirezmike commented 12 months ago

d67fbd5 didn't fix the "zoom issue" I noticed, although it did perform slightly better.

Here are some shots with the camera positioned further back and then closer to the cube. Top is d67fbd5 and Bottom is main.

[screenshot: mobile_zoom]

Should I make a separate issue for this?

tim-blackbird commented 12 months ago

Zooming in makes the material cover more pixels, so the degraded performance confirms that the fragment shader is the bottleneck.

MalekiRe commented 11 months ago

I've noticed an extreme performance regression on Quest 2. Not 100% sure if it's related, but it did not occur prior to the 0.12 release (at least a few weeks beforehand), and now FPS with a very, very minimal example is very bad.

MalekiRe commented 10 months ago

I can now confirm that having a different shader on the objects completely fixes the performance issue.

dror-g commented 10 months ago

Just to note that the issue affects PC as well. It's less noticeable since PCs can handle the load, but with VSync off I see a drop in FPS from 300+ to ~100 on laptops with Intel HD graphics and Nvidia discrete GPUs. See #11213. Thanks!

ramirezmike commented 10 months ago

> Just to note that the issue affects PC as well. Less noticeable as they can handle the load, but with VSync off I see a drop in FPS from +300 to ~100 on laptops with Intel HD graphics and Nvidia discrete. see #11213 . Thanks!

I noticed this too on my AMD laptop. I can hit 120 FPS, but if I get close to a cube so that it fills the screen, the frame rate drops 10-20 FPS. Can't really tell if the refresh rate is locked at 60.

dror-g commented 9 months ago

> I can confirm now having a different shader on an objects completely fixes the performance issue

@MalekiRe , can you share an example of said shader? Thanks!

To be precise - which shader is the offending one?
If I were to override the fragment shader, I'd like to preserve as much of the functionality of the original shader as possible.

dror-g commented 9 months ago

Ok, so I tried to isolate the issue while preserving the original PBR/material features.
I found that the issue exists when using ExtendedMaterial as well (with a StandardMaterial base).

I created a "blank" fragment shader (looking at extended_material.wgsl),

Applied that to my GLTF models as an extended material (great guide & code on how to patch SceneBundle here, many thanks @nicopap!!!!)

@fragment
fn fragment(
    in: VertexOutput,
    @builtin(front_facing) is_front: bool,
) -> FragmentOutput {
    var pbr_input = pbr_input_from_standard_material(in, is_front);
    var out: FragmentOutput;

    // Slow. From the original file.
    //out.color = apply_pbr_lighting(pbr_input);

    // Fast
    out.color = pbr_input.material.base_color;
    return out;
}

With this frag shader the issue is non-existent on Android. High fps with many entities / zoomed in.

It was clear that the offending function was apply_pbr_lighting(pbr_input). Hard to believe, right? :stuck_out_tongue_winking_eye:

So I went about trying to isolate the problem within this function.
Copied apply_pbr_lighting() from pbr_functions.wgsl to my extended mat shader.

It adds up all the light sources here: https://github.com/bevyengine/bevy/blob/22e39c4abf6e2fdf99ba0820b3c35db73be71347/crates/bevy_pbr/src/render/pbr_functions.wgsl#L375-L379 I attempted to use only one light source at a time.
Sadly, I could not find a single culprit source or function that causes the massive FPS drop.
I can only say that direct_light has the biggest effect on FPS (another shocker, I'm sure :rofl: ),
but even it caused only a 30% drop in FPS.
With all of them combined and enabled, the issue exists and FPS tanks.

For now, I stripped the apply_pbr_lighting function to an absolute minimum: no shadows (they cause a crash on Android anyway...), no ambient, no point lights, no spot lights, only directional light.
That works for my needs at the moment with good fps.

Sorry I couldn't point out the root cause of the zoomed-in FPS drop. I really tried removing lines one by one from pbr_functions, but that didn't reveal anything; it's just many calculations together that lead to the drop. Sorry.

Anyway, if anyone needs a simple working shader, here's my stripped down custom_shader.wgsl:

#import bevy_pbr::{
    pbr_fragment::pbr_input_from_standard_material,
    forward_io::{VertexOutput, FragmentOutput},
    pbr_types,
    pbr_bindings,
    mesh_view_bindings as view_bindings,
    lighting,
    utils::E,
}

fn apply_pbr_lighting(
    in: pbr_types::PbrInput,
) -> vec4<f32> {
    var output_color: vec4<f32> = in.material.base_color;

    // calculate non-linear roughness from linear perceptualRoughness
    let metallic = in.material.metallic;
    let perceptual_roughness = in.material.perceptual_roughness;
    let roughness = lighting::perceptualRoughnessToRoughness(perceptual_roughness);
    let ior = in.material.ior;
    let thickness = in.material.thickness;
    let diffuse_transmission = in.material.diffuse_transmission;
    let specular_transmission = in.material.specular_transmission;

    // Neubelt and Pettineo 2013, "Crafting a Next-gen Material Pipeline for The Order: 1886"
    let NdotV = max(dot(in.N, in.V), 0.0001);

    // Remapping [0,1] reflectance to F0
    // See https://google.github.io/filament/Filament.html#materialsystem/parameterization/remapping
    let reflectance = in.material.reflectance;
    let F0 = 0.16 * reflectance * reflectance * (1.0 - metallic) + output_color.rgb * metallic;

    // Diffuse strength is inversely related to metallicity, specular and diffuse transmission
    let diffuse_color = output_color.rgb * (1.0 - metallic) * (1.0 - specular_transmission) * (1.0 - diffuse_transmission);

    let R = reflect(-in.V, in.N);

    let f_ab = lighting::F_AB(perceptual_roughness, NdotV);

    var direct_light: vec3<f32> = vec3<f32>(0.0);

    // Transmitted Light (Specular and Diffuse)
    var transmitted_light: vec3<f32> = vec3<f32>(0.0);

    // directional lights (direct)
    let n_directional_lights = view_bindings::lights.n_directional_lights;
    for (var i: u32 = 0u; i < n_directional_lights; i = i + 1u) {
        var light_contrib = lighting::directional_light(i, roughness, NdotV, in.N, in.V, R, F0, f_ab, diffuse_color);

        direct_light += light_contrib;
    }

    // Total light
    output_color = vec4<f32>(
        //transmitted_light + direct_light + indirect_light + emissive_light, // original
        direct_light,
        output_color.a
    );

    return output_color;
}

@fragment
fn fragment(
    in: VertexOutput,
    @builtin(front_facing) is_front: bool,
) -> FragmentOutput {
    // generate a PbrInput struct from the StandardMaterial bindings
    var pbr_input = pbr_input_from_standard_material(in, is_front);

    var out: FragmentOutput;
    // apply lighting
    out.color = apply_pbr_lighting(pbr_input);

    return out;
}

ramirezmike commented 8 months ago

I wanted to try this out after #11627 was merged, but I happened to break the phone I was using :( This issue unfortunately doesn't happen on my replacement, a Pixel. I checked out the commits before and after #11627 and they behaved identically, roughly staying at 49-60 FPS regardless of how close or far I was to objects.

JMS55 commented 8 months ago

This issue should be fixed. Can anyone who ran into the original issue reproduce it on 0.13?

blaind commented 6 months ago

There was a lengthy debug and discussion in Bevy #xr channel.

At least on the Oculus Quest, performance with a point light & shadows is still remarkably low (~60 fps with a simple cube scene).

Performance improves a lot (120 fps) when either:

a) disabling shadows from the point light

commands.spawn(PointLightBundle {
    point_light: PointLight {
        intensity: 1500.0,
        shadows_enabled: false,
        ..default()
    },
    transform: Transform::from_xyz(4.0, 8.0, 4.0),
    ..default()
});

or

b) using directional light

commands.spawn(DirectionalLightBundle {
    transform: Transform::from_rotation(Quat::from_rotation_z(PI / 2.0)),
    ..default()
});

Might be related to GPU memory write bandwidth (4.4 TB of writes/s at 120 fps with shadows disabled, in contrast to 6.8 TB of writes/s at 60 fps with shadows enabled).
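Those figures imply the per-frame write volume roughly triples with shadows enabled. A quick back-of-the-envelope check in Rust, using only the numbers reported above (a sketch, not a measurement):

```rust
// Per-frame GPU write volume implied by the reported aggregate
// write bandwidth and frame rate.
fn bytes_per_frame(bytes_per_sec: f64, fps: f64) -> f64 {
    bytes_per_sec / fps
}

fn main() {
    let tb = 1e12; // bytes per terabyte (decimal)

    let no_shadows = bytes_per_frame(4.4 * tb, 120.0); // shadows disabled
    let shadows = bytes_per_frame(6.8 * tb, 60.0); // shadows enabled

    println!("no shadows: {:.1} GB/frame", no_shadows / 1e9); // ~36.7 GB/frame
    println!("shadows:    {:.1} GB/frame", shadows / 1e9); // ~113.3 GB/frame
    println!("ratio:      {:.2}x", shadows / no_shadows); // ~3.09x
}
```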