firebase / firebase-android-sdk

Firebase Android SDK
https://firebase.google.com
Apache License 2.0
2.28k stars 579 forks source link

[firebase-perf] com.google.firebase.perf.trace function has significant impact on performance and causing ANRs #6407

Open hakanbagci opened 1 month ago

hakanbagci commented 1 month ago

[READ] Step 1: Are you in the right place?

Issues filed here should be about bugs in the code in this repository. If you have a general question, need help debugging, or fall into some other category use one of these other channels:

[REQUIRED] Step 2: Describe your environment

[REQUIRED] Step 3: Describe the problem

Steps to reproduce:

Every firebase performance trace function call makes the following check while starting a trace.

https://github.com/firebase/firebase-android-sdk/blob/a00c3109de58cb4e6c53c80e05b9f79a6d3f64c9/firebase-perf/src/main/java/com/google/firebase/perf/metrics/Trace.java#L209

This check calls the following function:

https://github.com/firebase/firebase-android-sdk/blob/main/firebase-perf/src/main/java/com/google/firebase/perf/config/ConfigResolver.java#L207-L209

And this call reaches to the following function that edits a shared preference and applies the change. This is done for every trace.

https://github.com/firebase/firebase-android-sdk/blob/main/firebase-perf/src/main/java/com/google/firebase/perf/config/DeviceCacheManager.java#L133-L146

If network performance traces are enabled, screen traces are enabled and if you have some custom traces, you end up with hundreds of calls to trace start function which edits preferences and calls apply function.

Since apply does the persistence in a background thread we can think that it is safe and it will not cause any issues in main thread. However, it has a significant impact on main thread and it causes significant number of ANRs. In our 9 applications this ANR is in top 3. We identified that more than 80% of SharedPreferencesImpl.apply calls are coming from firebase performance library among all the calls to the apply function.

Every apply function calls creates a task to be awaited by QueuedWork which is an internal utility class to keep track of process-global work that’s outstanding and hasn’t been finished yet.This was created for writing SharedPreference edits out asynchronously so we’d have a mechanism to wait for the writes in Activity.onPause and similar places, but we may use this mechanism for other things in the future. (This is from documentation)

https://android.googlesource.com/platform/frameworks/base.git/+/master/core/java/android/app/SharedPreferencesImpl.java#503

When activity is stopped and if you have several pending work that are scheduled through several apply functions, then QueuedWork runs all the remaining task on main thread to ensure that all shared preferences are persisted. As expected in some cases this is causing ANRs. This ANRs become more visible when a new ANR reporting is included with Android 14.

Here is a screenshot of a trace for this scenario.

I think reducing the usage of shared preferences apply usage and finding a more performant check for isPerformanceMonitoringEnabled could be the solution. Rather then calling this function every time maybe we can consider caching the value which will prevent excessive calls to SharedPreferences.apply function.

Screenshot 2024-10-23 at 20 58 33

Relevant Code:

com.google.firebase.perf.trace("test_trace") {
    test()
}
lehcar09 commented 1 month ago

Hi @hakanbagci , thank you for reaching out and reporting the issue. Also, Thanks a lot for the details you provided. I’m currently working on replicating this issue, based on the findings you shared, but so far I haven’t been able to encounter an ANR issue. By any chance, could you share a minimal reproducible example with us? This will help us to investigate the issue. Thanks!

hakanbagci commented 1 month ago

Hi @hakanbagci , thank you for reaching out and reporting the issue. Also, Thanks a lot for the details you provided. I’m currently working on replicating this issue, based on the findings you shared, but so far I haven’t been able to encounter an ANR issue. By any chance, could you share a minimal reproducible example with us? This will help us to investigate the issue. Thanks!

Hi @lehcar09,

Thanks for checking the issue.

Unfortunately, it is a difficult case to reproduce. However, stack traces in production clearly shows that excessive usage of shared preferences apply function is causing the ANR. The following articles also mention similar ANR issues that are caused by shared preferences apply function.

https://medium.com/bumble-tech/how-we-achieved-a-6x-reduction-of-anrs-part-2-fixing-anrs-24fedf9a973f https://engineering.avast.io/how-we-fought-with-anr-rate-in-android-vitals/

Given this finding, is it possible to improve the performance of isPerformanceMonitoringEnabled() function where it is always calling apply function for each trace call. I think we do not need to reproduce this issue because writing to shared preferences in every performance trace is definitely a performance issue and will introduce non-determinism because of the IO.

Is it possible to cache the value of isPerformanceMonitoringEnabled() function and reuse it in every trace start call? I think this will significantly improve the performance of trace start function.

lehcar09 commented 4 weeks ago

Thank you for additional details. I'll raise this to our engineering team and see what we can do here. Thanks!

zoe00 commented 3 weeks ago

Hi @lehcar09 is this an ETA for this fix? Asking because it's negatively influencing our metrics.

cc @hakanbagci

hakanbagci commented 2 weeks ago

Hi @lehcar09,

@exaby73 proposed a solution to fix the issue in this PR. Is it possible to prioritize the review of it and making it part of the next release?

lehcar09 commented 2 weeks ago

Unfortunately, we are currently in a code freeze period in preparation for our upcoming release. Looking into the PR, our engineers are already working on it. I cannot provide a definitive timeline at this point. However, I’ll bump this up for review.

bighuang111 commented 1 week ago

@lehcar09 Any updates?