AltBeacon / android-beacon-library

Allows Android apps to interact with BLE beacons
Apache License 2.0
2.84k stars 836 forks source link

Synchronisation of ScanJob.onStartJob and ScanJob.onStopJob causes ANR sporadically by blocking the main thread #1106

Open liedQM opened 1 year ago

liedQM commented 1 year ago

Expected behavior

The synchronisation for onStartJob and onStopJob should not block the main thread and causing ANRs. To free the main thread the synchronised code can be dispatched on a single threaded executor which keeps the execution order and prevents the main thread from being blocked. With a single thread executor the synchronisation can eventually be removed completely.

Actual behavior

The android JobScheduler calls ScanJob.onStopJob on the main thread and the used synchronisation blocks it causing ANRs for the apps.

Steps to reproduce this behavior

We could trigger the ANR 4 times in 363 app starts. Our app starts or stops the beacon scanner depending on the current configuration of the user. But the issue is directly visible in the code and I've added a proposal how to solve the problem.

Mobile device model and OS version

Samsung S20 Ultra (SM-G988B) with Android 12 (G988BXXUEFVH3)

Android Beacon Library version

2.19.4

The ANR has been observed with ANRWatchDog running in the default configuration (5s timeout):

09-14 19:20:36:107: E [Main] xxxApplication:2 (|ANR-WatchDog|) =========== Received ANR report ===========
Message: Application Not Responding
Stacktrace: com.github.anrwatchdog.ANRError: Application Not Responding
Caused by: com.github.anrwatchdog.ANRError$$$_Thread: main (state = BLOCKED)
    at org.altbeacon.beacon.service.ScanJob.onStopJob(Unknown Source:10)
    at android.app.job.JobService$1.onStopJob(JobService.java:67)
    at android.app.job.JobServiceEngine$JobHandler.handleMessage(JobServiceEngine.java:114)
    at android.os.Handler.dispatchMessage(Handler.java:106)
    at android.os.Looper.loopOnce(Looper.java:226)
    at android.os.Looper.loop(Looper.java:313)
    at android.app.ActivityThread.main(ActivityThread.java:8669)
    at java.lang.reflect.Method.invoke(Native Method)
    at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:571)
    at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1135)
davidgyoung commented 1 year ago

Thanks for this report, @liedQM. You mention you have a proposed solution. Where can we see that?

liedQM commented 1 year ago

I‘ve described in in the expected behavior (single threaded executor dispatch) but I can draft a solution as PR next week if you like.

MarkusHerrmannsdoerfer commented 1 year ago

Hi @davidgyoung, before we invest effort into drafting a solution, it would be good that you are generally OK with a "single threaded executor dispatch" as proposed by @liedQM. Thanks!

davidgyoung commented 1 year ago

Hi, @MarkusHerrmannsdoerfer. Yes, I am in general OK with this approach.

However, I think the hard part about this is testing to make sure that any change doesn't make things worse. It is really hard to do this for intermittent problems that are relatively rare. Based on the numbers from @liedQM, if this happens 4/363 times (about 1 percent of the time), how do you know if a change reduces cases from 1% to 0.1% or increases them from 1% to 2%? This is impossible to test with manual launches.

I think we should (1) create a beta release with a change (2) put the beta release in a production app (3) measure if the rate of ANRs goes down over a few weeks. (4) If we see positive results, merge the change and roll a final release.

@liedQM or @MarkusHerrmannsdoerfer, do either of you work on an app where we could try out the change from a beta release?