celzero / rethink-app

DNS over HTTPS / DNS over Tor / DNSCrypt client, WireGuard proxifier, firewall, and connection tracker for Android.
https://rethinkfirewall.com/
Apache License 2.0
3.02k stars 154 forks source link

Possible System ANR caused by Rethink 0.5.5n 241117 #1791

Closed haraldrudell closed 2 days ago

haraldrudell commented 5 days ago

Rethink possibly does something it shouldn’t when ConnectivityService tries to figure out how many bytes have been transferred.

Reported after 5 seconds in logcat: ActivityManager: ANR in system

It appears:

After a logcat session and “adb bugreport” I could extract some stack traces that may indicate the issue to someone familiar with the Rethink codebase:

3 threads are blocked, one is labeled “android.ui”:

Thread 10 blocked trying to retrieve basic network information

"Binder:1761_2" prio=5 tid=10 Blocked
  | group="main" sCount=1 ucsCount=0 flags=1 obj=0x16880738 self=0x6f8ce5cb90
  | sysTid=1784 nice=0 cgrp=default sched=0/0 handle=0x6d9c9d5cb0
  | state=S schedstat=( 93178365016 20996661122 312707 ) utm=6596 stm=2720 core=0 HZ=100
  | stack=0x6d9c8de000-0x6d9c8e0000 stackSize=991KB
  | held mutexes=
  at com.android.server.net.NetworkPolicyManagerService.isUidNetworkingBlocked(NetworkPolicyManagerService.java:5658)
  - waiting to lock <0x01ac25bd> (a java.lang.Object) held by thread 73
  at android.net.NetworkPolicyManager.isUidNetworkingBlocked(NetworkPolicyManager.java:561)
  at android.net.connectivity.com.android.server.ConnectivityService.isNetworkWithCapabilitiesBlocked(ConnectivityServic
e.java:2244)
  at android.net.connectivity.com.android.server.ConnectivityService.filterNetworkInfo(ConnectivityService.java:2295)
  at android.net.connectivity.com.android.server.ConnectivityService.getFilteredNetworkInfo(ConnectivityService.java:230
5)
  at android.net.connectivity.com.android.server.ConnectivityService.getActiveNetworkInfo(ConnectivityService.java:2323)
  at android.net.IConnectivityManager$Stub.onTransact(IConnectivityManager.java:800)
  at android.os.Binder.execTransactInternal(Binder.java:1184)
  at android.os.Binder.execTransact(Binder.java:1143)

Thread 12 is blocked trying to do something to the power-save temporary whitelist apps

"android.ui" prio=5 tid=12 Blocked
  | group="main" sCount=1 ucsCount=0 flags=1 obj=0x16880ab0 self=0x6f8ce5e760
  | sysTid=1858 nice=-2 cgrp=default sched=0/0 handle=0x6d878e1cb0
  | state=S schedstat=( 48030805827 21559711586 518964 ) utm=2827 stm=1974 core=2 HZ=100
  | stack=0x6d877de000-0x6d877e0000 stackSize=1039KB
  | held mutexes=
  at com.android.server.net.NetworkPolicyManagerService$NetworkPolicyManagerInternalImpl.onTempPowerSaveWhitelistChange(
NetworkPolicyManagerService.java:5758)
  - waiting to lock <0x01ac25bd> (a java.lang.Object) held by thread 73
  at com.android.server.DeviceIdleController.addPowerSaveTempWhitelistAppDirectInternal(DeviceIdleController.java:2924)
  at com.android.server.DeviceIdleController$LocalService.addPowerSaveTempWhitelistAppDirect(DeviceIdleController.java:2
071)
  at com.android.server.am.ActivityManagerService.pushTempAllowlist(ActivityManagerService.java:14762)
  at com.android.server.am.ActivityManagerService$UiHandler.handleMessage(ActivityManagerService.java:1622)
  at android.os.Handler.dispatchMessage(Handler.java:106)
  at android.os.Looper.loopOnce(Looper.java:201)
  at android.os.Looper.loop(Looper.java:288)
  at android.os.HandlerThread.run(HandlerThread.java:67)
  at com.android.server.ServiceThread.run(ServiceThread.java:44)
  at com.android.server.UiThread.run(UiThread.java:45)

thread 73 looking for a byte-count

"NetworkPolicy" prio=5 tid=73 Blocked                                                                           [0/1876]
  | group="main" sCount=1 ucsCount=0 flags=1 obj=0x16885e38 self=0x6f8ceeebd0
  | sysTid=2082 nice=0 cgrp=default sched=0/0 handle=0x6d39a67cb0
  | state=S schedstat=( 214175918817 13176636000 107735 ) utm=18313 stm=3103 core=0 HZ=100
  | stack=0x6d39964000-0x6d39966000 stackSize=1039KB
  | held mutexes=
  at com.android.server.net.NetworkStatsService.internalGetHistoryForNetwork(NetworkStatsService.java:868)
  - waiting to lock <0x0894e281> (a java.lang.Object) held by thread 72
  at com.android.server.net.NetworkStatsService.internalGetSummaryForNetwork(NetworkStatsService.java:846)
  at com.android.server.net.NetworkStatsService.getNetworkTotalBytes(NetworkStatsService.java:880)
  at com.android.server.net.NetworkStatsService.access$2100(NetworkStatsService.java:175)
  at com.android.server.net.NetworkStatsService$NetworkStatsManagerInternalImpl.getNetworkTotalBytes(NetworkStatsService.java:1677)
  at com.android.server.net.NetworkPolicyManagerService.getNetworkTotalBytes(NetworkPolicyManagerService.java:5496)
  at com.android.server.net.NetworkPolicyManagerService.getTotalBytes(NetworkPolicyManagerService.java:5491)
  at com.android.server.net.NetworkPolicyManagerService.updateNetworkEnabledNL(NetworkPolicyManagerService.java:1891)
  at com.android.server.net.NetworkPolicyManagerService.updateNetworksInternal(NetworkPolicyManagerService.java:1666)
  - locked <0x01ac25bd> (a java.lang.Object)
  - locked <0x0ebaa280> (a java.lang.Object)
  at com.android.server.net.NetworkPolicyManagerService.access$400(NetworkPolicyManagerService.java:335)
  at com.android.server.net.NetworkPolicyManagerService$3.onSubscriptionsChanged(NetworkPolicyManagerService.java:1012)
  at android.telephony.TelephonyRegistryManager$1.lambda$onSubscriptionsChanged$0(TelephonyRegistryManager.java:116)
  at android.telephony.TelephonyRegistryManager$1$$ExternalSyntheticLambda0.run(unavailable:2)
  at android.os.Handler.handleCallback(Handler.java:938)
  at android.os.Handler.dispatchMessage(Handler.java:99)
  at android.os.Looper.loopOnce(Looper.java:201)
  at android.os.Looper.loop(Looper.java:288)
  at android.os.HandlerThread.run(HandlerThread.java:67)

thread 72 holds locks and is not Blocked

"NetworkStats" prio=5 tid=72 Runnable
  | group="main" sCount=0 ucsCount=0 flags=0 obj=0x16885d58 self=0x6f8ceed000
  | sysTid=2080 nice=0 cgrp=default sched=0/0 handle=0x6d3ab71cb0
  | state=R schedstat=( 1797470426316 10061066054 97414 ) utm=176679 stm=3067 core=7 HZ=100
  | stack=0x6d3aa6e000-0x6d3aa70000 stackSize=1039KB
  | held mutexes= "mutator lock"(shared held)
  at android.net.NetworkStats.findIndex(NetworkStats.java:758)
  at android.net.NetworkStats.combineValues(NetworkStats.java:702)
  at android.net.NetworkStats.combineAllValues(NetworkStats.java:748)
  at com.android.server.net.NetworkStatsFactory.adjustForTunAnd464Xlat(NetworkStatsFactory.java:392)
  at com.android.server.net.NetworkStatsFactory.readNetworkStatsDetail(NetworkStatsFactory.java:363)
  - locked <0x085a6e68> (a java.lang.Object)
  at com.android.server.net.NetworkStatsService.readNetworkStatsUidDetail(NetworkStatsService.java:1917)
  at com.android.server.net.NetworkStatsService.getNetworkStatsUidDetail(NetworkStatsService.java:1933)
  at com.android.server.net.NetworkStatsService.recordSnapshotLocked(NetworkStatsService.java:1440)
  at com.android.server.net.NetworkStatsService.performPollLocked(NetworkStatsService.java:1529)
  at com.android.server.net.NetworkStatsService.handleNotifyNetworkStatusLocked(NetworkStatsService.java:1293)
  at com.android.server.net.NetworkStatsService.handleNotifyNetworkStatus(NetworkStatsService.java:1269)
  - locked <0x0894e281> (a java.lang.Object)
  at com.android.server.net.NetworkStatsService.access$500(NetworkStatsService.java:175)
  at com.android.server.net.NetworkStatsService$NetworkStatsHandler.handleMessage(NetworkStatsService.java:385)
  at android.os.Handler.dispatchMessage(Handler.java:106)
  at android.os.Looper.loopOnce(Looper.java:201)
  at android.os.Looper.loop(Looper.java:288)
  at android.os.HandlerThread.run(HandlerThread.java:67)
ignoramous commented 4 days ago

Thanks for the bug report. Must have taken quite some time as ANR dumps aren't all that small.

How frequently do you see this? Is there a set of steps that can reproduce this, even if intermittently?

We perhaps must be careful in our use of ConnectivityService APIs, we use quite a few deprecated ones where we shouldn't...

haraldrudell commented 3 days ago

It is just general use that on rare occasions coincide with statistics collection, I think

It took 4 days for it to happen so it is not frequent, but I was determined to get some data on it — At the time, airplane mode was on, just unlocked and looking at an off-line browser tapping around

After occurring first time after boot, the phone does show occasional slowness, like 30 s without displaying a dialog, or a dialog 2 hours later. The phone is in bad state after restarting a thread or something — A reboot is back to normal: play again

An easy fix would be moving addPowerSaveTempWhitelistAppDirect off the UI thread using app-scoped Kotlin coroutine? while(true) and a semaphore launch the coroutine from invoking method from the Application component onCreate The impact will then be nothing even if it locks up. Costs nothing, a coroutine is an object instance coroutines in Kotlin is a time-shared thread-group, poor man’s parallelism

If it’s then still suspected to be broken, it can be instrumented with display in the UI

ignoramous commented 3 days ago

Thanks. If you are comfortable, please consider emailing me (mz at celzero dot com) the full ANR dump. No pressure.

An easy fix would be moving addPowerSaveTempWhitelistAppDirect off the UI thread using app-scoped Kotlin coroutine?

That code is Android framework's (Android's code that runs in each app's process or acts as a "server" processing app's requests). Albeit, these code paths may have been triggered by our code. We have to do a sweep to see which one it might be; not straight forward without the entire ANR trace (if you can't share it is okay).

haraldrudell commented 2 days ago

This has nothing to do with Rethink, it’s some hardware issue with the device

  1. The System ANR comes directly from native code, unaffected by any app
  2. all 4 threads are doing good things, none is locked-up
  3. The system_service UI thread executes PUSH_TEMP_ALLOWLIST_UI_MSG, a message clearly designed for it and the thread itself appears to draw on-top things like volume sliders and such, independent of any app

At present, I can’t remember if I saw the dialog on another device

This phone is declared working for like 4 days at a time, after which reboot to continue