FIRST-Tech-Challenge / FtcRobotController

BSD 3-Clause Clear License
851 stars 5.28k forks source link

6.1 Causes a soft reboot on the E5 Play #79

Closed NoahBres closed 3 years ago

NoahBres commented 3 years ago

The 6.1 SDK will cause a soft reboot after a while. It only seems to be caused once the RC phone is paired to the DS phone. I can force run opmodes via FTC-Dashboard completely fine when the connection between the RC and DS is severed. But, once I make that connection, within a minute or two, regardless of whether an opmode is running and regardless of whether the phone is plugged into an expansion hub or not (I am using phones + expansion hub, not a control hub), the phone will soft reboot. It is definitely a soft reboot because the tcpip server still lives after the reboot. Plus the reboot is much faster than a normal one.

(#60 is unrelated)I believe issue #60 is actually related to this. The behavior they describe is exactly the same as mine. However, the match logs they provide have no indication that anything actually crashes. ~~Most likely because the OS has killed the app's logcat monitor (i have no clue what im talking about). The log I posted has the same error as #60, "unexpected exception thrown during lynx communication" but that's only because I ran an opmode in that sample.~~ The app will crash regardless of whether an opmode is running or not.

Behavior is completely normal on 6.0. Behavior is still prevalent in a fresh clone of 6.1. Behavior is still relevant if I keep the FtcRobotControllerActivity the same as 6.0 so it's an internal SDK specific problem.

Relevant errors on line 655: https://gist.github.com/NoahBres/2f8e310106f5a7c1496b5632098d6327

Windwoes commented 3 years ago

Would it be possible to provide a video of said behavior?

NoahBres commented 3 years ago

https://drive.google.com/file/d/1ufhvSyGSilHeCMRR6J-8My_pzS_xUA5X/view?usp=sharing

Video uses FTC-Dashboard just to demonstrate that you can still run opmodes. The app does still crash with a completely fresh installation. No dashboard dependency. No new opmodes. Etc.

Further clarification: They're e5 plays. However, #60 seems to be a control hub so I'm assuming it's not device dependent.

NoahBres commented 3 years ago

Model: e5 play Android version: 8.0.0 Build number: OCP27.91-157

NoahAndrews commented 3 years ago

Wow. That video left my mouth open. Thanks for taking the time to make that.

A hunch from a quick look at the logs is that it might be related to sound playing. Can you please turn off sounds in the RC settings and try again? Please provide logs from that run as well, if it still crashes.

NoahAndrews commented 3 years ago

I'm curious why you think this is related to #60. That user was reporting intermittent crashes during an OpMode run, but you're reporting consistent crashes on DS connect.

NoahBres commented 3 years ago

I did notice that the DeadSystemException was preceded by the "did not receive expected priority boost on time" but that didn't really surface anything on google. The crash still occurs with the sound off. It seems that it still attempts to play the sounds. Just at 0 volume.

https://gist.github.com/NoahBres/2e2b0cd3b3365a71ea7a8d7b6caf643a

NoahAndrews commented 3 years ago

@NoahBres Ooh. Please try turning off the 5 GHz warning in the RC settings and see if that works around the issue.

NoahBres commented 3 years ago

That does prevent the crashing!

Windwoes commented 3 years ago

That does prevent the crashing!

Yeah, this is the smoking gun then

at android.net.wifi.WifiManager.is5GHzBandSupported(WifiManager.java:1473)
NoahAndrews commented 3 years ago

Did you turn off sounds, the warning, or both?

Edit: oh, I missed your other message

NoahBres commented 3 years ago

Turned the sound back on. It works fine! Turning the 5GHz warning off seemed to prevent the crashes.

smartynenko commented 3 years ago

Same here. Turning off 5Ghz warning fixed the issue. Only happens on Moto E5 RC after DS connection.

2021-01-06 10:42:43.189 1397-2019/? A/zygote: runtime.cc:532] JNI ERROR (app bug): weak global reference table overflow (max=51200)
2021-01-06 10:42:43.189 1397-2019/? A/zygote: runtime.cc:532] weak global reference table dump:
2021-01-06 10:42:43.189 1397-2019/? A/zygote: runtime.cc:532]   Last 10 entries (of 51200):
2021-01-06 10:42:43.189 1397-2019/? A/zygote: runtime.cc:532]     51199: 0x13becd48 android.os.HwParcel
2021-01-06 10:42:43.189 1397-2019/? A/zygote: runtime.cc:532]     51198: 0x13becce8 android.os.HwParcel
2021-01-06 10:42:43.189 1397-2019/? A/zygote: runtime.cc:532]     51197: 0x13becc20 android.os.HwBlob
2021-01-06 10:42:43.190 1397-2019/? A/zygote: runtime.cc:532]   Summary:
2021-01-06 10:42:43.190 1397-2019/? A/zygote: runtime.cc:532]     26949 of android.os.HwBlob (26949 unique instances)
2021-01-06 10:42:43.190 1397-2019/? A/zygote: runtime.cc:532]      1878 of android.os.HwRemoteBinder (1878 unique instances)
2021-01-06 10:42:43.190 1397-2019/? A/zygote: runtime.cc:532]        31 of java.lang.DexCache (31 unique instances)
2021-01-06 10:42:43.190 1397-2019/? A/zygote: runtime.cc:532]        17 of dalvik.system.PathClassLoader (8 unique instances)
2021-01-06 10:42:43.190 1397-2019/? A/zygote: runtime.cc:532]         1 of com.android.server.wm.Session
2021-01-06 10:42:43.190 1397-2019/? A/zygote: runtime.cc:532] 

    --------- beginning of crash
2021-01-06 10:42:43.202 1397-2019/? A/libc: Fatal signal 6 (SIGABRT), code -6 in tid 2019 (WifiStateMachin)
2021-01-06 10:42:43.364 10179-10179/? I/crash_dump32: obtaining output fd from tombstoned
2021-01-06 10:42:43.368 937-937/? I//system/bin/tombstoned: received crash request for pid 1397
2021-01-06 10:42:43.372 10179-10179/? I/crash_dump32: performing dump of process 1397 (target tid = 2019)
NoahBres commented 3 years ago

I know y'all have this all figured out already internally but just in case anyone stumbles upon this issue and was curious:

The app causes an OS soft reboot on the E5 when this particular line is called:

https://github.com/OpenFTC/OpenRC-Turbo/blob/b75e0f8da4925c077c6c6e5d9d119676ebba2c56/RobotCore/src/main/java/org/firstinspires/ftc/robotcore/internal/network/WifiUtil.java#L173

wifiManager.is5GHzBandSupported(). Anything that calls isFeatureSupported() (internal function) in the WifiManager seems to crash. https://android.googlesource.com/platform/frameworks/base/+/refs/heads/oreo-release/wifi/java/android/net/wifi/WifiManager.java

However, a blank app with the same compileSdkVersion, minSdkVersion, targetSdkVersion, and buildToolsVersion runs fine. So some other build config in the app SDK is causing it. I am not knowledgeable enough to figure that out 🤷

Workaround: In the RC app settings, turn off "2.4 GHz WiFi warning"

smartynenko commented 3 years ago

wifiManager.is5GHzBandSupported() by itself works just fine, but it is invoked in every iteration of RecvLoopRunnable.run() [when robot attempts to refresh telemetry] and overloads internal system buffer:

A/zygote: indirect_reference_table.cc:306] JNI ERROR (app bug): weak global reference table overflow (max=51200)

Adding 100 ms delay fixes the issue, but probably creates some other downstream issues.

if (Device.isRevControlHub()) {
    return AndroidBoard.getInstance().supports5GhzAp();
}
else {
    try {
        sleep(100);
    }
    catch (Exception ex) {
        Log.e(TAG, "why?");
    }
    WifiManager wifiManager = getWifiManager();
    return wifiManager.is5GHzBandSupported();
}

Do we need to check 5GHz support in every loop iteration or it can be done once and stored in a variable?

NoahAndrews commented 3 years ago

This issue will be fixed in the next version of the Robot Controller app.

Windwoes commented 3 years ago

SDK v6.2 has been released, which fixes this issue.