ftctechnh / ftc_app

FTC Android Studio project to create FTC Robot Controller app.
UVC driver usually crashes Linux kernel if run multiple times in a row #681

Status: Closed (Windwoes closed this issue 4 years ago)

Windwoes commented 5 years ago

Here's a folder with videos of all my tests

Title basically says it all.

Here's the syslog of the event:

[  967.760390] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[  967.760452] pgd = e8e38000
[  967.760548] [00000000] *pgd=00000000
[  967.760656] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[  967.760711] CPU: 0    Not tainted  (3.4.0-g853158b #1)
[  967.760810] PC is at xhci_free_segments_for_ring+0x2c/0x88
[  967.760866] LR is at xhci_free_segments_for_ring+0x54/0x88
[  967.760964] pc : [<c0623f24>]    lr : [<c0623f4c>]    psr: a0000093
[  967.760967] sp : e83e7c18  ip : 60000093  fp : e83e7c3c
[  967.761109] r10: 00000000  r9 : 00000001  r8 : 00000000
[  967.761162] r7 : e9942000  r6 : e845f940  r5 : 00000000  r4 : 00000000
[  967.761255] r3 : 00000001  r2 : c1131c8c  r1 : 012bf000  r0 : ee002180
[  967.761353] Flags: NzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  967.761409] Control: 10c5787d  Table: 3123806a  DAC: 00000015

It traces back to:

#18 pc 0001da97  /data/app/com.qualcomm.ftcrobotcontroller-2/lib/arm/libRobotCore.so (_Z22uvc_user_callback_mainP17uvc_stream_handle+274)
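The symbol in that frame is a C++ mangled name in the Itanium ABI style: `_Z`, then a length-prefixed function name, then the parameter types (`P` marks a pointer). A minimal Python sketch that decodes only this simple form is shown below. The helper name `demangle_simple` is hypothetical, and it is not a general demangler; a tool such as `c++filt` handles the full grammar.

```python
import re


def demangle_simple(mangled):
    """Decode the tiny subset _Z<len><name> followed by [P...]<len><name> params.

    Assumption: only length-prefixed names and pointer markers appear.
    Anything else (builtin types, namespaces, templates) raises ValueError.
    """
    if not mangled.startswith("_Z"):
        return mangled  # not a mangled C++ name, return unchanged

    def read_name(pos):
        # A name is encoded as its decimal length followed by that many characters.
        m = re.match(r"\d+", mangled[pos:])
        if m is None:
            raise ValueError("unsupported encoding at offset %d" % pos)
        length = int(m.group())
        start = pos + len(m.group())
        return mangled[start:start + length], start + length

    func, pos = read_name(2)
    params = []
    while pos < len(mangled):
        stars = 0
        while pos < len(mangled) and mangled[pos] == "P":
            stars += 1  # each 'P' adds one level of pointer
            pos += 1
        name, pos = read_name(pos)
        params.append(name + "*" * stars)
    return "%s(%s)" % (func, ", ".join(params))


print(demangle_simple("_Z22uvc_user_callback_mainP17uvc_stream_handle"))
```

Applied to the frame above, this yields `uvc_user_callback_main(uvc_stream_handle*)`, i.e. the crash path passes through the stream's user callback thread in libRobotCore.so.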

Also, I did a hotplug recovery test (this time using ZTEs since I figured the Tech Team would have verified it worked with those) and it FAILED all 3 tests I did.

cmacfarl commented 5 years ago

Can you please clarify what you mean by "run multiple times in a row"? -- Thanks.

Windwoes commented 5 years ago

@cmacfarl sure. So basically if I run the sample Vuforia webcam OpMode, stop it, and repeat 3 more times, the Linux kernel dies and the entire Android OS crashes, as you can see in this video.

cmacfarl commented 5 years ago

Ack. Thanks.

cmacfarl commented 5 years ago

This was tested over the weekend using two Moto E4's, an Anker hub, USB battery pack, and a Logitech C920, and was not reproducible by following the procedures in the video.

It's unclear from your text above whether or not your test with ZTE's followed the procedures documented in your video or whether your hotplug description refers to something else. Please clarify and supply an inventory of the hardware devices under test. Thanks.

Windwoes commented 5 years ago

@cmacfarl

Hardware used:

However, as you can see in the other videos, the crash occurs even without the meter and USB hub.

Also, to confirm that it is an issue with your UVC driver, I compiled and ran the sample UVCCamera app, which uses libuvc (the same library you use), and it ran multiple times with no issues at all.

Regarding the ZTE hotplug test, that is totally separate and unrelated. I was just trying to point out that there seems to be a general instability in your UVC driver.

rgatkinson commented 5 years ago

Nit (just for posterity): the libuvc used in the FTC SDK is not bug-for-bug compatible with saki's. :-)

qwertychouskie commented 5 years ago

This should be reported to Google as a Denial of Service (DoS) attack; a user-space app should NEVER be able to crash the kernel. BTW, we sometimes get reboots on our S5, which may be this issue. Will test soon.

Windwoes commented 5 years ago

@qwertychouskie I believe the Tech Team's UVC driver may be triggering a bug in the USB driver stack, as the syslog mentions PC is at xhci_free_segments_for_ring+0x2c/0x88, and xhci is the USB driver.
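To make that reading of the oops concrete, here is a minimal Python sketch that pulls the symbol, offset, and function size out of a `PC is at` line and flags symbols with the `xhci_` prefix. The helper names `parse_pc_line` and `is_xhci_symbol` are hypothetical, and the prefix check is only a rough heuristic, not a definitive way to attribute a crash to a driver.

```python
import re

# Matches e.g. "PC is at xhci_free_segments_for_ring+0x2c/0x88"
PC_LINE = re.compile(
    r"PC is at (?P<symbol>\w+)\+(?P<offset>0x[0-9a-fA-F]+)/(?P<size>0x[0-9a-fA-F]+)"
)


def parse_pc_line(line):
    """Return (symbol, offset, size) from a kernel oops 'PC is at' line, or None."""
    m = PC_LINE.search(line)
    if m is None:
        return None
    return m.group("symbol"), int(m.group("offset"), 16), int(m.group("size"), 16)


def is_xhci_symbol(symbol):
    # Heuristic assumption: xHCI host controller driver functions share this prefix.
    return symbol.startswith("xhci_")


line = "[  967.760810] PC is at xhci_free_segments_for_ring+0x2c/0x88"
symbol, offset, size = parse_pc_line(line)
print(symbol, hex(offset), hex(size), is_xhci_symbol(symbol))
```

Here the faulting instruction is 0x2c bytes into a function that is 0x88 bytes long, inside the USB host controller code rather than in the user-space library itself.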

qwertychouskie commented 5 years ago

Confirmed to reproduce on the Galaxy S5 with our team's Auto program (https://github.com/FTCTeam10298/2018-19-code). The first time we reached a 4th run, we got an emergency stop, something about calling getCameraFocus (or something similar) on a null object. We did Restart Robot, then it kept saying that the camera could not be found (as it does when the camera is not plugged in). Unplugging the MicroUSB OTG cord from the phone and plugging it back in cleared this. The second time we reached a 4th run, the phone froze and rebooted. Once someone figures out what kernel code is crashing, this should be reported as a kernel bug.

Windwoes commented 5 years ago

@qwertychouskie any way you could grab the syslog for your kernel crash so we can verify that you're seeing the issue at

(_Z22uvc_user_callback_mainP17uvc_stream_handle+274)

as well?

qwertychouskie commented 5 years ago

@float23 Not to be "that guy", but the Rev Expansion Hubs are way better than the Modern Robotics modules, especially when it comes to handling hot reconnects, I highly recommend the Rev hub. We use 2 Rev hubs and the Galaxy S5s and it's great.

Windwoes commented 5 years ago

@cmacfarl @rgatkinson any update on whether this will be fixed in the foreseeable future? If not then I'll probably work on integrating my own UVC driver.

sbdevelops commented 5 years ago

My team switched to using a Logitech C920 recently, and I've been testing autonomous over the past couple days, noticing occasional crashes when starting autonomous (which immediately is supposed to start Vuforia/TFOD). From all of my research, this driver issue seems to be the cause. I've experienced the same crash symptoms as the video produced by @FROGbots-4634. I'll post syslogs next time a crash occurs. Samsung Galaxy S5 is being used as our RC phone. @cmacfarl @rgatkinson With my team going to Houston in a few days, how do I ensure this does not occur during competition?

Windwoes commented 5 years ago

@sbdevelops you can ensure it won't happen during competition by never running it more than 3 times in a row without unplugging/replugging the phone.

Windwoes commented 5 years ago

@rgatkinson @cmacfarl will this be addressed in v5.x?

Windwoes commented 5 years ago

@cmacfarl @rgatkinson I have confirmed this issue still exists in SDK v5.2, and on another device: a 1st gen Pixel XL running 7.1.2. While the symptoms are not exactly the same as on the Nexus 5, the UVC driver can still cause the kernel to crash if run multiple times. I definitely think this warrants investigation.

Windwoes commented 5 years ago

@cmacfarl @rgatkinson this issue also affects the Moto G5 Plus, again in a slightly different manner. On the G5 I was not able to get the Linux kernel to crash, but I was able to get the OpMode to crash very occasionally with "IllegalArgumentException - pointer must not be null". More reliably, I can get the camera to fail to initialize and seemingly disappear from the USB bus, then reappear a second later (the message "Warning: unable to find Webcam 1" appears for a second).

Windwoes commented 4 years ago

Fixed in v5.5