OpenNuvoton / NUC970_Linux_Kernel

Linux Kernel Source Code for NUC970 Series Microprocessor

USB HUB caused the platform to crash #47

Closed yangqingshi closed 2 years ago

yangqingshi commented 4 years ago

During testing, we found that the system may crash because of the USB hub. The test method is as follows: with a 4G module attached to the platform, reset the 4G module over and over. The platform crashes after about three days.

Here is the LOG printed by the serial port at reset:

usb 1-2: new high-speed USB device number 33 using nuc970-ehci
usb 1-2: new high-speed USB device number 34 using nuc970-ehci
usb 1-2: new high-speed USB device number 35 using nuc970-ehci
usb 1-2: new high-speed USB device number 36 using nuc970-ehci
usb 1-2: new high-speed USB device number 37 using nuc970-ehci
usb 1-2: new high-speed USB device number 38 using nuc970-ehci
usb 1-2: new high-speed USB device number 39 using nuc970-ehci
usb 1-2: new high-speed USB device number 40 using nuc970-ehci
usb 2-2: new full-speed USB device number 95 using nuc970-ohci
usb 2-2: not running at top speed; connect to a high speed hub
usb 2-2: USB disconnect, device number 95
usb 1-2: new high-speed USB device number 41 using nuc970-ehci
usb 1-2: new high-speed USB device number 42 using nuc970-ehci
usb 1-2: new high-speed USB device number 43 using nuc970-ehci
usb 1-2: new high-speed USB device number 44 using nuc970-ehci
usb 1-2: new high-speed USB device number 45 using nuc970-ehci
usb 1-2: new high-speed USB device number 46 using nuc970-ehci
usb 1-2: new high-speed USB device number 47 using nuc970-ehci
usb 1-2: new high-speed USB device number 48 using nuc970-ehci
usb 1-2: new high-speed USB device number 49 using nuc970-ehci
usb 2-2: new full-speed USB device number 96 using nuc970-ohci
usb 2-2: not running at top speed; connect to a high speed hub
usb 2-2: USB disconnect, device number 96
usb 1-2: new high-speed USB device number 50 using nuc970-ehci
usb 1-2: new high-speed USB device number 51 using nuc970-ehci
usb 1-2: new high-speed USB device number 52 using nuc970-ehci
usb 1-2: new high-speed USB device number 53 using nuc970-ehci
usb 1-2: new high-speed USB device number 54 using nuc970-ehci
usb 1-2: new high-speed USB device number 55 using nuc970-ehci
usb 1-2: new high-speed USB device number 56 using nuc970-ehci
usb 1-2: new high-speed USB device number 57 using nuc970-ehci
usb 1-2: new high-speed USB device number 58 using nuc970-ehci
usb 1-2: new high-speed USB device number 59 using nuc970-ehci
usb 2-2: new full-speed USB device number 97 using nuc970-ohci
usb 2-2: not running at top speed; connect to a high speed hub
usb 2-2: USB disconnect, device number 97
usb 1-2: new high-speed USB device number 60 using nuc970-ehci
usb 1-2: new high-speed USB device number 61 using nuc970-ehci
usb 2-2: new full-speed USB device number 98 using nuc970-ohci
usb 2-2: not running at top speed; connect to a high speed hub
usb 2-2: USB disconnect, device number 98
usb 1-2: new high-speed USB device number 62 using nuc970-ehci
usb 1-2: new high-speed USB device number 63 using nuc970-ehci
usb 2-2: new full-speed USB device number 99 using nuc970-ohci
usb 2-2: not running at top speed; connect to a high speed hub
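
The log above shows the module being re-enumerated many times on the EHCI controller, with occasional fallbacks to full speed on the OHCI controller. A rough way to quantify that pattern over a long capture is to tally the "new ... USB device" messages per port, speed, and controller. The following Python sketch is only an illustration: the function name `summarize_enumerations` and the sample string are hypothetical, and the regular expression assumes the message format seen in this log.

```python
import re
from collections import Counter

# Matches kernel messages such as
# "usb 1-2: new high-speed USB device number 33 using nuc970-ehci".
# \s+ before "USB" tolerates a line wrap inside the message.
ENUM_RE = re.compile(
    r"usb (\d+-[\d.]+): new (\S+)\s+USB device number (\d+) using (\S+)"
)

def summarize_enumerations(log_text):
    """Count enumeration messages per (port, speed, host controller)."""
    counts = Counter()
    for port, speed, _devnum, hcd in ENUM_RE.findall(log_text):
        counts[(port, speed, hcd)] += 1
    return counts

# Short excerpt in the same format as the serial log (illustrative only).
sample = (
    "usb 1-2: new high-speed USB device number 40 using nuc970-ehci "
    "usb 2-2: new full-speed USB device number 95 using nuc970-ohci "
    "usb 2-2: not running at top speed; connect to a high speed hub "
    "usb 2-2: USB disconnect, device number 95 "
    "usb 1-2: new high-speed USB device number 41 using nuc970-ehci"
)

for (port, speed, hcd), n in sorted(summarize_enumerations(sample).items()):
    print(f"{port} {speed} via {hcd}: {n}")
```

Because the pattern is matched with `findall` over the whole text, it works whether the capture has one message per line or has been flattened into a single line.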

Additionally, we merged the changes from commit 0065dadaa859642f26f3c9c85eb168153921c63f. Using the same test environment, we are now on the fourth day and have seen no crash. What problem is this change mainly intended to fix? Could it also solve the problem I encountered?

The version of the code we were using before is 81e3c49a1d2eb5870a705e308bad1caad9a79ec0.

yachen commented 4 years ago

Hi,

I don't see a kernel crash message in your log. Patch 4a3f170 fixes a kernel hang/crash that occurs when a WiFi/LTE dongle is connected and disconnected repeatedly. Patch 0065dad fixes an issue, introduced by the previous patch, where a device connected to USB hub port 1 stops working.

I think the problem you encountered is fixed once 4a3f170 is merged.

Sincerely,

Yi-An Chen

yangqingshi commented 4 years ago

Hi, the code I tested already contains 4a3f170, so that patch alone does not fix the problem. 0065dad seems to solve it, but the test has not run long enough yet, and we still need to test for a longer period to confirm.

Here is the RCU stall log that was printed when the terminal crashed:

INFO: rcu_preempt self-detected stall on CPU { 0} (t=33615 jiffies g=98775 c=98774 q=11)
CPU: 0 PID: 0 Comm: swapper Not tainted 3.10.108 #1
Backtrace:
[] (dump_backtrace+0x0/0x10c) from [] (show_stack+0x18/0x1c)
r6:c041ca34 r5:c041ca34 r4:c041c748 r3:00000000
[] (show_stack+0x0/0x1c) from [] (dump_stack+0x20/0x28)
[] (dump_stack+0x0/0x28) from [] (rcu_check_callbacks+0x2b4/0x790)
[] (rcu_check_callbacks+0x0/0x790) from [] (update_process_times+0x44/0x70)
[] (update_process_times+0x0/0x70) from [] (tick_periodic.constprop.4+0x38/0xc0)
r6:c03f8000 r5:00000453 r4:351eaf00 r3:20000013
[] (tick_periodic.constprop.4+0x0/0xc0) from [] (tick_handle_periodic+0x18/0x78)
r7:00000000 r6:c0405620 r5:c03f8000 r4:00000001
[] (tick_handle_periodic+0x0/0x78) from [] (nuc970_timer0_interrupt+0x24/0x34)
r8:00000010 r7:00000000 r6:00000000 r5:c03f8000 r4:00000001
[] (nuc970_timer0_interrupt+0x0/0x34) from [] (handle_irq_event_percpu+0x38/0x1ac)
r4:c0405700 r3:c001a160
[] (handle_irq_event_percpu+0x0/0x1ac) from [] (handle_irq_event+0x60/0x90)
[] (handle_irq_event+0x0/0x90) from [] (handle_level_irq+0xa0/0x120)
r5:c03f8020 r4:c0409df0
[] (handle_level_irq+0x0/0x120) from [] (generic_handle_irq+0x2c/0x40)
r5:00000010 r4:c04205b4
[] (generic_handle_irq+0x0/0x40) from [] (handle_IRQ+0x38/0x8c)
[] (handle_IRQ+0x0/0x8c) from [] (asm_do_IRQ+0x10/0x14)
r6:f0000000 r5:20000013 r4:c00247e8 r3:c0024780
[] (asm_do_IRQ+0x0/0x14) from [] (irq_svc+0x30/0x74)
Exception stack(0xc03f9e38 to 0xc03f9e80)
9e20: 00000001 c04535c0
9e40: 00000000 20000013 00000202 00000010 00000000 c04535e0 c0431cce c0431cce
9e60: 003f29e4 c03f9ecc 0000000a c03f9e80 c0024780 c00247e8 20000013 ffffffff
[] (__do_softirq+0x0/0x1d8) from [] (do_softirq+0x54/0x60)
[] (do_softirq+0x0/0x60) from [] (irq_exit+0x5c/0x9c)
r4:c04205b4 r3:00000202
[] (irq_exit+0x0/0x9c) from [] (handle_IRQ+0x3c/0x8c)
r4:c04205b4 r3:00000002
[] (handle_IRQ+0x0/0x8c) from [] (asm_do_IRQ+0x10/0x14)
r6:f0000000 r5:60000013 r4:c000fd20 r3:c000fd28
[] (asm_do_IRQ+0x0/0x14) from [] (irq_svc+0x30/0x74)
Exception stack(0xc03f9f30 to 0xc03f9f78)
9f20: 00000000 0005317f 0005217f 60000013
9f40: c03f8000 c03f8000 c0400074 c03f8000 c0431cce c0431cce 003f29e4 c03f9f84
9f60: 600000d3 c03f9f78 c000fd28 c000fd20 60000013 ffffffff
[] (arch_cpu_idle+0x0/0x3c) from [] (cpu_startup_entry+0xbc/0x108)
[] (cpu_startup_entry+0x0/0x108) from [] (rest_init+0x78/0x90)
r7:c0400000 r3:c0312dcc
[] (rest_init+0x0/0x90) from [] (start_kernel+0x27c/0x2c8)
r4:c04000e0 r3:00000000
[] (start_kernel+0x0/0x2c8) from [<00008040>] (0x8040)
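
To compare stall reports across test runs, it can help to pull the header fields and the callee/caller pairs out of a dump like the one above. The following Python sketch is a hypothetical helper, not part of any kernel tooling: the names `parse_rcu_stall` and `call_pairs` are made up for illustration, and the conversion from jiffies to seconds assumes `HZ = 100`, which should be checked against `CONFIG_HZ` in the actual kernel configuration.

```python
import re

# Header line, e.g.
# "INFO: rcu_preempt self-detected stall on CPU { 0} (t=33615 jiffies g=98775 c=98774 q=11)"
STALL_RE = re.compile(
    r"INFO: (\w+) self-detected stall on CPU \{\s*(\d+)\}\s*"
    r"\(t=(\d+) jiffies g=(\d+) c=(\d+) q=(\d+)\)"
)

# One backtrace entry, e.g.
# "(dump_backtrace+0x0/0x10c) from [] (show_stack+0x18/0x1c)"
FRAME_RE = re.compile(
    r"\(([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+\) from \S+ \(([\w.]+)\+0x"
)

def parse_rcu_stall(log_text, hz=100):
    """Return the stall header fields, or None if no header is found.

    hz=100 is an assumption; use the CONFIG_HZ value of the running kernel.
    """
    m = STALL_RE.search(log_text)
    if m is None:
        return None
    flavor, cpu, t, g, c, q = m.groups()
    return {
        "flavor": flavor,
        "cpu": int(cpu),
        "jiffies": int(t),
        "seconds": int(t) / hz,
        "g": int(g),
        "c": int(c),
        "q": int(q),
    }

def call_pairs(log_text):
    """Return (callee, caller) name pairs in the order they appear."""
    return FRAME_RE.findall(log_text)

# Short excerpt in the same format as the dump (illustrative only).
sample = (
    "INFO: rcu_preempt self-detected stall on CPU { 0} "
    "(t=33615 jiffies g=98775 c=98774 q=11) "
    "[] (dump_backtrace+0x0/0x10c) from [] (show_stack+0x18/0x1c) "
    "[] (show_stack+0x0/0x1c) from [] (dump_stack+0x20/0x28)"
)

print(parse_rcu_stall(sample))
print(call_pairs(sample))
```

Under the `HZ = 100` assumption, `t=33615 jiffies` corresponds to roughly 336 seconds; with a different tick rate the figure changes proportionally.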

Another symptom, which appears to be a kernel deadlock, produces the following log:

usb 1-2: new high-speed USB device number 50 using nuc970-ehci
usb 1-2: new high-speed USB device number 51 using nuc970-ehci
usb 1-2: new high-speed USB device number 52 using nuc970-ehci
usb 1-2: new high-speed USB device number 53 using nuc970-ehci
usb 1-2: new high-speed USB device number 54 using nuc970-ehci
usb 1-2: new high-speed USB device number 55 using nuc970-ehci
usb 1-2: new high-speed USB device number 56 using nuc970-ehci
usb 1-2: new high-speed USB device number 57 using nuc970-ehci
usb 1-2: new high-speed USB device number 58 using nuc970-ehci
usb 2-2: new full-speed USB device number 32 using nuc970-ohci
usb 2-2: not running at top speed; connect to a high speed hub
usb 2-2: USB disconnect, device number 32
usb 1-2: new high-speed USB device number 59 using nuc970-ehci
usb 2-2: new full-speed USB device number 33 using nuc970-ohci
usb 2-2: not running at top speed; connect to a high speed hub
usb 2-2: USB disconnect, device number 33
usb 2-2: new full-speed USB device number 34 using nuc970-ohci
usb 2-2: not running at top speed; connect to a high speed hub
usb 2-2: USB disconnect, device number 34
usb 2-2: new full-speed USB device number 35 using nuc970-ohci
usb 2-2: not running at top speed; connect to a high speed hub
usb 2-2: USB disconnect, device number 35
usb 2-2: new full-speed USB device number 36 using nuc970-ohci
usb 2-2: not running at top speed; connect to a high speed hub
usb 2-2: USB disconnect, device number 36
usb 2-2: new full-speed USB device number 37 using nuc970-ohci
usb 2-2: not running at top speed; connect to a high speed hub
usb 2-2: USB disconnect, device number 37
usb 2-2: new full-speed USB device number 38 using nuc970-ohci
usb 2-2: not running at top speed; connect to a high speed hub

Based on long-term testing, there are currently two types of crashes