Cascoda / ca8210-linux

Linux kernel driver for direct SPI communication with Cascoda's CA-8210 IEEE 802.15.4 transceiver
BSD 3-Clause "New" or "Revised" License
5 stars 5 forks source link

Too many retries error creates unrecoverable state #11

Open CiaranWoodward opened 7 years ago

CiaranWoodward commented 7 years ago

Occasionally the driver will attempt SPI writes too many times and fail with the "too many retries" error message. This will cause the driver to remove itself, however attempting to reattach the driver will fail with no error, and attempting to remove the module will crash the kernel.

Also, under heavy traffic, this can happen extremely quickly, so a softer recovery path would be preferred.

Crash:

[   98.943234] ca8210 spi0.0: retried spi write
[   98.944753] ca8210 spi0.0: ca8210 was busy during attempted write
[   98.963296] ca8210 spi0.0: retried spi write
[   98.964706] ca8210 spi0.0: ca8210 was busy during attempted write
[   98.983175] ca8210 spi0.0: retried spi write
[   98.983860] ca8210 spi0.0: ca8210 was busy during attempted write
[   99.003168] ca8210 spi0.0: retried spi write
[   99.003854] ca8210 spi0.0: ca8210 was busy during attempted write
[   99.023167] ca8210 spi0.0: retried spi write
[   99.023851] ca8210 spi0.0: ca8210 was busy during attempted write
[   99.043167] ca8210 spi0.0: retried spi write
[   99.043852] ca8210 spi0.0: ca8210 was busy during attempted write
[   99.043869] ca8210 spi0.0: too many retries!
[   99.043885] ca8210 spi0.0: Removing ca8210
[   99.043925] ca8210 spi0.0: sync_down = 4124, sync_up = 4123
[   99.313288] ca8210 spi0.0: Unregistered & freed ieee802154_hw.
[   99.313330] ca8210 spi0.0: Test interface removed

Kernel error if attempt to remove again:

May 17 12:48:09 Raspi2B-A kernel: [   99.043925] ca8210 spi0.0: sync_down = 4124, sync_up = 4123
May 17 12:48:09 Raspi2B-A ifplugd: Stopping ifplugd for wpan0
May 17 12:48:10 Raspi2B-A kernel: [   99.313288] ca8210 spi0.0: Unregistered & freed ieee802154_hw.
May 17 12:48:10 Raspi2B-A kernel: [   99.313330] ca8210 spi0.0: Test interface removed
May 17 12:48:10 Raspi2B-A ifplugd(wpan0)[1922]: Link beat lost.
May 17 12:48:10 Raspi2B-A ifplugd(wpan0)[1922]: Exiting.
May 17 12:50:12 Raspi2B-A kernel: [  221.905935] ca8210 spi0.0: Removing ca8210
May 17 12:50:12 Raspi2B-A kernel: [  221.905963] ca8210 spi0.0: sync_down = 4124, sync_up = 4123
May 17 12:50:12 Raspi2B-A kernel: [  221.905989] Unable to handle kernel NULL pointer dereference at virtual address 00000000
May 17 12:50:12 Raspi2B-A kernel: [  221.907694] pgd = b7d40000
May 17 12:50:12 Raspi2B-A kernel: [  221.908538] [00000000] *pgd=00000000
May 17 12:50:12 Raspi2B-A kernel: [  221.909402] Internal error: Oops: 5 [#1] SMP ARM
May 17 12:50:12 Raspi2B-A kernel: [  221.910266] Modules linked in: ca8210sm(O-) mac802154 ieee802154 crc_ccitt cfg80211 rfkill ipv6 snd_bcm2835 snd_pcm snd_timer snd spi_bcm2835 bcm2835_wdt bcm2835_gpiomem uio_pdrv_genirq uio
May 17 12:50:12 Raspi2B-A kernel: [  221.913246] CPU: 2 PID: 2270 Comm: rmmod Tainted: G           O    4.4.11-v7+ #888
May 17 12:50:12 Raspi2B-A kernel: [  221.915232] Hardware name: BCM2709
May 17 12:50:12 Raspi2B-A kernel: [  221.916252] task: b5016780 ti: b7c88000 task.ti: b7c88000
May 17 12:50:12 Raspi2B-A kernel: [  221.917317] PC is at _raw_spin_lock_irq+0x24/0x60
May 17 12:50:12 Raspi2B-A kernel: [  221.918379] LR is at _raw_spin_lock_irq+0x20/0x60
May 17 12:50:12 Raspi2B-A kernel: [  221.919413] pc : [<805b889c>]    lr : [<805b8898>]    psr: 60000093
May 17 12:50:12 Raspi2B-A kernel: [  221.919413] sp : b7c89dd8  ip : b7c89dd8  fp : b7c89dec
May 17 12:50:12 Raspi2B-A kernel: [  221.921539] r10: b7e1c1a0  r9 : 00000000  r8 : 88888889
May 17 12:50:12 Raspi2B-A kernel: [  221.922614] r7 : b7e1c430  r6 : 00000002  r5 : 00000003  r4 : 00000000
May 17 12:50:12 Raspi2B-A kernel: [  221.923710] r3 : 808614f8  r2 : 00000000  r1 : 00000002  r0 : 00000000
May 17 12:50:12 Raspi2B-A kernel: [  221.924803] Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
May 17 12:50:12 Raspi2B-A kernel: [  221.925932] Control: 10c5387d  Table: 37d4006a  DAC: 00000055
May 17 12:50:12 Raspi2B-A kernel: [  221.927032] Process rmmod (pid: 2270, stack limit = 0xb7c88210)
May 17 12:50:12 Raspi2B-A kernel: [  221.928110] Stack: (0xb7c89dd8 to 0xb7c8a000)
May 17 12:50:12 Raspi2B-A kernel: [  221.929196] 9dc0:                                                       00000000 b7e1c400
May 17 12:50:12 Raspi2B-A kernel: [  221.931370] 9de0: b7c89e2c b7c89df0 8003a708 805b8884 80956118 808c3bf6 00000000 00000008
May 17 12:50:12 Raspi2B-A kernel: [  221.933552] 9e00: 00000081 b7e1c400 b7c89e58 b7e1c440 b7e1c410 00000003 b7c89e4c 00000000
May 17 12:50:12 Raspi2B-A kernel: [  221.935747] 9e20: b7c89e94 b7c89e30 8003aa10 8003a694 803a4cb8 803a4c30 8070054c 00000000
May 17 12:50:12 Raspi2B-A kernel: [  221.938010] 9e40: 00000000 b7c89e60 b7c89e60 b7c89e4c b7c89e4c 00000002 00000000 00000000
May 17 12:50:12 Raspi2B-A kernel: [  221.940398] 9e60: b7c89e60 b7c89e60 806fd244 b6b25380 b923cc00 b6b25380 00000081 8000fd08
May 17 12:50:12 Raspi2B-A kernel: [  221.942960] 9e80: b7c88000 00000000 b7c89ebc b7c89e98 7f1b09a8 8003a8d4 805b8a3c 800d959c
May 17 12:50:12 Raspi2B-A kernel: [  221.945630] 9ea0: 00000006 b923cc00 7f1b2fac b923cc34 b7c89ed4 b7c89ec0 803fce3c 7f1b0928
May 17 12:50:12 Raspi2B-A kernel: [  221.948454] 9ec0: b923cc00 7f1b2fac b7c89eec b7c89ed8 803a8b44 803fce1c b923cc00 7f1b2fac
May 17 12:50:12 Raspi2B-A kernel: [  221.951363] 9ee0: b7c89f0c b7c89ef0 803a8cf4 803a8abc b5016780 7f1b2fac 551cefb4 7e9f792e
May 17 12:50:12 Raspi2B-A kernel: [  221.954354] 9f00: b7c89f24 b7c89f10 803a7f30 803a8c24 7f1b1a20 7f1b2fac b7c89f3c b7c89f28
May 17 12:50:12 Raspi2B-A kernel: [  221.957476] 9f20: 803a9150 803a7ee0 7f1b1a20 7f1b3000 b7c89f4c b7c89f40 7f1b1a34 803a9124
May 17 12:50:12 Raspi2B-A kernel: [  221.960643] 9f40: b7c89fa4 b7c89f50 8009d2b0 7f1b1a2c b7c89f6c 32386163 6d733031 00000000
May 17 12:50:12 Raspi2B-A kernel: [  221.963810] 9f60: b7c89f8c b7c88000 b7c88010 8000fd08 b7c89fb0 551cef78 00000000 7e9f792e
May 17 12:50:12 Raspi2B-A kernel: [  221.966978] 9f80: 00000081 20000030 00000000 00c89f98 551cef78 00000000 00000000 b7c89fa8
May 17 12:50:12 Raspi2B-A kernel: [  221.970148] 9fa0: 8000fb40 8009d138 551cef78 00000000 551cefb4 00000800 a29fca00 a29fca00
May 17 12:50:12 Raspi2B-A kernel: [  221.973317] 9fc0: 551cef78 00000000 7e9f792e 00000081 7e9f7638 00000001 7e9f782c 00000000
May 17 12:50:12 Raspi2B-A kernel: [  221.976487] 9fe0: 54b66f6c 7e9f75d4 54b4c303 76f5f8a6 20000030 551cefb4 00000000 00000000
May 17 12:50:12 Raspi2B-A kernel: [  221.979676] [<805b889c>] (_raw_spin_lock_irq) from [<8003a708>] (flush_workqueue_prep_pwqs+0x80/0x240)
May 17 12:50:12 Raspi2B-A kernel: [  221.982862] [<8003a708>] (flush_workqueue_prep_pwqs) from [<8003aa10>] (flush_workqueue+0x148/0x5c8)
May 17 12:50:12 Raspi2B-A kernel: [  221.986048] [<8003aa10>] (flush_workqueue) from [<7f1b09a8>] (ca8210_remove+0x8c/0x17c [ca8210sm])
May 17 12:50:12 Raspi2B-A kernel: [  221.989234] [<7f1b09a8>] (ca8210_remove [ca8210sm]) from [<803fce3c>] (spi_drv_remove+0x2c/0x44)
May 17 12:50:12 Raspi2B-A kernel: [  221.992405] [<803fce3c>] (spi_drv_remove) from [<803a8b44>] (__device_release_driver+0x94/0x110)
May 17 12:50:12 Raspi2B-A kernel: [  221.995569] [<803a8b44>] (__device_release_driver) from [<803a8cf4>] (driver_detach+0xdc/0xe0)
May 17 12:50:12 Raspi2B-A kernel: [  221.998725] [<803a8cf4>] (driver_detach) from [<803a7f30>] (bus_remove_driver+0x5c/0xb0)
May 17 12:50:12 Raspi2B-A kernel: [  222.001869] [<803a7f30>] (bus_remove_driver) from [<803a9150>] (driver_unregister+0x38/0x58)
May 17 12:50:12 Raspi2B-A kernel: [  222.005016] [<803a9150>] (driver_unregister) from [<7f1b1a34>] (ca8210_spi_driver_exit+0x14/0x1c [ca8210sm])
May 17 12:50:12 Raspi2B-A kernel: [  222.008179] [<7f1b1a34>] (ca8210_spi_driver_exit [ca8210sm]) from [<8009d2b0>] (SyS_delete_module+0x184/0x1c8)
May 17 12:50:12 Raspi2B-A kernel: [  222.011335] [<8009d2b0>] (SyS_delete_module) from [<8000fb40>] (ret_fast_syscall+0x0/0x1c)
May 17 12:50:12 Raspi2B-A kernel: [  222.014465] Code: e1a04000 f10c0080 ebec8388 f594f000 (e1943f9f)
May 17 12:50:12 Raspi2B-A kernel: [  222.016048] ---[ end trace a1324d7c3f019878 ]---
CiaranWoodward commented 7 years ago

After a modification to prevent the shutting down of the driver upon the "too many retries!", the driver still becomes unresponsive after the error. Furthermore, the radio does not acknowledge any incoming frames, and the chip appears unresponsive.

CiaranWoodward commented 7 years ago

Potentially related is the repeated NACKing for the IRQ receive:

[ 7265.953383] ca8210 spi0.0: retried spi write
[ 7265.954759] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7265.973335] ca8210 spi0.0: retried spi write
[ 7265.974690] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7265.993335] ca8210 spi0.0: retried spi write
[ 7266.006893] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7266.023352] ca8210 spi0.0: retried spi write
[ 7269.002268] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7269.002292] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer
[ 7269.018349] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7269.018364] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer
[ 7269.036682] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7269.036697] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer
[ 7269.061907] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7269.061921] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer
[ 7269.099989] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7269.100003] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer
[ 7269.124940] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7269.124954] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer
[ 7269.168631] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7269.168645] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer
[ 7269.193488] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7269.193502] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer
[ 7269.801988] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7269.802003] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer
[ 7270.658351] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7270.658374] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer
[ 7271.036737] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7271.053355] ca8210 spi0.0: retried spi write
[ 7271.054041] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7271.083359] ca8210 spi0.0: retried spi write
[ 7271.084041] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7271.123360] ca8210 spi0.0: retried spi write
[ 7271.124043] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7271.173398] ca8210 spi0.0: retried spi write
[ 7271.174083] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7271.174100] ca8210 spi0.0: too many retries!
[ 7271.443242] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7271.443259] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer
[ 7272.193868] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7272.193883] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer
[ 7272.999612] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7272.999630] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer
[ 7273.913557] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7273.913575] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer
[ 7273.929356] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7273.929370] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer
[ 7274.553595] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7274.553614] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer
[ 7275.491626] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7275.491643] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer
[ 7275.963377] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7275.963392] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer
[ 7276.243669] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7276.243687] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer
[ 7276.713523] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7276.713538] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer
[ 7277.569844] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7277.569860] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer
[ 7278.322741] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7278.322758] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer
[ 7278.678308] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7278.678323] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer
[ 7279.426954] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7279.426972] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer
[ 7279.999780] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7279.999798] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer
[ 7280.896271] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7280.896288] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer
[ 7280.912069] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7280.912085] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer
[ 7281.487444] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7281.487463] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer
[ 7282.490267] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7282.490286] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer
[ 7283.404944] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7283.404963] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer
[ 7283.420744] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7283.420759] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer
[ 7284.154093] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7284.154109] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer
[ 7285.707971] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7285.707991] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer
[ 7286.782928] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7286.782949] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer
[ 7286.798730] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7286.798746] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer
[ 7288.092557] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7288.092575] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer
[ 7288.340990] ca8210 spi0.0: ca8210 was busy during attempted write
[ 7288.341006] ca8210 spi0.0: IRQ servicing NACKd, dropping transfer