espressif / esp-hosted

Hosted Solution (Linux/MCU) with ESP32 (Wi-Fi + BT + BLE)
Other
675 stars 158 forks source link

scheduling while atomic in esp_hosted_ng kernel module #335

Closed stwirth closed 3 months ago

stwirth commented 7 months ago

I'm getting a kernel error in the esp hosted NG kernel module from time to time (~once every 10 minutes or so)

Feb 12 10:35:19 d user.err kernel: [161893.493253] BUG: scheduling while atomic: kworker/0:2/20275/0x00000002
Feb 12 10:35:19 d user.warn kernel: [161893.500172] Modules linked in: chsc6540_touch esp32_hosted_ng
Feb 12 10:35:19 d user.warn kernel: [161893.506266] CPU: 0 PID: 20275 Comm: kworker/0:2 Tainted: G        W         5.10.21 #1
Feb 12 10:35:19 d user.warn kernel: [161893.514566] Workqueue: ESP_SPI_WORK_QUEUE esp_spi_work [esp32_hosted_ng]
Feb 12 10:35:19 d user.warn kernel: [161893.521608] Stack : 80be5cac 80098a60 00000000 00000000 80aea820 fffffff8 82f539fc 80098a60
Feb 12 10:35:19 d user.warn kernel: [161893.530381]         80bc0000 845ff180 80adcf2c 80ae0000 00000001 00000001 82f539a8 1c7c3e0f
Feb 12 10:35:19 d user.warn kernel: [161893.539145]         00000000 00000000 80adcf2c 82f53848 000022ae 82f5385c 00000000 205d3636
Feb 12 10:35:19 d user.warn kernel: [161893.547898]         1eaba889 82f5385c ffffffff 00000030 80bc0000 00000000 00000000 80ae0000
Feb 12 10:35:19 d user.warn kernel: [161893.556657]         00000001 82f53b4c 00000000 fffffff8 00000000 00000000 000eeb9f 00000001
Feb 12 10:35:19 d user.warn kernel: [161893.565420]         ...
Feb 12 10:35:19 d user.warn kernel: [161893.568056] Call Trace:
Feb 12 10:35:19 d user.warn kernel: [161893.570691] [<8001f178>] show_stack+0x94/0x12c
Feb 12 10:35:19 d user.warn kernel: [161893.575402] [<80957488>] dump_stack+0xac/0xe8
Feb 12 10:35:19 d user.warn kernel: [161893.580016] [<800686e4>] __schedule_bug+0x5c/0x7c
Feb 12 10:35:19 d user.warn kernel: [161893.584989] [<8095ec50>] __schedule+0x8d0/0xaec
Feb 12 10:35:19 d user.warn kernel: [161893.589778] [<8095eee4>] schedule+0x78/0x12c
Feb 12 10:35:19 d user.warn kernel: [161893.594291] [<8095f3b0>] schedule_preempt_disabled+0x18/0x30
Feb 12 10:35:19 d user.warn kernel: [161893.600247] [<809609b4>] __mutex_lock.isra.5+0x144/0x460
Feb 12 10:35:19 d user.warn kernel: [161893.605850] [<8041e628>] clk_prepare_lock+0x3c/0xfc
Feb 12 10:35:19 d user.warn kernel: [161893.611008] [<8042191c>] clk_set_rate+0x24/0xd4
Feb 12 10:35:19 d user.warn kernel: [161893.615806] [<80950028>] ingenic_spi_setupxfer+0x15c/0x338
Feb 12 10:35:19 d user.warn kernel: [161893.621598] [<804f1f28>] spi_bitbang_transfer_one+0x3c/0xf4
Feb 12 10:35:19 d user.warn kernel: [161893.627463] [<804ef940>] spi_transfer_one_message+0x360/0x5f0
Feb 12 10:35:19 d user.warn kernel: [161893.633513] [<804efe04>] __spi_pump_messages+0x234/0x60c
Feb 12 10:35:19 d user.warn kernel: [161893.639122] [<804f0414>] __spi_sync+0x22c/0x260
Feb 12 10:35:19 d user.warn kernel: [161893.643909] [<804f0474>] spi_sync+0x2c/0x50
Feb 12 10:35:19 d user.warn kernel: [161893.648349] [<c0e39910>] spi_sync_transfer.constprop.3+0x7c/0xb4 [esp32_hosted_ng]
Feb 12 10:35:19 d user.warn kernel: [161893.656295] [<c0e39b34>] esp_spi_work+0x1ec/0x468 [esp32_hosted_ng]
Feb 12 10:35:19 d user.warn kernel: [161893.662908] [<80059414>] process_one_work+0x22c/0x448
Feb 12 10:35:19 d user.warn kernel: [161893.668238] [<80059798>] worker_thread+0x168/0x5c8
Feb 12 10:35:19 d user.warn kernel: [161893.673297] [<800621fc>] kthread+0x16c/0x174
Feb 12 10:35:19 d user.warn kernel: [161893.677829] [<80017e8c>] ret_from_kernel_thread+0x14/0x1c

Any ideas on how to debug this?

mantriyogesh commented 7 months ago

What is:

  1. Git commit at host and slave
  2. Linux kernel version
  3. Full log from boot at host and slave?
stwirth commented 7 months ago
  1. I just updated the kernel module and the ESP32 firmware to the latest commit: cdd6cb66295e879f83096eaece20839ef831bcfb
  2. 5.10.21

ESP32 logs on kernel module loading

rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 188777542, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0030,len:7088
load:0x40078000,len:15584
load:0x40080400,len:4
load:0x40080404,len:3876
entry 0x4008064c
I (30) boot: ESP-IDF d3c99ed3b8-dirty 2nd stage bootloader
I (30) boot: compile time Feb  1 2024 14:02:28
I (31) boot: Multicore bootloader
I (35) boot: chip revision: v3.0
I (39) boot.esp32: SPI Speed      : 40MHz
I (44) boot.esp32: SPI Mode       : DIO
I (48) boot.esp32: SPI Flash Size : 4MB
I (53) boot: Enabling RNG early entropy source...
I (58) boot: Partition Table:
I (62) boot: ## Label            Usage          Type ST Offset   Length
I (69) boot:  0 nvs              WiFi data        01 02 00009000 00004000
I (76) boot:  1 otadata          OTA data         01 00 0000d000 00002000
I (84) boot:  2 phy_init         RF data          01 01 0000f000 00001000
I (91) boot:  3 factory          factory app      00 00 00010000 00100000
I (99) boot:  4 ota_0            OTA app          00 10 00110000 00100000
I (106) boot:  5 ota_1            OTA app          00 11 00210000 00100000
I (114) boot: End of partition table
I (118) boot: Defaulting to factory image
I (123) esp_image: segment 0: paddr=00010020 vaddr=3f400020 size=20478h (132216) map
I (179) esp_image: segment 1: paddr=000304a0 vaddr=3ffbdb60 size=05308h ( 21256) load
I (188) esp_image: segment 2: paddr=000357b0 vaddr=40080000 size=0a868h ( 43112) load
I (206) esp_image: segment 3: paddr=00040020 vaddr=400d0020 size=8c308h (574216) map
I (413) esp_image: segment 4: paddr=000cc330 vaddr=4008a868 size=13f04h ( 81668) load
I (463) boot: Loaded app from partition at offset 0x10000
I (463) boot: Disabling RNG early entropy source...
I (475) cpu_start: Multicore app
I (475) cpu_start: Pro cpu up.
I (475) cpu_start: Starting app cpu, entry point is 0x4008142c
I (0) cpu_start: App cpu up.
I (493) cpu_start: Pro cpu start user code
I (493) cpu_start: cpu freq: 240000000 Hz
I (493) cpu_start: Application information:
I (498) cpu_start: Project name:     network_adapter
I (503) cpu_start: App version:      release/ng-v1.0.2-203-gd6f077c-
I (510) cpu_start: Compile time:     Feb 13 2024 09:48:34
I (516) cpu_start: ELF file SHA256:  0193c355e34d7399...
I (522) cpu_start: ESP-IDF:          d3c99ed3b8-dirty
I (528) cpu_start: Min chip rev:     v0.0
I (533) cpu_start: Max chip rev:     v3.99 
I (538) cpu_start: Chip rev:         v3.0
I (543) heap_init: Initializing. RAM available for dynamic allocation:
I (550) heap_init: At 3FFAFF10 len 000000F0 (0 KiB): DRAM
I (556) heap_init: At 3FFB6388 len 00001C78 (7 KiB): DRAM
I (562) heap_init: At 3FFB9A20 len 00004108 (16 KiB): DRAM
I (568) heap_init: At 3FFC7760 len 000188A0 (98 KiB): DRAM
I (574) heap_init: At 3FFE0440 len 00003AE0 (14 KiB): D/IRAM
I (580) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (587) heap_init: At 4009E76C len 00001894 (6 KiB): IRAM
I (594) spi_flash: detected chip: generic
I (598) spi_flash: flash io: dio
I (602) coexist: coex firmware version: 80b0d89
I (607) app_start: Starting scheduler on CPU0
I (612) app_start: Starting scheduler on CPU1
mI (612) main_task: Started on CPU0
I (620) main_task: Calling app_main()
I (622) stats: *********************************************************************
I (628) stats:                 ESP-Hosted Firmware version :: 1.0.3                        
I (636) stats:                 Transport used :: SPI only                      
I (642) stats: *********************************************************************
I (648) FW_MAIN: Supported features are:
I (652) FW_MAIN: - WLAN over SPI
I (656) FW_BT: - BT/BLE
I (658) FW_BT:    - HCI Over SPI
I (662) FW_BT:    - BT/BLE dual mode
I (664) FW_MAIN: Capabilities: 0xf8
I (678) wifi:wifi driver task: 3ffbd5f0, prio:23, stack:6656, core=1
I (680) wifi:wifi firmware version: fddc5e5
I (680) wifi:wifi certification version: v7.0
I (682) wifi:config NVS flash: disabled
I (684) wifi:config nano formating: disabled
I (688) wifi:Init data frame dynamic rx buffer num: 32
I (692) wifi:Init management frame dynamic rx buffer num: 32
I (696) wifi:Init management short buffer num: 32
I (698) wifi:Init dynamic tx buffer num: 64
I (702) wifi:Init static rx buffer size: 1600
I (706) wifi:Init static rx buffer num: 16
I (708) wifi:Init dynamic rx buffer num: 32
I (712) wifi_init: rx ba win: 6
I (714) wifi_init: tcpip mbox: 32
I (718) wifi_init: udp mbox: 6
I (720) wifi_init: tcp mbox: 6
I (724) wifi_init: tcp tx win: 5744
I (726) wifi_init: tcp rx win: 5744
I (730) wifi_init: tcp mss: 1440
I (734) wifi_init: WiFi IRAM OP enabled
I (738) wifi_init: WiFi RX IRAM OP enabled
I (742) BTDM_INIT: BT controller compile version [946b762]
I (746) BTDM_INIT: Bluetooth MAC: 44:17:93:d4:15:b2
I (752) phy_init: phy_version 4670,719f9f6,Feb 18 2021,17:07:07
I (1062) FW_MAIN: ESP Bluetooth MAC addr: 44-17-93-d4-15-b2

I (1064) FW_SPI: Using SPI interface
I (1064) gpio: GPIO[22]| InputEn: 0| OutputEn: 1| OpenDrain: 0| Pullup: 0| Pulldown: 0| Intr:0 
I (1070) gpio: GPIO[21]| InputEn: 0| OutputEn: 1| OpenDrain: 0| Pullup: 0| Pulldown: 0| Intr:0 
I (2080) FW_MAIN: Initial set up done
I (2080) main_task: Returned from app_main()
I (2170) FW_MAIN: INIT Interface command

I (2172) wifi:mode : sta (44:17:93:d4:15:b0)
I (2174) wifi:enable tsf
I (2174) FW_CMD: Wifi Sta mode set

I (2210) FW_MAIN: Get MAC command

I (2370) FW_MAIN: Scan request

I (3778) FW_MAIN: Auth request

I (3878) FW_CMD: Connecting to *******, channel: 6 [3]
I (3878) FW_CMD: AUTH Commit

E (3892) wifi:null wpa_sta_ap_set_rsnxe
I (3892) wifi:new:<6,0>, old:<1,0>, ap:<255,255>, sta:<6,0>, prof:1
I (3894) wifi:state: init -> auth (b0)
I (3904) FW_CMD: handle_wpa_sta_rx_mgmt:635 Auth[11] recvd

I (3946) FW_MAIN: Assoc request

I (3946) wifi:state: auth -> assoc (0)
I (3960) wifi:state: assoc -> run (10)
I (3962) FW_CMD: handle_wpa_sta_rx_mgmt:643 ASSOC Resp[1] recvd

I (3962) FW_CMD: STA connect event [channel 6]

I (3964) BSSID: 0x3ffd9cee   f4 92 bf 17 84 90                        I (4066) FW_MAIN: Add key request

I (4066) FW_CMD: process_add_key:1784

I (4098) FW_MAIN: Set default key request

I (4130) FW_MAIN: Add key request

I (4130) FW_CMD: process_add_key:1784

I (4130) FW_CMD: Setting GTK [2]

I (4130) FW_CMD: process_add_key:1849 auth done

I (4130) wifi:connected with ******, aid = 2, channel 6, BW20, bssid = f4:92:bf:17:84:90
I (4142) wifi:security: WPA2-PSK, phy: bgn, rssi: -81
I (4146) wifi:pm start, type: 1

I (4148) FW_CMD: STA connected with:f4:92:bf:17:84:90

I (4152) FW_CMD: Wifi Station Connected event!!

I (4246) wifi:AP's beacon interval = 102400 us, DTIM period = 3

Kernel logs on module loading

Feb 13 13:33:28 d user.info kernel: [ 1035.373145] esp32_hosted_ng:spi_dev_init: ESP32 peripheral is registered to SPI bus [0],chip select [0], SPI Clock [10]
Feb 13 13:33:28 d user.warn kernel: [ 1035.384536] IRQ Error, cpu: 1 Cause:0x08800000, Status:0x14001c00
Feb 13 13:33:28 d user.info kernel: [ 1035.608477] esp_init: Initialization done
Feb 13 13:33:30 d user.info kernel: [ 1037.600430] esp32_hosted_ng:process_esp_bootup_event: Received ESP bootup event
Feb 13 13:33:30 d user.info kernel: [ 1037.608035] esp32_hosted_ng:process_event_esp_bootup: Bootup Event tag: 3
Feb 13 13:33:30 d user.info kernel: [ 1037.615895] esp32_hosted_ng:esp_validate_chipset: Chipset=ESP32 ID=00 detected over SPI
Feb 13 13:33:30 d user.info kernel: [ 1037.624267] esp32_hosted_ng:process_event_esp_bootup: Bootup Event tag: 2
Feb 13 13:33:30 d user.info kernel: [ 1037.631508] esp32_hosted_ng:process_event_esp_bootup: Bootup Event tag: 0
Feb 13 13:33:30 d user.info kernel: [ 1037.638883] esp32_hosted_ng:process_event_esp_bootup: Bootup Event tag: 1
Feb 13 13:33:30 d user.info kernel: [ 1037.645920] esp32_hosted_ng:process_fw_data: ESP chipset's last reset cause:
Feb 13 13:33:30 d user.info kernel: [ 1037.653243] esp32_hosted_ng:print_reset_reason: POWERON_RESET
Feb 13 13:33:30 d user.info kernel: [ 1037.659224] esp32_hosted_ng:check_esp_version: ESP Firmware version: 1.0.3
Feb 13 13:33:30 d user.info kernel: [ 1037.666548] esp32_hosted_ng:esp_reg_notifier: Driver init is ongoing
Feb 13 13:33:31 d user.info kernel: [ 1037.978390] esp32_hosted_ng:init_bt: ESP Bluetooth init
Feb 13 13:33:31 d user.info kernel: [ 1037.984058] esp32_hosted_ng:print_capabilities: Capabilities: 0xf8. Features supported are:
Feb 13 13:33:31 d user.info kernel: [ 1037.993965] esp32_hosted_ng:print_capabilities:   * WLAN on SPI
Feb 13 13:33:31 d user.info kernel: [ 1038.000318] esp32_hosted_ng:print_capabilities:   * BT/BLE
Feb 13 13:33:31 d user.info kernel: [ 1038.006396] esp32_hosted_ng:print_capabilities:     - HCI over SPI
Feb 13 13:33:31 d user.info kernel: [ 1038.012888] esp32_hosted_ng:print_capabilities:     - BT/BLE dual mode
Feb 13 13:33:31 d daemon.info bluetoothd[1060]: Endpoint registered: sender=:1.2 path=/org/bluez/hci0/A2DP/MPEG12/Sink/1
Feb 13 13:33:31 d daemon.info bluetoothd[1060]: Endpoint registered: sender=:1.2 path=/org/bluez/hci0/A2DP/MPEG12/Sink/2
Feb 13 13:33:31 d daemon.info bluetoothd[1060]: Endpoint registered: sender=:1.2 path=/org/bluez/hci0/A2DP/SBC/Sink/1
Feb 13 13:33:31 d daemon.info bluetoothd[1060]: Endpoint registered: sender=:1.2 path=/org/bluez/hci0/A2DP/SBC/Sink/2
Feb 13 13:33:32 d user.info kernel: [ 1039.667975] esp32_hosted_ng:cmd_auth_request: Authentication request: f4:92:bf:17:84:90 6 0 0 0
Feb 13 13:33:32 d user.info kernel: [ 1039.887193] esp32_hosted_ng:cmd_assoc_request: Association request: f4:92:bf:17:84:90 6 27
Feb 13 13:33:33 d user.info kernel: [ 1039.969056] esp32_hosted_ng:process_assoc_event: Connection status: 0
Feb 13 13:33:33 d user.info kernel: [ 1039.977347] esp32_hosted_ng:process_rx_packet: Rx PACKET_TYPE_EAPOL!!!!
Feb 13 13:33:33 d user.warn kernel: [ 1039.984582] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!
Feb 13 13:33:33 d user.info kernel: [ 1040.022927] esp32_hosted_ng:process_rx_packet: Rx PACKET_TYPE_EAPOL!!!!
Feb 13 13:33:33 d user.warn kernel: [ 1040.029861] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!

Then, after about 12 minutes, I see the error:

Feb 13 13:45:49 d user.err kernel: [ 1776.157721] BUG: scheduling while atomic: kworker/0:2/2407/0x00000002
Feb 13 13:45:49 d user.warn kernel: [ 1776.164457] Modules linked in: esp32_hosted_ng chsc6540_touch [last unloaded: esp32_hosted_ng]
Feb 13 13:45:49 d user.warn kernel: [ 1776.173458] CPU: 0 PID: 2407 Comm: kworker/0:2 Tainted: G        W         5.10.21 #1
Feb 13 13:45:49 d user.warn kernel: [ 1776.181577] Workqueue: ESP_SPI_WORK_QUEUE esp_spi_work [esp32_hosted_ng]
Feb 13 13:45:49 d user.warn kernel: [ 1776.188516] Stack : 80be5cac 80098a60 00000000 00000000 80aea820 fffffff8 84a899fc 80098a60
Feb 13 13:45:49 d user.warn kernel: [ 1776.197172]         80bc0000 82e12780 80adcf2c 80ae0000 00000001 00000001 84a899a8 26f2fd0d
Feb 13 13:45:49 d user.warn kernel: [ 1776.205822]         00000000 00000000 80adcf2c 84a89848 0000024b 84a8985c 00000000 00000708
Feb 13 13:45:49 d user.warn kernel: [ 1776.214469]         0ad2a826 84a8985b ffffffff 00000030 80bc0000 00000000 00000000 80ae0000
Feb 13 13:45:49 d user.warn kernel: [ 1776.223119]         00000001 84a89b4c 00000000 fffffff8 00000000 00000000 000ed29f 00000001
Feb 13 13:45:49 d user.warn kernel: [ 1776.231766]         ...
Feb 13 13:45:49 d user.warn kernel: [ 1776.234293] Call Trace:
Feb 13 13:45:49 d user.warn kernel: [ 1776.236823] [<8001f178>] show_stack+0x94/0x12c
Feb 13 13:45:49 d user.warn kernel: [ 1776.241415] [<80957488>] dump_stack+0xac/0xe8
Feb 13 13:45:49 d user.warn kernel: [ 1776.245918] [<800686e4>] __schedule_bug+0x5c/0x7c
Feb 13 13:45:49 d user.warn kernel: [ 1776.250777] [<8095ec50>] __schedule+0x8d0/0xaec
Feb 13 13:45:49 d user.warn kernel: [ 1776.255460] [<8095eee4>] schedule+0x78/0x12c
Feb 13 13:45:49 d user.warn kernel: [ 1776.259871] [<8095f3b0>] schedule_preempt_disabled+0x18/0x30
Feb 13 13:45:49 d user.warn kernel: [ 1776.265713] [<809609b4>] __mutex_lock.isra.5+0x144/0x460
Feb 13 13:45:49 d user.warn kernel: [ 1776.271200] [<8041e628>] clk_prepare_lock+0x3c/0xfc
Feb 13 13:45:49 d user.warn kernel: [ 1776.276239] [<8042191c>] clk_set_rate+0x24/0xd4
Feb 13 13:45:49 d user.warn kernel: [ 1776.280918] [<80950028>] ingenic_spi_setupxfer+0x15c/0x338
Feb 13 13:45:49 d user.warn kernel: [ 1776.286585] [<804f1f28>] spi_bitbang_transfer_one+0x3c/0xf4
Feb 13 13:45:49 d user.warn kernel: [ 1776.292340] [<804ef940>] spi_transfer_one_message+0x360/0x5f0
Feb 13 13:45:49 d user.warn kernel: [ 1776.298275] [<804efe04>] __spi_pump_messages+0x234/0x60c
Feb 13 13:45:49 d user.warn kernel: [ 1776.303764] [<804f0414>] __spi_sync+0x22c/0x260
Feb 13 13:45:49 d user.warn kernel: [ 1776.308445] [<804f0474>] spi_sync+0x2c/0x50
Feb 13 13:45:49 d user.warn kernel: [ 1776.312775] [<c004b910>] spi_sync_transfer.constprop.3+0x7c/0xb4 [esp32_hosted_ng]
Feb 13 13:45:49 d user.warn kernel: [ 1776.320599] [<c004bb34>] esp_spi_work+0x1ec/0x468 [esp32_hosted_ng]
Feb 13 13:45:49 d user.warn kernel: [ 1776.327079] [<80059414>] process_one_work+0x22c/0x448
Feb 13 13:45:49 d user.warn kernel: [ 1776.332300] [<80059798>] worker_thread+0x168/0x5c8
Feb 13 13:45:49 d user.warn kernel: [ 1776.337245] [<800621fc>] kthread+0x16c/0x174
Feb 13 13:45:49 d user.warn kernel: [ 1776.341659] [<80017e8c>] ret_from_kernel_thread+0x14/0x1c
Feb 13 13:45:49 d user.warn kernel: [ 1776.347234] 
Feb 13 13:45:49 d user.err kernel: [ 1776.351296] BUG: scheduling while atomic: kworker/0:2/2407/0x00000000
Feb 13 13:45:49 d user.warn kernel: [ 1776.358038] Modules linked in: esp32_hosted_ng chsc6540_touch [last unloaded: esp32_hosted_ng]
Feb 13 13:45:49 d user.warn kernel: [ 1776.367629] CPU: 0 PID: 2407 Comm: kworker/0:2 Tainted: G        W         5.10.21 #1
Feb 13 13:45:49 d user.warn kernel: [ 1776.375744] Workqueue: ESP_SPI_WORK_QUEUE esp_spi_work [esp32_hosted_ng]
Feb 13 13:45:49 d user.warn kernel: [ 1776.382677] Stack : 00000000 80098a60 00000000 00000000 80aea820 81513000 84a89a14 80098a60
Feb 13 13:45:49 d user.warn kernel: [ 1776.391330]         80bc0000 82e12780 80adcf2c 80ae0000 00000001 00000001 84a899c0 26f2fd0d
Feb 13 13:45:49 d user.warn kernel: [ 1776.399980]         00000000 00000000 80adcf2c 84a89860 0000026c 84a89874 00000000 00000e74
Feb 13 13:45:49 d user.warn kernel: [ 1776.408635]         166569a3 84a89873 ffffffff 00000030 80bc0000 00000000 00000000 80ae0000
Feb 13 13:45:49 d user.warn kernel: [ 1776.417287]         00000001 840f9000 84a89de4 81513000 00000000 00000000 000f47df 000cc69f
Feb 13 13:45:49 d user.warn kernel: [ 1776.425931]         ...
Feb 13 13:45:49 d user.warn kernel: [ 1776.428457] Call Trace:
Feb 13 13:45:49 d user.warn kernel: [ 1776.430983] [<8001f178>] show_stack+0x94/0x12c
Feb 13 13:45:49 d user.warn kernel: [ 1776.435575] [<80957488>] dump_stack+0xac/0xe8
Feb 13 13:45:49 d user.warn kernel: [ 1776.440080] [<800686e4>] __schedule_bug+0x5c/0x7c
Feb 13 13:45:49 d user.warn kernel: [ 1776.444949] [<8095ec50>] __schedule+0x8d0/0xaec
Feb 13 13:45:49 d user.warn kernel: [ 1776.449630] [<8095eee4>] schedule+0x78/0x12c
Feb 13 13:45:49 d user.warn kernel: [ 1776.454042] [<80962cac>] schedule_timeout+0x1e4/0x314
Feb 13 13:45:49 d user.warn kernel: [ 1776.459259] [<80960304>] wait_for_completion_interruptible+0x8c/0x16c
Feb 13 13:45:49 d user.warn kernel: [ 1776.465914] [<8094f198>] ingenic_spi_pio_txrx+0x278/0x4f4
Feb 13 13:45:49 d user.warn kernel: [ 1776.471490] [<8094d7dc>] ingenic_spi_txrx+0x64/0xdc
Feb 13 13:45:49 d user.warn kernel: [ 1776.476531] [<804f1f4c>] spi_bitbang_transfer_one+0x60/0xf4
Feb 13 13:45:49 d user.warn kernel: [ 1776.482286] [<804ef940>] spi_transfer_one_message+0x360/0x5f0
Feb 13 13:45:49 d user.warn kernel: [ 1776.488225] [<804efe04>] __spi_pump_messages+0x234/0x60c
Feb 13 13:45:49 d user.warn kernel: [ 1776.493714] [<804f0414>] __spi_sync+0x22c/0x260
Feb 13 13:45:49 d user.warn kernel: [ 1776.498396] [<804f0474>] spi_sync+0x2c/0x50
Feb 13 13:45:49 d user.warn kernel: [ 1776.502723] [<c004b910>] spi_sync_transfer.constprop.3+0x7c/0xb4 [esp32_hosted_ng]
Feb 13 13:45:49 d user.warn kernel: [ 1776.510550] [<c004bb34>] esp_spi_work+0x1ec/0x468 [esp32_hosted_ng]
Feb 13 13:45:49 d user.warn kernel: [ 1776.517033] [<80059414>] process_one_work+0x22c/0x448
Feb 13 13:45:49 d user.warn kernel: [ 1776.522251] [<80059798>] worker_thread+0x168/0x5c8
Feb 13 13:45:49 d user.warn kernel: [ 1776.527200] [<800621fc>] kthread+0x16c/0x174
Feb 13 13:45:49 d user.warn kernel: [ 1776.531609] [<80017e8c>] ret_from_kernel_thread+0x14/0x1c
Feb 13 13:45:49 d user.warn kernel: [ 1776.537184] 

During this test, the interface did not have an IP address, there was no WiFi traffic. There is no output on the ESP32 console while this error is happening.

mantriyogesh commented 7 months ago

Hello @stwirth ,

Thank you for reporting this issue and the logs. We are looking into this issue.

By the time, to unblock, can you please use release/ng-1.0.2?

mantriyogesh commented 7 months ago

@kapilkedawat

stwirth commented 7 months ago

Thanks @mantriyogesh.

I'm actually not blocked by this. The error is thrown irregularly and the system recovers from it fine. When there is a WiFi connection active it still works after the error is thrown.

kapilkedawat commented 7 months ago

Hi @stwirth, could you please share more details about the host? Is there any other SPI transaction happening?

stwirth commented 7 months ago

@kapilkedawat no, the esp kernel module is the only thing using the bus. Unfortunately I cannot give much more details about the host but I can try to help debugging if you tell me what to do.

kapilkedawat commented 6 months ago

@stwirth could you pls retry with latest host code?

stwirth commented 6 months ago

Hi @kapilkedawat, I updated the host code to 573600b5758d452768355e3370b8ffe14b2cb8f4 and I still get the error. It seems the only changes since the last time I updated the host code were around logging: https://github.com/espressif/esp-hosted/compare/cdd6cb6..573600b#diff-d9bfdf39da7987d00b8ccc8d9f8aa3596dcb166b9b3f23310a135f3a23b48448

kapilkedawat commented 6 months ago

@stwirth the changes also contain some sync issues fixes.

do you get the stack even when wpa_supplicant is not running?

stwirth commented 6 months ago

@kapilkedawat I did an experiment over a day, running two devices in parallel, one with wpa supplicant and one without. I don't get the stack trace on the device that has no wpa supplicant running. On the one with wpa supplicant, I get 4-7 events in 10 hours. It seems they come in bursts:

root@d ~ # dmesg | grep ESP_SPI_WORK_QUEUE
[ 8799.186430] Workqueue: ESP_SPI_WORK_QUEUE esp_spi_work [esp32_hosted_ng]
[36546.350909] Workqueue: ESP_SPI_WORK_QUEUE esp_spi_work [esp32_hosted_ng]
[36546.553210] Workqueue: ESP_SPI_WORK_QUEUE esp_spi_work [esp32_hosted_ng]
[36946.187352] Workqueue: ESP_SPI_WORK_QUEUE esp_spi_work [esp32_hosted_ng]
[36946.388562] Workqueue: ESP_SPI_WORK_QUEUE esp_spi_work [esp32_hosted_ng]
[40357.734367] Workqueue: ESP_SPI_WORK_QUEUE esp_spi_work [esp32_hosted_ng]
[40357.923118] Workqueue: ESP_SPI_WORK_QUEUE esp_spi_work [esp32_hosted_ng]
root@dreamie ~ # dmesg | grep ESP_SPI_WORK_QUEUE | wc -l
7
root@d ~ # uptime
 10:50:20 up 11:23,  1 users,  load average: 0.66, 0.82, 0.88
kapilkedawat commented 6 months ago

Hi @stwirth, just for a hunch, could you please try following patch. I don't see any concurrency issue in the task but this will serialize the execution.


diff --git a/esp_hosted_ng/host/spi/esp_spi.c b/esp_hosted_ng/host/spi/esp_spi.c
index d0bd513b..28be30de 100644
--- a/esp_hosted_ng/host/spi/esp_spi.c
+++ b/esp_hosted_ng/host/spi/esp_spi.c
@@ -510,7 +510,7 @@ static int spi_init(void)
        uint8_t prio_q_idx = 0;
        struct esp_adapter *adapter;

-       spi_context.spi_workqueue = create_workqueue("ESP_SPI_WORK_QUEUE");
+       spi_context.spi_workqueue = create_singlethread_workqueue("ESP_SPI_WORK_QUEUE");

        if (!spi_context.spi_workqueue) {
                esp_err("spi workqueue failed to create\n");
stwirth commented 5 months ago

Hi @kapilkedawat! I tested your patch today and it did not help. I still get the stack trace from time to time.

stwirth commented 4 months ago

@kapilkedawat is there anything I can do to help debugging this? It seems that this crash is making the kernel choke a bit which causes audio playback issues (audible gaps) on our system.

kapilkedawat commented 4 months ago

Hi @stwirth, is audio running on bt or WiFi? Basically are you using both Bluetooth and WiFi?

stwirth commented 4 months ago

@kapilkedawat the audio playback is straight from disk. There is some parallel WiFi traffic though but at a low bandwidth. I get buffer underruns in the audio pipeline when the kernel module crash happens.

kapilkedawat commented 4 months ago

Hi @stwirth, I went through the backtrace again, backtrace is showing ingenic_spi_setupxfer which isn't available in upstream v5.10 kernel. From the code available over github, it looks like it is trying to set clock rate for every transfer and somehow fails to obtain the lock for it.

From the wifi driver point of view, we are calling spi_sync_transfer() API after correctly setup the SPI controller so the call shouldn't fail(https://elixir.bootlin.com/linux/v5.10.21/source/include/linux/spi/spi.h#L1241).

Could you please check with the platform vendor once if they encountered similar issue or updated their SPI driver recently?

stwirth commented 3 months ago

Thanks @kapilkedawat! I received an updated SPI driver from ingenic and with that the crash does not occur anymore. Thanks for your support!