espressif / esp-hosted

Hosted Solution (Linux/MCU) with ESP32 (Wi-Fi + BT + BLE)

ESP-Hosted-NG: ESP reboots during large WiFi data transfers #364

Open PokersKun opened 2 months ago

PokersKun commented 2 months ago

Hello!

I'm seeing the ESP reboot while using NG's WiFi feature: when a file is uploaded to the Host via SCP, the ESP prints an error message and reboots, so the upload fails.

For more information on what I'm currently using, please refer to: #357

Below is the reproduction method:

  1. Host loads the driver and connects to the AP to get the IP address.

  2. Upload a file to the Host from another PC using SCP at that IP address.

  3. An error occurs:

assert failed: queue_next_transaction spi_slave_api.c:407 (spi_trans->rx_buffer)
Core  0 register dump:
MEPC    : 0x40380608  RA      : 0x40389420  SP      : 0x3fcce4f0  GP      : 0x3fcb0fb0
TP      : 0x3fcafa88  T0      : 0x37363534  T1      : 0x7271706f  T2      : 0x33323130
S0/FP   : 0x00000077  S1      : 0x00000001  A0      : 0x3fcce52c  A1      : 0x3fcb281d
A2      : 0x00000001  A3      : 0x00000029  A4      : 0x00000001  A5      : 0x3fcb7000
A6      : 0x0000000c  A7      : 0x76757473  S2      : 0x00000009  S3      : 0x3fcce650
S4      : 0x3fcb281c  S5      : 0x00000000  S6      : 0x00000000  S7      : 0x00000000
S8      : 0x00000000  S9      : 0x00000000  S10     : 0x00000000  S11     : 0x00000000
T3      : 0x6e6d6c6b  T4      : 0x6a696867  T5      : 0x66656463  T6      : 0x62613938
MSTATUS : 0x00001881  MTVEC   : 0x40380001  MCAUSE  : 0x00000007  MTVAL   : 0x00000000
MHARTID : 0x00000000

Stack memory:
3fcce4f0: 0xa5a5a5a5 0xa5a5a5a5 0x3c075408 0x4038e5d6 0x3fcb2b1c 0x3c075408 0x3fcb2c80 0x3c075170
3fcce510: 0x3fcb2b2c 0x3fcce524 0x3fcb2b30 0x3c0751f4 0x3fcb281c 0x00373034 0x00000001 0x65737361
3fcce530: 0x66207472 0x656c6961 0x71203a64 0x65756575 0x78656e5f 0x72745f74 0x61736e61 0x6f697463
3fcce550: 0x7073206e 0x6c735f69 0x5f657661 0x2e697061 0x30343a63 0x73282037 0x745f6970 0x736e6172
3fcce570: 0x78723e2d 0x6675625f 0x29726566 0x00000000 0x3fccd094 0x00000000 0x00000001 0x4038dda4
3fcce590: 0x00000000 0x00000000 0x00000000 0x400586f4 0x3fce0000 0x3fcb95a4 0x3fcb7000 0x4038dda4
3fcce5b0: 0x00000000 0x00000000 0x00000000 0x400586f4 0x3fce0000 0x00000000 0x3fcdcb70 0x400587c2
3fcce5d0: 0x00000000 0x00000000 0x00000000 0x00000640 0x00000001 0x00000008 0x3fcb7680 0x40380a22
3fcce5f0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000008 0x00000640 0x00000000 0x00000000
3fcce610: 0x00000000 0x3fcd6e18 0x3fcb95a4 0x4200af16 0x00000000 0x00000000 0x00000000 0x000005f8
3fcce630: 0x00000000 0x00000000 0x00000000 0x4200afca 0x00000000 0x00000000 0x00000000 0x00000000
3fcce650: 0x00000000 0x00000000 0x00000000 0x3fcb958c 0x00000000 0x00000000 0x00000000 0x4038bbf6
3fcce670: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
3fcce690: 0x00000000 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0x00000154 0x3fcce560
3fcce6b0: 0x3fcb7014 0x3fcb3708 0x3fcd1010 0x3fcce6ac 0x3fcb3700 0x00000003 0x3fccd0c0 0x3fccd0c0
3fcce6d0: 0x3fcce6ac 0x00000000 0x00000016 0x3fccd6a8 0x5f697073 0x74736f70 0x6f72705f 0x00736563
3fcce6f0: 0x00000000 0x3fcce6a0 0x00000016 0x00000000 0x00000000 0x00000000 0x00000000 0x3fcb7884
3fcce710: 0x3fcb78ec 0x3fcb7954 0x00000000 0x00000000 0x00000001 0x00000000 0x00000000 0x00000000
3fcce730: 0x4206aff8 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
3fcce750: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
3fcce770: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
3fcce790: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
3fcce7b0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
3fcce7d0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
3fcce7f0: 0x00000000 0x00000000 0x00000000 0x3f000000 0x00000234 0x3fcce858 0x3fcce9d8 0x3fccea38
3fcce810: 0x3fcce9c0 0x00000000 0x3fcce81c 0xffffffff 0x3fcce81c 0x3fcce81c 0x00000000 0x3fcce830
3fcce830: 0xffffffff 0x3fcce830 0x3fcce830 0x00000000 0x00000014 0x00000018 0x6d00ffff 0x00000000
3fcce850: 0xb33fffff 0x00000000 0x3fcb9e00 0x00000000 0x3fcb9e00 0x00040000 0x00030000 0x4038e5f4
3fcce870: 0x3fcb9380 0x00000000 0x3fcb9380 0x01180000 0x00030000 0x4038e5f4 0x3fcb9a58 0x00000000
3fcce890: 0x3fcb9a58 0x00040000 0x00030000 0x4038e5f4 0x3fcb99fc 0x00000000 0x3fcb99fc 0x00080000
3fcce8b0: 0x00020000 0x4038e5f4 0x3fcb99fc 0x00000000 0x3fcb99fc 0x006d0000 0x00030000 0x4038e5f4
3fcce8d0: 0x3fcb9e00 0x00000000 0x3fcb9e00 0x00080000 0x00020000 0x4038e5f4 0x3fcb93f8 0x00000000

ELF file SHA256: 9abaaf89bc3c89b0

Rebooting...
[  246.007232] esp32_spi: process_esp_bootup_event: Received ESP bootup event
[  246.037410] esp32_spi: esp_reg_notifier: cfg80211 regulatory domain callback for 00, current=
[  246.105686] esp32_spi: prepare_command_request: command queue init is not done yet
[  246.113302] esp32_spi: cmd_set_reg_domain: Failed to get command node
[  246.187324] esp32_spi: esp_reg_notifier: cfg80211 regulatory domain callback for CN, current=00
[  246.242165] esp32_spi: prepare_command_request: command queue init is not done yet
[  246.297859] esp32_spi: cmd_set_reg_domain: Failed to get command node
[  246.425783] esp32_spi: process_event_esp_bootup: Bootup Event tag: 3
[  246.432179] esp32_spi: esp_validate_chipset: Chipset=ESP32-C2 ID=0c detected over SPI
[  246.442009] esp32_spi: process_event_esp_bootup: Bootup Event tag: 2
[  246.448517] esp32_spi: process_event_esp_bootup: Bootup Event tag: 0
[  246.454889] esp32_spi: process_event_esp_bootup: Bootup Event tag: 1
[  246.461354] esp32_spi: process_fw_data: ESP chipset's last reset cause:
[  246.468023] esp32_spi: print_reset_reason: SW_CPU_RESET
[  246.473257] esp32_spi: check_esp_version: ESP Firmware version: 1.0.3
[  246.488277] esp32_spi: esp_reg_notifier: 663 esp_wifi_device not initialized yet
[  246.815513] esp32_spi: init_bt: ESP Bluetooth init
[  246.821023] esp32_spi: print_capabilities: Capabilities: 0xe8. Features supported are:
[  246.845816] esp32_spi: print_capabilities:    * WLAN on SPI
[  246.851437] esp32_spi: print_capabilities:    * BT/BLE
[  246.856595] esp32_spi: print_capabilities:      - HCI over SPI
[  246.862356] esp32_spi: print_capabilities:      - BLE only
[  247.226534] Bluetooth: MGMT ver 1.22
[  248.621066] esp32_spi: cmd_auth_request: Authentication request: b0:39:56:cf:9b:4d 6 0 0 0
[  248.886625] esp32_spi: cmd_assoc_request: Association request: b0:39:56:cf:9b:4d 6 39
[  248.959011] esp32_spi: process_assoc_event: Connection status: 0
[  248.989583] esp32_spi: process_rx_packet: Rx PACKET_TYPE_EAPOL!!!!
[  249.034942] esp32_spi: process_rx_packet: Rx PACKET_TYPE_EAPOL!!!!
PS D:\storage> scp .\flash.bin root@192.168.1.94:/home/root
The authenticity of host '192.168.1.94 (192.168.1.94)' can't be established.
ED25519 key fingerprint is SHA256:wEpBLuxJHxPrEIv2EO3B+QEegjNcxESiZXf2aquT5dM.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])?
Warning: Permanently added '192.168.1.94' (ED25519) to the list of known hosts.
flash.bin                                                                               0%    0     0.0KB/s   --:-- ETA
Connection reset by 192.168.1.94 port 22
./flash.bin: Broken pipe
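
Part of the stack memory in the panic dump above is readable text: register A0 (0x3fcce52c) points at a run of words that hold the assert message as little-endian ASCII. A minimal Python sketch, assuming 32-bit little-endian words (as on the RISC-V ESP32-C2), that decodes the words copied from that dump; the function name `decode_words` is only illustrative:

```python
import struct

# Words copied from the "Stack memory" dump above, addresses 0x3fcce52c to 0x3fcce57c.
STACK_WORDS = [
    0x65737361, 0x66207472, 0x656c6961, 0x71203a64, 0x65756575,
    0x78656e5f, 0x72745f74, 0x61736e61, 0x6f697463, 0x7073206e,
    0x6c735f69, 0x5f657661, 0x2e697061, 0x30343a63, 0x73282037,
    0x745f6970, 0x736e6172, 0x78723e2d, 0x6675625f, 0x29726566,
    0x00000000,
]


def decode_words(words):
    """Pack each word as 4 little-endian bytes and return the text up to the first NUL."""
    raw = b"".join(struct.pack("<I", word) for word in words)
    return raw.split(b"\x00", 1)[0].decode("ascii", errors="replace")


print(decode_words(STACK_WORDS))
# prints: assert failed: queue_next_transaction spi_slave_api.c:407 (spi_trans->rx_buffer)
```

The decoded text matches the first line of the panic output, which is a quick way to confirm that a pasted dump belongs to the same assert when the header line is missing.
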
mantriyogesh commented 2 months ago

Can you please check if your current sdkconfig has these queues changed:

https://github.com/espressif/esp-hosted/blob/3c3fc1029b00ac46d98ab98495b7a817ddb5bc2d/esp_hosted_ng/esp/esp_driver/network_adapter/sdkconfig.defaults.esp32c2#L30-L31
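
One way to run this check is to compare the sdkconfig in use against the defaults file and list every option whose value differs. The following is a rough Python sketch under stated assumptions: it only understands plain `NAME=value` lines, and the option names in the sample strings are placeholders, not the actual names on the linked lines.

```python
def parse_sdkconfig(text):
    """Return {name: value} for each 'NAME=value' line; comments and blank lines are skipped."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            options[name] = value
    return options


def overridden_defaults(defaults_text, current_text):
    """Map each option from the defaults file to (default, current) where the two differ."""
    defaults = parse_sdkconfig(defaults_text)
    current = parse_sdkconfig(current_text)
    return {
        name: (expected, current.get(name))
        for name, expected in defaults.items()
        if current.get(name) != expected
    }


# Placeholder option names and values, for illustration only.
defaults_sample = "# queue sizes\nCONFIG_EXAMPLE_RX_QUEUE_SIZE=6\nCONFIG_EXAMPLE_TX_QUEUE_SIZE=6\n"
current_sample = "CONFIG_EXAMPLE_RX_QUEUE_SIZE=6\nCONFIG_EXAMPLE_TX_QUEUE_SIZE=20\n"
print(overridden_defaults(defaults_sample, current_sample))
```

In practice the two strings would be read from `sdkconfig.defaults.esp32c2` and the generated `sdkconfig`; an option missing from the current file shows up with `None` as its current value.
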

mantriyogesh commented 2 months ago

Also, do you actually use Bluetooth? The C2 has limited RAM. If you do not use Bluetooth, it is easiest to disable it. If you need Bluetooth, then the maximum memory usage needs to be restricted as above.

PokersKun commented 2 months ago

Can you please check if your current sdkconfig has these queues changed:

https://github.com/espressif/esp-hosted/blob/3c3fc1029b00ac46d98ab98495b7a817ddb5bc2d/esp_hosted_ng/esp/esp_driver/network_adapter/sdkconfig.defaults.esp32c2#L30-L31

Yes, I know this file changed them after 589ef50, which solved my Bluetooth panic.

Also, my C2 firmware is built from the latest master branch.

Also, do you actually use Bluetooth? The C2 has limited RAM. If you do not use Bluetooth, it is easiest to disable it. If you need Bluetooth, then the maximum memory usage needs to be restricted as above.

Unfortunately, our scenario requires both WiFi and Bluetooth to work at the same time. For this issue, though, I can confirm that Bluetooth is only initialized and not actively in use.

Here's a detailed panic log via idf.py monitor:

assert failed: queue_next_transaction spi_slave_api.c:407 (spi_trans->rx_buffer)
Core  0 register dump:
MEPC    : 0x40380608  RA      : 0x40389420  SP      : 0x3fcce510  GP      : 0x3fcb0fb0  
Stack dump detected
0x40380608: panic_abort at /opt/work/esp-hosted/esp_hosted_ng/esp/esp_driver/esp-idf/components/esp_system/panic.c:452
0x40389420: __ubsan_include at /opt/work/esp-hosted/esp_hosted_ng/esp/esp_driver/esp-idf/components/esp_system/ubsan.c:313

TP      : 0x3fcafa28  T0      : 0x37363534  T1      : 0x7271706f  T2      : 0x33323130  
S0/FP   : 0x00000077  S1      : 0x00000001  A0      : 0x3fcce54c  A1      : 0x3fcb281d  
A2      : 0x00000001  A3      : 0x00000029  A4      : 0x00000001  A5      : 0x3fcb7000  
A6      : 0x0000000c  A7      : 0x76757473  S2      : 0x00000009  S3      : 0x3fcce670  
S4      : 0x3fcb281c  S5      : 0x00000000  S6      : 0x00000000  S7      : 0x00000000  
S8      : 0x00000000  S9      : 0x00000000  S10     : 0x00000000  S11     : 0x00000000  
T3      : 0x6e6d6c6b  T4      : 0x6a696867  T5      : 0x66656463  T6      : 0x62613938  
MSTATUS : 0x00001881  MTVEC   : 0x40380001  MCAUSE  : 0x00000007  MTVAL   : 0x00000000  
0x40380001: _vector_table at ??:?

MHARTID : 0x00000000  

Failed to run gdb_panic_server.py script: Command '['riscv32-esp-elf-gdb', '--batch', '-n', '/opt/work/esp-hosted/esp_hosted_ng/esp/esp_driver/network_adapter/build/network_adapter.elf', '-ex', 'target remote | "/home/aduro/.espressif/python_env/idf5.1_py3.10_env/bin/python" -m esp_idf_panic_decoder --target esp32c2 "/tmp/tmpbm_a_kss"', '-ex', 'bt']' returned non-zero exit status 1.
b'Traceback (most recent call last):\n  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main\n    return _run_code(code, main_globals, None,\n  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code\n    exec(code, run_globals)\n  File "/home/aduro/.espressif/python_env/idf5.1_py3.10_env/lib/python3.10/site-packages/esp_idf_panic_decoder/__main__.py", line 4, in <module>\n    main()\n  File "/home/aduro/.espressif/python_env/idf5.1_py3.10_env/lib/python3.10/site-packages/esp_idf_panic_decoder/gdb_panic_server.py", line 281, in main\n    panic_info = PANIC_OUTPUT_PARSERS[args.target](args.input_file.read())\n  File "/home/aduro/.espressif/python_env/idf5.1_py3.10_env/lib/python3.10/site-packages/esp_idf_panic_decoder/gdb_panic_server.py", line 134, in parse_idf_riscv_panic_output\n    raise ValueError("Couldn\'t parse panic handler output")\nValueError: Couldn\'t parse panic handler output\nRemote communication error.  Target disconnected.: Connection reset by peer.\nNo stack.\n'

Stack memory:
0x4038e5d6 0x3fcb2b1c 0x3c075488 0x3fcb2c80 0x3c0751f0
3fcce530: 0x3fcb2b2c 0x3fcce544 0x3fcb2b30 0x3c075274 0x3fcb281c 0x00373034 0x3fcce5c0 0x65737361
3fcce550: 0x66207472 0x656c6961 0x71203a64 0x65756575 0x78656e5f 0x72745f74 0x61736e61 0x6f697463
3fcce570: 0x7073206e 0x6c735f69 0x5f657661 0x2e697061 0x30343a63 0x73282037 0x745f6970 0x736e6172
3fcce590: 0x78723e2d 0x6675625f 0x29726566 0x00000000 0x3fccd0b4 0x00000000 0x00000001 0x4038dda4
3fcce5b0: 0x00000000 0x00000000 0x00000000 0x400586f4 0x3fce0000 0x3fcb9b34 0x3fcb7020 0x4038dda4
3fcce5d0: 0x00000000 0x00000000 0x00000000 0x400586f4 0x3fce0000 0x00000000 0x3fcdcb70 0x400587c2
3fcce5f0: 0x00000000 0x00000000 0x00000000 0x00000640 0x00000001 0x00000008 0x3fcb76a0 0x40380a22
3fcce610: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000008 0x00000640 0x00000000 0x00000000
3fcce630: 0x00000000 0x3fcd36cc 0x3fcb9b34 0x4200b196 0x00000000 0x00000000 0x00000000 0x000005f8
3fcce650: 0x00000000 0x00000000 0x00000000 0x4200b24a 0x00000000 0x00000000 0x00000000 0x00000000
3fcce670: 0x00000000 0x00000000 0x00000000 0x3fcd182c 0x00000000 0x00000000 0x00000000 0x4038bbf6
3fcce690: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
3fcce6b0: 0x00000000 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0x00000154 0x3fcce580
3fcce6d0: 0x3fcb7034 0x3fcb3708 0x3fcd1030 0x3fcce6cc 0x3fcb3700 0x00000003 0x3fccd0e0 0x3fccd0e0
3fcce6f0: 0x3fcce6cc 0x00000000 0x00000016 0x3fccd6c8 0x5f697073 0x74736f70 0x6f72705f 0x00736563
3fcce710: 0x00000000 0x3fcce6c0 0x00000016 0x00000000 0x00000000 0x00000000 0x00000000 0x3fcb78a4
3fcce730: 0x3fcb790c 0x3fcb7974 0x00000000 0x00000000 0x00000001 0x00000000 0x00000000 0x00000000
3fcce750: 0x4206b278 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
3fcce770: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
3fcce790: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
3fcce7b0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
3fcce7d0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
3fcce7f0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
3fcce810: 0x00000000 0x00000000 0x00000000 0x3f000000 0x00000234 0x3fcce878 0x3fcce9b0 0x3fccea58
3fcce830: 0x3fcce998 0x00000000 0x3fcce83c 0xffffffff 0x3fcce83c 0x3fcce83c 0x00000000 0x3fcce850
3fcce850: 0xffffffff 0x3fcce850 0x3fcce850 0x00000000 0x00000014 0x00000018 0x7100ffff 0x00000000
3fcce870: 0xb33fffff 0x00000000 0x3fcd1df8 0x00000000 0x3fcd1df8 0x00040000 0x00030000 0x4038e5f4
3fcce890: 0x3fcb9e20 0x00000000 0x3fcb9e20 0x00080000 0x00020000 0x4038e5f4 0x3fcb99e4 0x00000000
3fcce8b0: 0x3fcb99e4 0x006d0000 0x00030000 0x4038e5f4 0x3fcd17e0 0x00000000 0x3fcd17e0 0x00080000
3fcce8d0: 0x00020000 0x4038e5f4 0x3fcd1904 0x00000000 0x3fcd1904 0x00fc0000 0x00030000 0x4038e5f4
3fcce8f0: 0x3fcd18ec 0x00000000 0x3fcd18ec 0x00790000 0x00040000 0x4038e5f4 0x3fcd1868 0x00000000

ELF file SHA256: 2ad21db4147e5b71

Rebooting...
mantriyogesh commented 2 months ago

@Shreyas0-7 Can you please check whether you see a crash like this when Wi-Fi and Bluetooth are both in use?

@PokersKun I hope there is no change in the devices you used as part of #357. Can you please share your current sdkconfig?

PokersKun commented 2 months ago

@PokersKun I hope there is no change in the devices you used as part of #357. Can you please share your current sdkconfig?

Yes, no change, and I can reproduce this problem on both imx6ull and Raspberry Pi 4B.

This is the sdkconfig.txt currently in use (only the XTAL frequency was modified, via idf.py menuconfig).

mantriyogesh commented 2 months ago

Also, @PokersKun Can you please state the procedure you follow to reproduce?

I mean, apart from the SCP of the file, do you also run any Bluetooth commands or keep bluetoothctl in some particular state?

PokersKun commented 2 months ago

Also, @PokersKun Can you please state the procedure you follow to reproduce?

I mean, apart from the SCP of the file, do you also run any Bluetooth commands or keep bluetoothctl in some particular state?

Here are the steps I followed:

  1. Power up and start Host.
  2. Load the driver via insmod and execute ifconfig mlan0 up and hciconfig hci0 up.
  3. Connect to the AP via wpa_supplicant and get the IP address via udhcpc.
  4. Transfer files via SCP; the ESP panics.
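
Steps 2 and 3 can be written down as an explicit command sequence, which makes the reproduction easier to repeat on another Host. This is a dry-run sketch in Python that only prints the commands; the driver module path and the wpa_supplicant configuration path are hypothetical placeholders, while the interface names `mlan0` and `hci0` come from the steps above.

```python
import shlex


def repro_commands(module_path, wpa_conf, wlan_if="mlan0", bt_if="hci0"):
    """Return the Host-side commands for steps 2 and 3 as argv lists (nothing is executed)."""
    return [
        ["insmod", module_path],                                  # load the driver
        ["ifconfig", wlan_if, "up"],                              # bring up the WiFi interface
        ["hciconfig", bt_if, "up"],                               # bring up the Bluetooth controller
        ["wpa_supplicant", "-B", "-i", wlan_if, "-c", wpa_conf],  # connect to the AP in the background
        ["udhcpc", "-i", wlan_if],                                # obtain an IP address
    ]


# Both paths below are assumptions for illustration, not taken from the report.
commands = repro_commands("./esp32_spi.ko", "/etc/wpa_supplicant.conf")
for argv in commands:
    print(shlex.join(argv))
```

Step 4 runs on the other PC, for example the `scp .\flash.bin root@192.168.1.94:/home/root` invocation shown in the original report.
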

hciconfig hci0 up was the only Bluetooth command executed, and I got the following result from bluetoothctl show:

Controller 08:3A:8D:40:F9:B2 (public)
        Name: eria
        Alias: eria
        Class: 0x00000000
        Powered: yes
        Discoverable: no
        DiscoverableTimeout: 0x000000b4
        Pairable: no
        UUID: Handsfree                 (0000111e-0000-1000-8000-00805f9b34fb)
        UUID: Generic Attribute Profile (00001801-0000-1000-8000-00805f9b34fb)
        UUID: Generic Access Profile    (00001800-0000-1000-8000-00805f9b34fb)
        UUID: PnP Information           (00001200-0000-1000-8000-00805f9b34fb)
        UUID: A/V Remote Control Target (0000110c-0000-1000-8000-00805f9b34fb)
        UUID: A/V Remote Control        (0000110e-0000-1000-8000-00805f9b34fb)
        UUID: Device Information        (0000180a-0000-1000-8000-00805f9b34fb)
        Modalias: usb:v1D6Bp0246d0542
        Discovering: no
        Roles: central
        Roles: peripheral
Advertising Features:
        ActiveInstances: 0x00 (0)
        SupportedInstances: 0x02 (2)
        SupportedIncludes: tx-power
        SupportedIncludes: appearance
        SupportedIncludes: local-name
        SupportedSecondaryChannels: 1M
        SupportedSecondaryChannels: 2M
        SupportedSecondaryChannels: Coded
mantriyogesh commented 2 months ago

Hello @PokersKun ,

Apologies for the time it took. We have reproduced the crash you are facing.

Please allow us some time to provide a fix.

mantriyogesh commented 2 months ago

@PokersKun ,

We are still working on this issue; it is taking a little longer because of other priority work in parallel. We will keep you posted.

mantriyogesh commented 2 months ago

Hello @PokersKun ,

We are still taking time to resolve this. We have partly resolved it but still see some issues in regression testing.

We will keep you posted once we have a correct fix.

PokersKun commented 2 months ago

Hello @mantriyogesh ,

Thanks for still working hard on this.

We recently learned that NG's OTA (via SPI) has not been developed yet. Due to project needs I had to switch to ESP-Hosted-FG to try it out, but I still encountered the problem described in this issue. Could you please check if it has the same cause?

PS: We did not make any changes to our hardware environment; we only replaced the Host driver and the ESP software with the FG versions, and the SPI clock frequency is still 4 MHz.

mantriyogesh commented 2 months ago

Sure, we will test FG on our side as well and get you the fix.

mantriyogesh commented 2 months ago

We have an interim NG patch, if you could verify it at your end: 433.patch

We are still evaluating this patch for possible throughput reduction, but it should solve the crash. Credits @Shreyas0-7.

PokersKun commented 2 months ago

We have an interim NG patch, if you could verify it at your end: 433.patch

Thank you very much!

I've made the corresponding changes to FG with reference to that patch, and it seems to have solved my problem (I haven't tested the throughput yet, but it works fine in my scenario).

Here is the patch file.