espressif / esp-rainmaker

ESP RainMaker Agent for firmware development
Apache License 2.0
433 stars 146 forks source link

Using NimBle in app makes crash after provisioning (MEGH-3669) #138

Closed jacek12345 closed 1 year ago

jacek12345 commented 1 year ago

I need constantly using BLE in rainmaker. Communicate with BLE sensor (using NimBle, as RainMaker while BLE provisioning). It works since provisioning is already done. When I need to make provisioning, app crashes on my NimBle initialisation. I placed NimBle initialisation after app_wifi_start() function:

` /* Start the Wi-Fi.

`

There is app crash after provisioning and connecting wifi:

` I (1185) app_wifi: Provisioning Started. Name : PROV_89a126, POP : 0aea9731 I (1195) app_wifi: Provisioning will auto stop after 30 minute(s). I (34205) protocomm_nimble: mtu update event; conn_handle=1 cid=4 mtu=256

I (47645) esp_rmaker_user_mapping: Received request for node details I (47645) esp_rmaker_user_mapping: Got user_id = SignInWithApple_t11111111111111111111, secret_key = 11111111-1111-1111-1111-111111111111 I (47655) esp_rmaker_user_mapping: Sending status SUCCESS I (47925) app_wifi: Received Wi-Fi credentials SSID : xxxxxxx Password : xxxxxxxxxxxxx I (48935) wifi:new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1 I (48935) wifi:state: init -> auth (b0) I (49635) wifi:state: auth -> assoc (0) I (49645) wifi:state: assoc -> run (10) I (49665) wifi:connected with xxxxx, aid = 36, channel 1, BW20, bssid = xx:xx:xx:xx:xx:xx I (49665) wifi:security: WPA3-SAE, phy: bgn, rssi: -60 I (49735) wifi:pm start, type: 1

I (49735) wifi:set rx beacon pti, rx_bcn_pti: 14, bcn_timeout: 14, mt_pti: 25000, mt_time: 10000 I (49745) wifi:BcnInt:102400, DTIM:3 W (49855) wifi:idx:0 (ifx:0, xx:xx:xx:xx:xx:xx), tid:0, ssn:1, winSize:64 I (50625) app_wifi: Connected with IP Address:192.168.1.63 I (50625) esp_netif_handlers: sta ip: 192.168.1.63, mask: 255.255.255.0, gw: 192.168.1.1 I (50625) wifi_prov_mgr: STA Got IP I (50625) app_wifi: Provisioning successful I (50635) esp_rmaker_local: Waiting for Wi-Fi provisioning to finish. I (50645) esp_mqtt_glue: Connecting to a1p72mufdu6064-ats.iot.us-east-1.amazonaws.com I (50645) esp_mqtt_glue: Waiting for MQTT connection. This may take time. W (50655) BTDM_INIT: esp_bt_controller_mem_release not implemented, return OK ESP_ERROR_CHECK failed: esp_err_t 0x103 (ESP_ERR_INVALID_STATE) at 0x40389863 0x40389863: _esp_error_check_failed at F:/GitHub/esp-idf/components/esp_system/esp_err.c:42

file: "../main/app_main.c" line 318 func: app_main expression: esp_nimble_hci_and_controller_init()

abort() was called at PC 0x40389867 on core 0 0x40389867: _esp_error_check_failed at F:/GitHub/esp-idf/components/esp_system/esp_err.c:43

Stack dump detected Core 0 register dump: MEPC : 0x4038085a RA : 0x40389872 SP : 0x3fc9fb00 GP : 0x3fc92800 0x4038085a: panic_abort at F:/GitHub/esp-idf/components/esp_system/panic.c:402

0x40389872: __ubsan_include at F:/GitHub/esp-idf/components/esp_system/ubsan.c:294

TP : 0x3fc63538 T0 : 0x37363534 T1 : 0x7271706f T2 : 0x33323130 S0/FP : 0x00000004 S1 : 0x3fc9fb64 A0 : 0x3fc9fb2c A1 : 0x3fc9fb62 A2 : 0x00000000 A3 : 0x3fc9fb59 A4 : 0x00000001 A5 : 0x3fc9c000 A6 : 0x7a797877 A7 : 0x76757473 S2 : 0x3fcb1594 S3 : 0x00000000 S4 : 0x00000000 S5 : 0x00000000 S6 : 0x00000000 S7 : 0x00000000 S8 : 0x00000000 S9 : 0x00000000 S10 : 0x00000000 S11 : 0x00000000 T3 : 0x6e6d6c6b T4 : 0x6a696867 T5 : 0x66656463 T6 : 0x62613938 MSTATUS : 0x00001881 MTVEC : 0x40380001 MCAUSE : 0x00000007 MTVAL : 0x00000000 0x40380001: _vector_table at ??:?

MHARTID : 0x00000000

Failed to run gdb_panic_server.py script: Command '['riscv32-esp-elf-gdb', '--batch', '-n', 'f:\github\esp-rainmaker\examples\j_switch\build\switch.elf', '-ex', 'target remote | "C:\Users\jacek\.espressif\python_env\idf4.4_py3.7_env\Scripts\python.exe" "F:\GitHub\esp-idf\tools\idf_monitor_base\..\gdb_panic_server.py" --target esp32c3 "C:\Users\jacek\AppData\Local\Temp\tmpxs2khoaw"', '-ex', 'bt']' returned non-zero exit status 1. b'C:\Users\jacek\.espressif\tools\riscv32-esp-elf\esp-2021r2-patch3-8.4.0\riscv32-esp-elf\bin\riscv32-esp-elf-gdb.exe: warning: Couldn\'t determine a path for the index cache directory.\r\nerror starting child process \'"C:\Users\jacek\.espressif\python_env\idf4.4_py3.7_env\Scripts\python.exe" "F:\GitHub\esp-idf\tools\idf_monitor_base\..\gdb_panic_server.py" --target esp32c3 "C:\Users\jacek\AppData\Local\Temp\tmpxs2khoaw"\': CreateProcess: No such file or directory\r\nNo stack.\r\n'

Core 0 register dump: MEPC : 0x4038085a RA : 0x40389872 SP : 0x3fc9fb00 GP : 0x3fc92800 TP : 0x3fc63538 T0 : 0x37363534 T1 : 0x7271706f T2 : 0x33323130 S0/FP : 0x00000004 S1 : 0x3fc9fb64 A0 : 0x3fc9fb2c A1 : 0x3fc9fb62 A2 : 0x00000000 A3 : 0x3fc9fb59 A4 : 0x00000001 A5 : 0x3fc9c000 A6 : 0x7a797877 A7 : 0x76757473 S2 : 0x3fcb1594 S3 : 0x00000000 S4 : 0x00000000 S5 : 0x00000000 S6 : 0x00000000 S7 : 0x00000000 S8 : 0x00000000 S9 : 0x00000000 S10 : 0x00000000 S11 : 0x00000000 T3 : 0x6e6d6c6b T4 : 0x6a696867 T5 : 0x66656463 T6 : 0x62613938 MSTATUS : 0x00001881 MTVEC : 0x40380001 MCAUSE : 0x00000007 MTVAL : 0x00000000 MHARTID : 0x00000000

Stack memory: 3fc9fb00: 0x3c105440 0x40389866 0x3fc9fb60 0x4039020e 0x420210e8 0x42020fe4 0x3fc9c000 0x3fc94b50 3fc9fb20: 0x3fc9fb64 0x3fc94b6c 0x3fc9fb60 0x726f6261 0x20292874 0x20736177 0x6c6c6163 0x61206465 3fc9fb40: 0x43502074 0x34783020 0x39383330 0x20373638 0x63206e6f 0x2065726f 0x00000030 0x00000000 3fc9fb60: 0x01000030 0x38333034 0x37363839 0x00000000 0x3fcb1594 0x3fcb1640 0x3fc9c000 0x4038986a 3fc9fb80: 0x3fcb1594 0x3fcb1640 0x3fc9c000 0x42008680 0x00000000 0x00000000 0x3fc9fe00 0x00000000 3fc9fba0: 0x00000000 0x3c10c195 0x00000000 0x01060c00 0x00000000 0x00000000 0x00000000 0x420ff13c 3fc9fbc0: 0x00000000 0x00000000 0x00000000 0x4038d22a 0x00000000 0x00000000 0x00000000 0x00000000 3fc9fbe0: 0x00000000 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0x27146414 0x01010107 0x00000001 0x000000ff 3fc9fc00: 0x7f7fffff 0x00000000 0x00000000 0x3fe00000 0x00000000 0x3ff00000 0x00000000 0x40000000 3fc9fc20: 0xffc00000 0x41dfffff 0x94a03595 0x3fdfffff 0x35afe535 0x3fe00000 0x94a03595 0x3fcfffff 3fc9fc40: 0x00000000 0x3fc00000 0x00000000 0x40300000 0x00000000 0x3ff80000 0x636f4361 0x3fd287a7 3fc9fc60: 0x8b60c8b3 0x3fc68a28 0x509f79fb 0x3fd34413 0x00000000 0x40240000 0x00000000 0x401c0000 3fc9fc80: 0x00000000 0x40140000 0x00000000 0x43500000 0x00000000 0x7ff80000 0x7fc00000 0x00000000 3fc9fca0: 0xa5a5a5a5 0x00000154 0x3fc9fa10 0x0000000f 0x3fc95e4c 0x3fc95e4c 0x3fc9fca8 0x3fc95e44 3fc9fcc0: 0x00000018 0x3fca638c 0x3fca638c 0x3fc9fca8 0x00000000 0x00000001 0x3fc9eca4 0x6e69616d 3fc9fce0: 0x00000000 0x00000000 0x0000ffff 0x00000000 0x3fc9fca0 0x00000001 0x00000000 0x3fca740c 3fc9fd00: 0x42000260 0x00000000 0x3fc9d44c 0x3fc9d4b4 0x3fc9d51c 0x00000000 0x00000000 0x00000001 3fc9fd20: 0x00000000 0x00000000 0x00000000 0x420e94c6 0x00000000 0x00000000 0x00000000 0x00000000 3fc9fd40: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 3fc9fd60: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 3fc9fd80: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 3fc9fda0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 3fc9fdc0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 3fc9fde0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x3f000000 0x00000154 3fc9fe00: 0x3fca03e0 0x3fc9cab4 0x3fc95e38 0x3fc95e38 0x3fc9fe00 0x3fc95e30 0x00000019 0x8a97cab0 3fc9fe20: 0xc84f7d44 0x3fc9fe00 0x00000000 0x00000000 0x3fc9ff58 0x454c4449 0xd7d23800 0xca21ac39 3fc9fe40: 0x00d7aaf6 0x00000000 0x3fca0550 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 3fc9fe60: 0x3fc9d44c 0x3fc9d4b4 0x3fc9d51c 0x00000000 0x00000000 0x00000001 0x00000000 0x00000000 3fc9fe80: 0x00000000 0x420e94c6 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 3fc9fea0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0000 3fc9fee0: 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000

ELF file SHA256: b19322dab83446a1 ELF file SHA256: b19322dab83446a1

Rebooting... ESP-ROM:esp32c3-api1-20210207 Build:Feb 7 2021 rst:0x3 (RTC_SW_SYS_RST),boot:0xc (SPI_FAST_FLASH_BOOT) Saved PC:0x40380654 0x40380654: esp_restart_noos_dig at F:/GitHub/esp-idf/components/esp_system/esp_system.c:44 (discriminator 1)

SPIWP:0xee mode:DIO, clock div:1 load:0x3fcd6100,len:0x16bc load:0x403ce000,len:0x930 load:0x403d0000,len:0x2d28 entry 0x403ce000 `

I have configured CONFIG_APP_WIFI_PROV_TRANSPORT_BLE in menuconfig

Is it seems that BLE is not deinitialised after provisioning? And therefor it crashes while my NimBle initialisation? What should I do if i want to use NimBle in app all the time (besides provisioning time).

IshaESP commented 1 year ago

Hi @jacek12345 , Can you please try out this patch? BT_configurable.zip

Flags can be changed from menuconfig. For example, if BT is to be kept on after provisioning, follow these steps:

  1. idf.py menuconfig
  2. Go to Component config → Wi-Fi Provisioning Manager → Enable WIFI_PROV_KEEP_BT_ON_AFTER_PROV

Similarly, if device needs to keep connected, disable WIFI_PROV_DISCONNECT_AFTER_PROV flag (same path as above).

Thanks, Isha

jacek12345 commented 1 year ago

Thank You for patch. I have problem with installing it:

F:\GitHub\esp-idf>git apply BT_configurable.patch error: patch failed: components/wifi_provisioning/Kconfig:39 error: components/wifi_provisioning/Kconfig: patch does not apply

F:\GitHub\esp-idf>

IshaESP commented 1 year ago

Hi, Patch is getting applied at my end. Can you try making the changes manually for that file if possible.

Thanks, Isha

jacek12345 commented 1 year ago

Sorry, i think i'm doing something wrong because no files are modified at my end. But if i understand from Your patch file analysing, NimBle controller should be deinitialised after provisioning without additional flags from Your patch? It only adds some possibility for not disabling if it is needed? Is it correct or I missunderstood? If that is correct, i don't need Your patch since i want to have NimBle always disabled after provisioning? So it seems that something is not properly deinitialised after provisioning, when my esp_nimble_hci_and_controller_init() initialisation begins?

jacek12345 commented 1 year ago

I changed all files manually. SET in menuconfig: WIFI_PROV_KEEP_BT_ON_AFTER_PROV = y WIFI_PROV_DISCONNECT_AFTER_PROV = n

Also changed from ESP_ERROR_CHECK(esp_nimble_hci_and_controller_init()); to ESP_ERROR_CHECK_WITHOUT_ABORT(esp_nimble_hci_and_controller_init());

BLE works after provisioning, but .... the provisioning hangs on phone app File (3)

From minitor: `I (1205) app_wifi: Provisioning Started. Name : PROV_89a126, POP : 0aea9731 I (1215) app_wifi: Provisioning will auto stop after 30 minute(s). I (12705) protocomm_nimble: mtu update event; conn_handle=1 cid=4 mtu=256

I (23475) esp_rmaker_user_mapping: Received request for node details I (23475) esp_rmaker_user_mapping: Got user_id = SignInWithApple_txxxxxxxxxxxxxxxx, secret_key = xxxxxxxxxx-xxxxxxF-4xx9-9xx0-D65CxxxxF8A6 I (23485) esp_rmaker_user_mapping: Sending status SUCCESS I (23805) app_wifi: Received Wi-Fi credentials SSID : xxxxxx Password : xxxxxxxxxxxx I (24815) wifi:new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1 I (24815) wifi:state: init -> auth (b0) I (25515) wifi:state: auth -> assoc (0) I (25515) wifi:state: assoc -> run (10) I (25535) wifi:connected with Jacek, aid = 28, channel 1, BW20, bssid = e8:d2:ff:59:9c:a4 I (25535) wifi:security: WPA3-SAE, phy: bgn, rssi: -78 I (25555) wifi:pm start, type: 1

I (25555) wifi:set rx beacon pti, rx_bcn_pti: 14, bcn_timeout: 14, mt_pti: 25000, mt_time: 10000 I (25565) wifi:BcnInt:102400, DTIM:3 W (25605) wifi:idx:0 (ifx:0, e8:d2:ff:59:9c:a4), tid:0, ssn:1, winSize:64 I (26625) app_wifi: Connected with IP Address:192.168.1.63 I (26625) esp_netif_handlers: sta ip: 192.168.1.63, mask: 255.255.255.0, gw: 192.168.1.1 I (26625) wifi_prov_mgr: STA Got IP I (26625) app_wifi: Provisioning successful I (26635) esp_rmaker_local: Waiting for Wi-Fi provisioning to finish. I (26645) esp_mqtt_glue: Connecting to a1p72mufdu6064-ats.iot.us-east-1.amazonaws.com I (26655) esp_mqtt_glue: Waiting for MQTT connection. This may take time. W (26655) BTDM_INIT: esp_bt_controller_mem_release not implemented, return OK ESP_ERROR_CHECK_WITHOUT_ABORT failed: esp_err_t 0x103 (ESP_ERR_INVALID_STATE) at 0x40389863 0x40389863: _esp_error_check_failed_without_abort at F:/GitHub/esp-idf/components/esp_system/esp_err.c:37

file: "../main/app_main.c" line 320 func: app_main expression: esp_nimble_hci_and_controller_init() I (26685) app_main: BLE Host Task Started I (26685) NimBLE: GAP procedure initiated: discovery; I (26695) NimBLE: own_addr_type=2 filter_policy=0 passive=1 limited=0 filter_duplicates=0 I (26705) NimBLE: duration=forever I (26705) NimBLE:

I (27115) esp-x509-crt-bundle: Certificate validated I (28675) esp_mqtt_glue: MQTT Connected I (28675) app_main: MQTT Connected. I (28675) esp_rmaker_node_config: Reporting Node Configuration of length 3444 bytes. I (28865) esp_rmaker_param: Params MQTT Init done. I (28865) esp_rmaker_param: Reporting params (init): {"56789abc":{"Name":"56789abc","sFrm":256,"sBat":255,"sData0":true},"56789abd":{"Name":"56789abd","sFrm":256,"sBat":255,"sData0":true},"56789abe":{"Name":"56789abe","sFrm":256,"sBat":255,"sData0":true},"56789abf":{"Name":"56789abf","sFrm":256,"sBat":255,"sData0":true},"56789ac0":{"Name":"56789ac0","sFrm":256,"sBat":255,"sData0":true},"Odbiornik":{"Name":"Odbiornik","Power":true},"OTA":{"Status":"","Info":"","URL":""},"Time":{"TZ":"Europe/Warsaw","TZ-POSIX":"CET-1CEST,M3.5.0,M10.5.0/3"},"Schedule":{"Schedules":[{"name":"asaass","id":"qe8t","enabled":true,"action":{"56789abc":{"sFrm":256}},"triggers":[{"m":1043,"d":19}]}]},"Scenes":{"Scenes":[{"name":"dddd","id":"6PKO","info":"Ffffff","action":{"Odbiornik":{"Power":true}}}]},"Local Control":{"POP":"80aa63ad","Type":1}} I (28935) esp_rmaker_user_mapping: MQTT Publish: {"node_id":"3qRhgndncRKQcedsRGbeEi","user_id":"SignInWithApple_tWgGWUPgRU7zwB4JVZVo5J","secret_key":"6BB0F5BA-B04F-4AC9-9830-D65C0ED2F8A6"} I (29565) app_main: MQTT Published. Msg id: 286. I (29695) app_main: MQTT Published. Msg id: 51366. I (29705) esp_rmaker_user_mapping: User Node association message published successfully. W (29705) app_main: Unhandled RainMaker Event: 5 I (29715) app_main: MQTT Published. Msg id: 31941. I (33385) esp_rmaker_time: SNTP Synchronised. I (33385) esp_rmaker_time: The current time is: Thu Jul 21 14:51:14 2022 +0200[CEST], DST: Yes. I (57645) NimBLE: GAP procedure initiated: stop advertising.

I (57665) wifi_prov_mgr: Provisioning stopped I (57665) esp_rmaker_local: Event 6 I (57665) esp_rmaker_local: Starting ESP Local control with HTTP Transport and security version: 1 I (57685) esp_https_server: Starting server I (57685) esp_https_server: Server listening on port 8080 I (57685) esp_rmaker_local: esp_local_ctrl service started with name : 3qRhgndncRKQcedsRGbeEi W (57695) app_main: Unhandled RainMaker Event: 6`

jacek12345 commented 1 year ago

The problem is that i call nimble_port_init(); even if it was initialised earlier (while provisioning). I really don't have idea how to initialise NimBle independently from provisioning runs or not.

IshaESP commented 1 year ago

Hi @jacek12345, First of all, my patch was against master branch. I will work on backporting it for release branches.

Basically, I analyzed your changes and I understand the issue now. You are trying to initialize nimble immediately after app_wifi_start() function returns. In normal case, stack deinitialization after provisioning is an asynchronous process. This involves posting message to a thread which is then processed in stack context. For your change, I suggest to add a delay (Example - vTaskDelay(2000);) before attempting to initialize nimble.

This should help resolve your crash issue.

app_wifi_start() vTaskDelay(2000) < nimble initialization>

Thanks, Isha

jacek12345 commented 1 year ago

Thank You for response. You right, it is asynchronous proces. I tried with delay but this is much more time needed than 2s. Is any convenient callback function or parameter that i can check if stack ends deinitjalisation (is deinitialised), so could initialise it immediately if provisioning didn't occured or after deinitialisation if provisioning was done?

jacek12345 commented 1 year ago

Thank You Isha, it looks that vTaskDelay(2000); helps.

But, could You check flow with Your patch with WIFI_PROV_KEEP_BT_ON_AFTER_PROV disabled? I think, that there is something wrong with deinitialisation. First, i tried with: WIFI_PROV_KEEP_BT_ON_AFTER_PROV = y WIFI_PROV_DISCONNECT_AFTER_PROV = y

In monitor, i see that NimBle stack deinitialisation ends with protocomm_nimble: Protocomm layer has already stopped and next, there is another initialisation: NimBLE: GAP procedure initiated: advertise; I (35344) NimBLE: disc_mode=2 I (35344) NimBLE: adv_channel_map=0 own_addr_type=0 adv_filter_policy=0 adv_itvl_min=256 adv_itvl_max=256 I (35354) NimBLE:

I expected, that with WIFI_PROV_KEEP_BT_ON_AFTER_PROV = n it should simply ends with protocomm_nimble: Protocomm layer has already stopped but it doesn't. I (42494) NimBLE: GAP procedure initiated: stop advertising. I (42504) NimBLE: GAP procedure initiated: stop advertising. I (42514) NimBLE: GAP procedure initiated: terminate connection; conn_handle=1 hci_reason=19 E (42554) protocomm_nimble: Error setting advertisement data; rc = 30 W (42574) Timer: Timer not stopped W (42584) Timer: Timer not stopped I (42604) wifi_prov_mgr: Provisioning stopped W (42604) BTDM_INIT: esp_bt_mem_release not implemented, return OK I (42604) wifi_prov_scheme_ble: BTDM memory released I (42604) esp_rmaker_local: Event 6

Is it correct deinitialisation flow? I can initialise NimBle after that, but I'd like to be sure if it is safe.

IshaESP commented 1 year ago

Hi @jacek12345 , When WIFI_PROV_KEEP_BT_ON_AFTER_PROV is disabled ('n'), that is the expected behaviour because we are stopping the bluetooth host stack here. The clean way to do this it to stop advertising, terminate the connection and then stop the stack. Hence all the console prints.

Thanks, Isha

jacek12345 commented 1 year ago

Thank You