espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0

Idle Watchdog For Access Point After Restart (IDFGH-1630) #3881

Closed gregjesl closed 4 years ago

gregjesl commented 5 years ago

Environment

Problem Description

Our device sometimes transitions from WiFi station mode to WiFi access point mode. In this scenario, the unit restarts using esp_restart(), and the access point is then started using the following code:

this->log_i("Starting SoftAP");
ESP_ERROR_CHECK(esp_wifi_set_mode(WIFI_MODE_AP) );

// Build the apConfig structure.
wifi_config_t apConfig;
memset(&apConfig, 0, sizeof(apConfig));
memcpy(apConfig.ap.ssid, ssid.data(), ssid.size());
apConfig.ap.ssid_len = ssid.size();
apConfig.ap.channel         = 0;
apConfig.ap.authmode        = WIFI_AUTH_OPEN;
apConfig.ap.ssid_hidden     = (uint8_t) false;
apConfig.ap.max_connection  = 4;
apConfig.ap.beacon_interval = 100;

ESP_ERROR_CHECK(esp_wifi_set_config(WIFI_IF_AP, &apConfig) );

ESP_ERROR_CHECK(esp_wifi_start() );

this->log_i("SoftAP started");

On some devices, the task watchdog is triggered after attempting to start the access point:

I (921) WIFI_MANAGER: Starting SoftAP
E (5926) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (5926) task_wdt:  - IDLE1 (CPU 1)
E (5926) task_wdt: Tasks currently running:
E (5926) task_wdt: CPU 0: IDLE0
E (5926) task_wdt: CPU 1: IDLE1

When this occurs, the access point is not available/visible, meaning the access point was never started.

Restarting the unit using esp_deep_sleep_start() instead of esp_restart() seems to solve the issue. This leads me to believe the issue is with I2C behavior for WiFi.
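For readers who want to try the same workaround, here is a minimal sketch of restarting through deep sleep. It assumes a timer is used as the wake-up source; the report does not say which wake-up source was actually configured. The helper name restart_via_deep_sleep and the host-side stand-ins are hypothetical and exist only so the snippet also compiles off-target; esp_sleep_enable_timer_wakeup() and esp_deep_sleep_start() are the ESP-IDF calls declared in esp_sleep.h.

```cpp
#include <cstdint>

#ifdef ESP_PLATFORM
#include "esp_sleep.h"  // real ESP-IDF declarations when built for the target
#else
// Host-side stand-ins (assumption, for off-target builds only): they just
// record what was requested instead of putting anything to sleep.
static std::uint64_t g_requested_wakeup_us = 0;
static bool g_deep_sleep_entered = false;
static int esp_sleep_enable_timer_wakeup(std::uint64_t time_in_us) {
    g_requested_wakeup_us = time_in_us;
    return 0;  // mirrors ESP_OK
}
static void esp_deep_sleep_start() { g_deep_sleep_entered = true; }
#endif

// Hypothetical helper: restart through a brief deep sleep rather than
// esp_restart(). On the target, esp_deep_sleep_start() does not return;
// the chip boots again when the timer fires.
void restart_via_deep_sleep(std::uint64_t wakeup_delay_us) {
    esp_sleep_enable_timer_wakeup(wakeup_delay_us);
    esp_deep_sleep_start();
}
```

A call such as restart_via_deep_sleep(100000) would request a wake-up after roughly 100 ms; the delay value is arbitrary and not taken from the report.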

liuzfesp commented 5 years ago

@gregjesl, could you help get the task watchdog backtrace via the following steps:

gregjesl commented 5 years ago

@liuzfesp I performed the steps you recommended; here is the result:

E (5926) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (5926) task_wdt:  - IDLE1 (CPU 1)
E (5926) task_wdt: Tasks currently running:
E (5926) task_wdt: CPU 0: IDLE0
E (5926) task_wdt: CPU 1: IDLE1
E (5926) task_wdt: Aborting.
abort() was called at PC 0x400d250c on core 0

ELF file SHA256: 43ab51787385f13c0fdaeeb8013d852d387b3a07f81544901fac81bd2abdc5ef

Backtrace: 0x4008c274:0x3ffb0630 0x4008c4b9:0x3ffb0650 0x400d250c:0x3ffb0670 0x4008210d:0x3ffb0690 0x401ad497:0x3ffbee30 0x400d3663:0x3ffbee50 0x400901cd:0x3ffbee70

Rebooting...
ets Jun  8 2016 00:22:57

rst:0xc (SW_CPU_RESET),boot:0x12 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0018,len:4
load:0x3fff001c,len:552
load:0x40078000,len:7940
load:0x40080400,len:5632
entry 0x40080674

When the unit restarted, it encountered another error we had seen before, where ledc_isr_register() hangs. Our workaround for that issue was to implement a timeout; when the timeout expires, the unit restarts using esp_deep_sleep_start() as suggested here. After the unit restarted from deep sleep, it was able to start the access point.
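The timeout workaround described above could be sketched roughly as follows. This is not the reporter's implementation, which is not shown in the thread; it is a generic pattern that runs a possibly hanging call on a detached worker thread and waits a bounded time for it. The name run_with_timeout is hypothetical, and the sketch assumes std::thread is available in the build.

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <thread>
#include <utility>

// Runs `call` on a detached worker thread and waits at most `timeout` for it.
// Returns true if the call completed (returned or threw) within the timeout,
// false if it is still running. A hung call keeps its worker thread alive, so
// the caller is expected to restart the device on timeout, not carry on.
bool run_with_timeout(std::function<void()> call,
                      std::chrono::milliseconds timeout) {
    std::packaged_task<void()> task(std::move(call));
    std::future<void> done = task.get_future();
    // The future comes from a packaged_task, so its destructor does not block
    // even if the worker never finishes.
    std::thread(std::move(task)).detach();
    return done.wait_for(timeout) == std::future_status::ready;
}
```

In the scenario from this comment, the wrapped call would be the ledc_isr_register() invocation, and a false return value would be the point at which the unit restarts through deep sleep.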

liuzfesp commented 5 years ago

Hi @gregjesl, I don't have the ELF file. Could you decode the backtrace with the following command:

xtensa-esp32-elf-addr2line -piaf -e *.elf backtrace

Here *.elf is the ELF file that corresponds to the tested binary.
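As an illustration of what goes in place of the backtrace placeholder, the sketch below pulls the program-counter half out of each PC:SP pair on a "Backtrace:" line and assembles the command quoted above. The function names and the ELF path passed in are hypothetical; only the addr2line invocation itself comes from this thread.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Extracts the PC part of every "0xPC:0xSP" token on a "Backtrace:" line.
std::vector<std::string> extract_backtrace_pcs(const std::string& line) {
    std::vector<std::string> pcs;
    std::istringstream in(line);
    std::string token;
    while (in >> token) {
        const std::size_t colon = token.find(':');
        // Skip tokens without a PC:SP split, e.g. the "Backtrace:" label,
        // whose colon is the last character.
        if (colon == std::string::npos || colon + 1 == token.size()) continue;
        if (token.compare(0, 2, "0x") != 0) continue;
        pcs.push_back(token.substr(0, colon));
    }
    return pcs;
}

// Builds the addr2line command line for the given ELF file and PC list.
std::string build_addr2line_command(const std::string& elf_path,
                                    const std::vector<std::string>& pcs) {
    std::string cmd = "xtensa-esp32-elf-addr2line -piaf -e " + elf_path;
    for (const auto& pc : pcs) cmd += " " + pc;
    return cmd;
}
```

Feeding it the line "Backtrace: 0x4008c274:0x3ffb0630 0x4008c4b9:0x3ffb0650" yields the addresses 0x4008c274 and 0x4008c4b9, which are then appended after the ELF path.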

Moreover, could you give the detailed steps to replicate this issue, e.g.:

Alvin1Zhang commented 5 years ago

@gregjesl Thanks for reporting the issue. Could you please provide the details requested by @liuzfesp, including your sdkconfig and the steps to replicate the issue? Also, can the issue be reproduced with the example code plus minimal changes? Thanks.

esp-daiwei commented 5 years ago

Hi @gregjesl, I think this issue has been fixed by commits fbd38ad1 (v3.3), d9cfb05e (v4.0), and 53d57dd7 (master). These fixes have already been merged into the internal IDF GitLab; it may take a few days before they are synced to GitHub.

For v3.2/v3.1, the fix will be merged soon.

Please let me know if you still encounter a similar issue with the fixes applied.

liuzfesp commented 4 years ago

v3.2 - 09e65746
v3.1 - 907878d6

Alvin1Zhang commented 4 years ago

@gregjesl Thanks for reporting. The fix commits have been shared by @liuzfesp. Feel free to reopen if the issue still occurs. Thanks.