espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0
13.73k stars 7.3k forks source link

ESP32 reset by RTC watchdog although RTC WDT is disabled in sdkconfig (IDFGH-6533) #8184

Open MikitaMinau opened 2 years ago

MikitaMinau commented 2 years ago

Environment

Problem Description

Sometimes, I see that ESP32 is reset without any reason. No stack trace is observed during the reset. After the reset, esp_reset_reason() returns 7(ESP_RST_WDT) and rtc_get_reset_reason(PRO_CPU_NUM) returns 16(RTCWDT_RTC_RESET):

I (13471) KernelInterface.c: ESP reset reason is 7
I (13471) KernelInterface.c: RTC reset reason is 16

Looking into esp-idf source code I came to a conclusion that the reason for the RTC watchdog. However, RTC watchdog is enabled only for bootloader and disabled in the application code(look at attached sdkconfig file).

I suspect automatic light sleep mode because I see that RTC WTD is enabled before going to light sleep but I did find information on haw RTC watchdog is fed in light sleep. Light sleep code says that it enables watchdog in case exit from light sleep fails but did not describe how may it fail:

// Safety net: enable WDT in case exit from light sleep fails
    wdt_hal_context_t rtc_wdt_ctx = {.inst = WDT_RWDT, .rwdt_dev = &RTCCNTL};
    bool wdt_was_enabled = wdt_hal_is_enabled(&rtc_wdt_ctx);    // If WDT was enabled in the user code, then do not change it here.
    if (!wdt_was_enabled) {
        wdt_hal_init(&rtc_wdt_ctx, WDT_RWDT, 0, false);
        uint32_t stage_timeout_ticks = (uint32_t)(1000ULL * rtc_clk_slow_freq_get_hz() / 1000ULL);
        wdt_hal_write_protect_disable(&rtc_wdt_ctx);
        wdt_hal_config_stage(&rtc_wdt_ctx, WDT_STAGE0, stage_timeout_ticks, WDT_STAGE_ACTION_RESET_RTC);
        wdt_hal_enable(&rtc_wdt_ctx);
        wdt_hal_write_protect_enable(&rtc_wdt_ctx);
    }

Expected Behavior

ESP32 is not reset by RTC watchdog when it is disabled.

Actual Behavior

ESP32 is reset by RTC watchdog when it is disabled.

Steps to reproduce

Usually, I see this kind of reset within 12-24 hours of work. BLE and WiFi are enabled. Automatic light sleep and DFS are also enabled.

Code to reproduce this issue

Unfortunately, I cannot share the code (also, it requires custom HW and won't run on devkits).

Debug Logs


I (72871) BatteryDriver.c: Battery level is 100% (5.352V)
I (73431) ThingShadowService.c: Shadow update accepted
I (74231) ThingShadowService.c: Shadow update completed
I (75231) ThingShadowService.c: Updating the reported section
I (75231) ThingShadowService.c: Update Shadow
I (78031) ThingShadowService.c: Shadow update accepted
I (78841) ThingShadowService.c: Shadow update completed
I (85001) wagz_main.c: Timer event!
I (85001) wagz_main.c: Current date is Thu Dec 16 11:41:16 2021
I (85001) wagz_main.c: Assigned Pet ID: 3077
I (85001) KernelInterface.c: Heap size CAP 8bit 3913510
I (85011) KernelInterface.c: Heap size CAP 32bit 3917574
I (85011) KernelInterface.c: Heap size IRAM 4064
I (85021) KernelInterface.c: Heap size EXT RAM 3876395
I (85021) KernelInterface.c: EXT RAM largest free block 2097152
I (85031) KernelInterface.c: EXT RAM law water mark 3819571
I (85041) KernelInterface.c: Heap size INT RAM 41179
I (85041) KernelInterface.c: INT RAM largest free block 16384
I (85051) KernelInterface.c: INT RAM law water mark 27679
I (85061) KernelInterface.c: External RAM heap is valid.
I (85061) KernelInterface.c: Internal RAM heap is valid.
I (85071) PedometerService.c: Collected activity metrics: Sleep 1 min., Exercise 0 min., Normal 0 min, Steps 0
I (85191) GnssManager.c: $GPRMC,,V,,,,,,,,,,N,V*29
I (85321) GnssManager.c: $GPGGA,,,,,,0,,,,,,,,*66
I (87271) MotionSensor.c: Measured temperature is 30.70 Celsius
I (87271) TemperatureService.c: Ambient temperature (estimation): 59 Fahrenheit
I (87271) TemperatureService.c: The highest temperature is detected (59 Fahrenheit)
I (94311) DeviceStatus.c: Collecting status information...
I (691) psram: This chip is ESP32-D0WD
I (692) spiram: Found 64MBit SPI RAM device
I (692) spiram: SPI RAM mode: flash 40m sram 40m
I (694) spiram: PSRAM initialized, cache is in low/high (2-core) mode.
I (702) cpu_start: Pro cpu up.
I (705) cpu_start: Starting app cpu, entry point is 0x40081868
0x40081868: call_start_cpu1 at C:/Projects/esp-idf/components/esp_system/port/cpu_start.c:141

I (0) cpu_start: App cpu up.
I (1596) spiram: SPI SRAM memory test OK
I (1802) cpu_start: Pro cpu start user code
I (1802) cpu_start: cpu freq: 80000000
I (1802) cpu_start: Application information:
I (1805) cpu_start: Project name:     wagzfwos
I (1810) cpu_start: App version:      1.7.0
I (1815) cpu_start: Compile time:     Dec 16 2021 11:36:48
I (1821) cpu_start: ELF file SHA256:  2bc457eeb0d884fc...
I (1827) cpu_start: ESP-IDF:          v4.3
I (1832) heap_init: Initializing. RAM available for dynamic allocation:
I (1839) heap_init: At 3FFAFF10 len 000000F0 (0 KiB): DRAM
I (1845) heap_init: At 3FFB6388 len 00001C78 (7 KiB): DRAM
I (1852) heap_init: At 3FFB9A20 len 00004108 (16 KiB): DRAM
I (1858) heap_init: At 3FFC8400 len 00017C00 (95 KiB): DRAM
I (1864) heap_init: At 3FFE0440 len 00003AE0 (14 KiB): D/IRAM
I (1870) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (1877) heap_init: At 4009E7B8 len 00001848 (6 KiB): IRAM
I (1883) spiram: Adding pool of 3933K of external SPI memory to heap allocator
I (1893) spi_flash: detected chip: mxic
I (1896) spi_flash: flash io: qout
W (1900) spi_flash: Detected size(65536k) larger than the size in the binary image header(8192k). Using the size in the binary image header.
I (1924) pm: Frequency switching config: CPU_MAX: 80, APB_MAX: 80, APB_MIN: 40, Light sleep: DISABLED
I (1925) esp_core_dump_uart: Init core dump to UART
I (1931) cpu_start: Starting scheduler on PRO CPU.
I (0) cpu_start: Starting scheduler on APP CPU.
I (1941) spiram: Reserving pool of 32K of internal memory for DMA/internal allocations
I (1951) KernelInterface.c: Kernel preinit ...
I (1951) KernelInterface.c: Collar configuration = DOGCOLLAR
I (1961) KernelInterface.c: RTC WDT is off
I (1961) KernelInterface.c: Chip revision is 3
I (1971) gpio: GPIO[12]| InputEn: 0| OutputEn: 1| OpenDrain: 0| Pullup: 0| Pulldown: 1| Intr:0
I (1981) gpio: GPIO[21]| InputEn: 0| OutputEn: 1| OpenDrain: 0| Pullup: 0| Pulldown: 1| Intr:0
I (1991) gpio: GPIO[15]| InputEn: 0| OutputEn: 1| OpenDrain: 0| Pullup: 0| Pulldown: 1| Intr:0
I (2001) gpio: GPIO[19]| InputEn: 0| OutputEn: 1| OpenDrain: 0| Pullup: 0| Pulldown: 1| Intr:0
I (2011) gpio: GPIO[2]| InputEn: 0| OutputEn: 1| OpenDrain: 0| Pullup: 0| Pulldown: 1| Intr:0
I (2021) gpio: GPIO[18]| InputEn: 0| OutputEn: 1| OpenDrain: 0| Pullup: 0| Pulldown: 1| Intr:0
I (2031) gpio: GPIO[22]| InputEn: 0| OutputEn: 1| OpenDrain: 0| Pullup: 0| Pulldown: 1| Intr:0
I (2031) gpio: GPIO[14]| InputEn: 0| OutputEn: 1| OpenDrain: 0| Pullup: 0| Pulldown: 1| Intr:0
I (2041) gpio: GPIO[13]| InputEn: 1| OutputEn: 0| OpenDrain: 0| Pullup: 0| Pulldown: 0| Intr:0
I (2051) gpio: GPIO[34]| InputEn: 1| OutputEn: 0| OpenDrain: 0| Pullup: 0| Pulldown: 0| Intr:0
I (2061) gpio: GPIO[23]| InputEn: 1| OutputEn: 0| OpenDrain: 0| Pullup: 0| Pulldown: 1| Intr:0
I (2071) gpio: GPIO[37]| InputEn: 1| OutputEn: 0| OpenDrain: 0| Pullup: 0| Pulldown: 0| Intr:0
I (2081) gpio: GPIO[0]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 0| Pulldown: 0| Intr:0
I (2091) GpioDriver.c: Hold all GPIOs
I (2411) GpioDriver.c: GPIO Driver has been initialized
I (2411) BatteryDriver.c: Battery Driver initialized.
I (2411) I2cMaster.c: Begin initialization
I (2411) I2cMaster.c: Initialized on port0
I (2421) I2cMaster.c: SDA on pin: 4
I (2421) I2cMaster.c: SCL on pin: 26
I (2421) I2cMaster.c: Frequency:  50000
I (2551) BatteryService.c: Battery service is initialized
I (2551) BatteryManager.c: Battery Manager is initialized
I (2561) MicrophoneDriver.c: Chip ID is 0x0000
E (2561) MicrophoneDriver.c: Connection error
E (3571) MagnetometerDriver.c: Magnetometer IC was not detected
I (3571) BleManager.c: BLE manager initialization is done.
I (3571) BatteryDriver.c: Resetting internal filters...
I (4401) BatteryDriver.c: Battery level is 100% (5.374V)
I (4721) BatteryDriver.c: Battery level is 100% (5.375V)
I (5041) BatteryDriver.c: Battery level is 100% (5.375V)
I (5361) BatteryDriver.c: Battery level is 100% (5.374V)
I (5681) BatteryDriver.c: Battery level is 100% (5.375V)
I (5821) NvStorage.c: Default partition stats: Error code = (0), UsedEntries = (68), FreeEntries = (436), AllEntries = (504)
I (5821) NvStorage.c: Default NVS storage initialized
I (5851) NvStorage.c: common partition stats: Error code = (0), UsedEntries = (11), FreeEntries = (745), AllEntries = (756)
I (5851) NvStorage.c: common NVS partition is initialized
I (5891) NvStorage.c: settings partition stats: Error code = (0), UsedEntries = (186), FreeEntries = (570), AllEntries = (756)
I (5891) NvStorage.c: settings NVS partition is initialized
I (5901) FsStorage.c: Mounting FAT filesystem
W (5921) FsStorage.c: FAT FS: 976 kB total, 960 kB free
W (5921) FsStorage.c: //FEATURES.TXT: size  325
W (5921) FsStorage.c: //FENCE.TXT: size  2479
W (5931) FsStorage.c: //SETTINGS.TXT: size  820
W (5931) FsStorage.c: //GEOSET.TXT: size  166
I (5941) TimeConverter.c: Time is Sun Dec 16 11:36:48 2021

I (5941) HwInfoStorage.c: FW version is 1.7.0+2021-12-16T11-36-48Z
I (5951) ErrorCollector.cc: Created new errors collector (Cell)
I (5961) DnsCache.c: DNS cache initialized
I (6021) GSM_Director.c: GSM Initialization done
I (6021) GSM_Director.c: Skipping battery check for now
I (6021) GSM_Power.c: Modem power is in state 1: (1=on, 0=off)
W (6031) GSM_Power.c: GSM Modem is already powered-on
I (6031) GSM_Power.c: Modem power up sequence was completed.
I (6041) GSM_Director.c: Modem is already powered on. Resetting the modem.
I (6051) GSM_Director.c: Skipping battery check for now
I (6051) GSM_Power.c: Turning off cell modem
I (6061) wifi:wifi driver task: 3ffdb3c4, prio:23, stack:6656, core=0
I (6061) system_api: Base MAC address is not set
I (6071) system_api: read default base MAC address from EFUSE
I (6081) wifi:wifi firmware version: c7d0450
I (6081) wifi:wifi certification version: v7.0
I (6081) wifi:config NVS flash: disabled
I (6091) wifi:config nano formating: disabled
I (6091) wifi:Init data frame dynamic rx buffer num: 32
I (6101) wifi:Init management frame dynamic rx buffer num: 32
I (6101) wifi:Init management short buffer num: 32
I (6111) wifi:Init static tx buffer num: 16
I (6111) wifi:Init tx cache buffer num: 32
I (6111) wifi:Init static rx buffer size: 1600
I (6121) wifi:Init static rx buffer num: 6
I (6121) wifi:Init dynamic rx buffer num: 32
I (6131) wifi_init: rx ba win: 6
I (6131) wifi_init: tcpip mbox: 32
I (6131) wifi_init: udp mbox: 6
I (6141) wifi_init: tcp mbox: 6
I (6141) wifi_init: tcp tx win: 5744
I (6141) wifi_init: tcp rx win: 5744
I (6151) wifi_init: tcp mss: 1440
I (6151) wifi:Set ps type: 2
I (6161) WiFiManager.c: WiFi Station init done
I (6161) SubscriptionManager.c: Adding a new subscriber address: 0x400fc8f8 for manager 0x3ffe0ef8
0x400fc8f8: wiFiEventHandler at c:\projects\wagz-fwos-2\build/../lib/NetworkManager/NetworkManager.c:136
I (6171) SubscriptionManager.c: Adding a new subscriber address: 0x400fc854 for manager 0x3ffd5a00
0x400fc854: cellEventHandler at c:\projects\wagz-fwos-2\build/../lib/NetworkManager/NetworkManager.c:166
I (6181) NetworkManager.c: Network manager initialization is done.
I (6191) SubscriptionManager.c: Adding a new subscriber address: 0x400fb974 for manager 0x3ffe0ef8
0x400fb974: wiFiEventHandler at c:\projects\wagz-fwos-2\build/../lib/HttpsClient/HttpsClient.c:169
I (6201) SubscriptionManager.c: Adding a new subscriber address: 0x400fb894 for manager 0x3ffd5a00
0x400fb894: cellEventHandler at c:\projects\wagz-fwos-2\build/../lib/HttpsClient/HttpsClient.c:144
I (6421) BatteryDriver.c: Battery level is 100% (5.375V)
I (6741) BatteryDriver.c: Battery level is 100% (5.374V)
I (7061) BatteryDriver.c: Battery level is 100% (5.373V)
I (7381) BatteryDriver.c: Battery level is 100% (5.372V)
I (7701) BatteryDriver.c: Battery level is 100% (5.371V)
I (8021) BatteryDriver.c: Battery level is 100% (5.371V)
I (8341) BatteryDriver.c: Battery level is 100% (5.376V)
I (8581) GSM_Power.c: Turning on cell modem
I (8661) BatteryDriver.c: Battery level is 100% (5.377V)
I (8981) BatteryDriver.c: Battery level is 100% (5.378V)
I (9301) BatteryDriver.c: Battery level is 100% (5.379V)
I (9621) BatteryDriver.c: Battery level is 100% (5.380V)
I (9941) BatteryDriver.c: Battery level is 100% (5.381V)
I (10261) BatteryDriver.c: Battery level is 100% (5.383V)
I (10581) BatteryDriver.c: Battery level is 100% (5.384V)
I (10851) GSM_Power.c: Modem restart sequence was completed.
I (10851) uart: queue free spaces: 20
I (10851) GSM_IO.c: UART init is done.
I (10901) BatteryDriver.c: Battery level is 100% (5.383V)
I (10911) GSM_Director.c: GSM Modem has been restarted.
I (10911) GSM_States.c: Start Modem Handshake...
I (10911) GSM_Commands.c: Handshake command...
W (10971) GSM_IO.c: [text mode] Unhandled UART event (type:1, size:0)
W (10971) GSM_Dispatcher.c: Receive timeout 1000 ms occured (command: 'AT')
I (11221) BatteryDriver.c: Battery level is 100% (5.382V)
I (11541) GSM_Commands.c: Handshake command...
I (11541) BatteryDriver.c: Battery level is 100% (5.374V)
E (11601) GSM_IO.c: Null-terminator is found inside the text data (received length=6, while calculated string length is 0
00 41 54 0d 0d 0a
W (11601) GSM_Dispatcher.c: Receive timeout 1000 ms occured (command: 'AT')
I (11861) BatteryDriver.c: Battery level is 100% (5.370V)
I (12181) GSM_Commands.c: Handshake command...
I (12181) BatteryDriver.c: Battery level is 100% (5.366V)
W (12241) GSM_Dispatcher.c: Expecting the echo for the 'AT' AT-command, but has received 'OK' response
W (12241) GSM_Dispatcher.c: Expecting the echo for the 'AT' AT-command, but has received 'RDY' response
I (12311) GSM_Director.c: Handshake is complete.
I (12501) BatteryDriver.c: Battery level is 100% (5.361V)
I (12611) Kernel.c: The battery level is okay, starting the firmware
I (12611) KernelInterface.c: ESP reset reason is 7
I (12611) KernelInterface.c: RTC reset reason is 16
I (12611) KernelInterface.c: App level reset reason is 3
I (12621) EventService.c: Initialization is done.
I (12621) NvStorage.c: The requested key: "Deactivate" doesn't exist
I (12631) DeviceDataStorage.c: User data size: 238
I (12641) DeviceDataStorage.c: User data has been read from the settings storage
I (12641) DeviceDataStorage.c: User data is found in settings storage
I (12651) DeviceDataStorage.c: Build type is debug
I (12661) DeviceDataStorage.c: User JWT is eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VyIjp7ImlkIjo5MDUsImVtYWlsIjoibWlraXRhLm1pbmF1QHNvZnRlcS5jb20iLCJhcHAiOiJ3YWd6MiIsInJvbGUiOiJ1c2VyIn19.d7lCt-VaPDMVoS2OvOqkNH2aWBgy9igBespDQ6GHEmM
I (12681) DeviceDataStorage.c: User ID is 905
I (12681) DeviceDataStorage.c: Device activation URL is https://dev.api.wagz.com/users/905/devices
I (12691) DeviceDataStorage.c: Device get geofences URL is https://dev.api.wagz.com/users/905/geofences
I (12701) DeviceDataStorage.c: User data is valid
I (12711) DeviceDataStorage.c: Device data size: 5475
I (12771) DeviceDataStorage.c: Device data has been read from the settings storage
I (12771) DeviceDataStorage.c: Device data is found in settings storage
I (12771) DeviceDataStorage.c: Parsing device data
I (12791) DeviceDataStorage.c: Device ID is 6744
I (12801) DeviceDataStorage.c: Device deactivation URL is https://dev.api.wagz.com/users/905/devices/6744/actions/unlinkDevice
I (12801) DeviceDataStorage.c: Device heartbeat URL is https://dev.api.wagz.com/users/905/devices/6744/heartbeat
I (12811) DeviceDataStorage.c: Upload URL is https://dev.api.wagz.com/devices/logs/6744/upload
I (12821) BatteryDriver.c: Battery level is 100% (5.356V)
I (12831) BatteryManager.c: Battery Manager is started
I (12831) DeviceDataStorage.c: Device heartbeat URL is https://dev.api.wagz.com/users/905/devices/6744/heartbeat
I (12841) DeviceDataStorage.c: Device data is valid
I (13041) BatteryDriver.c: Battery level is 100% (5.350V)
I (13471) DeviceSettings.c: Device settings were read from a file
I (13471) KernelInterface.c: ESP reset reason is 7
I (13471) KernelInterface.c: RTC reset reason is 16
I (13481) KernelInterface.c: App level reset reason is 3
E (13481) OnboardingService.c: Assert or exception happened. Please, save the full log and send it to the FW team.
I (18321) GSM_States.c: Start Modem Configuration...
I (18321) GSM_Commands.c: Enable Error Report command...
W (18381) GSM_Dispatcher.c: Expecting the echo for the 'AT+CMEE=1' AT-command, but has received 'AT' response
W (18381) GSM_Dispatcher.c: Expecting the echo for the 'AT+CMEE=1' AT-command, but has received 'OK' response
W (18391) GSM_Dispatcher.c: Expecting the echo for the 'AT+CMEE=1' AT-command, but has received 'APP RDY' response
I (18461) GSM_Commands.c: Get the modem FW version command...
I (18581) GSM_States.c: The modem FW version is: BG77LAR02A04_01.005.01.005
I (18581) HwInfoStorage.c: Modem FW version is: BG77LAR02A04_01.005.01.005
I (18581) GSM_Commands.c: Cell Modem tests gpsOneXTRA Assistance function...
I (18711) GSM_Commands.c: Cell Modem tests injecting of gpsOneXTRA time...
I (18841) GSM_Commands.c: Cell Modem tests injecting of gpsOneXTRA data...
I (18971) GSM_Commands.c: Cell Modem query the status of gpsOneXTRA Assistance function...
I (19091) GSM_States.c: Cell modem GPS has one xtra feature enabled
I (19091) GSM_Commands.c: Query GNSS constellation configuration...
I (19151) GSM_Commands.c: +QGPSCFG: "gnssconfig",1
I (19151) GSM_Commands.c: Configured GNSS constellation is 1
I (19211) GSM_States.c: GNSS is already configured for GPS+GLONASS constellation
I (19211) GSM_Commands.c: Cell Modem enable IoT mode...
I (19271) GSM_Commands.c: eMTC IoT mode is enabled
I (19331) GSM_Commands.c: Disabling GNSS NMEA output port...
I (19451) GSM_Commands.c: Enabling NMEA acquisition using AT commands ...
I (19571) GSM_Director.c: GSM Modem configured.
I (19571) GSM_Commands.c: Get the modem FW version command...
I (19691) CellOta.c: Running LTE modem FW version is: BG77LAR02A04_01.005.01.005
I (19691) GSM_States.c: Check SIM Card...
I (19691) GSM_Commands.c: Check pin code command...

Other items if possible

sdkconfig.txt

stopthatcow commented 1 year ago

I am also seeing this occasionally on a ESP32-WROVER using automatic light sleep.

bearsh commented 1 year ago

this also happens in my application on some modules. I configure pm and enable light sleep in app_main then I start various tasks and also enable wifi in station mode connecting to a remote. it seems like when the wifi connection is established very fast, the reset occurs. if it takes some time for the wifi connection to get up, everything just works fine.

if I disable the wdt in https://github.com/espressif/esp-idf/blob/4c7d97e2bdbd26b1ad6adc6de8051888e1feec10/components/esp_hw_support/sleep_modes.c#L776-L779 by setting wdt_was_enabled = pdTRUE everything seems to work fine.

expresspotato commented 1 year ago

If its running on a custom board, you might want to check your matching circuit or configure CONFIG_ESP32_RTC_CLK_CAL_CYCLES to a higher value, say 8192. I had an issue with a matching circuit and hand solder job that was difficult to pin down. It's what led me to this thread with seemingly random RTCWDT_RTC_RESET errors.

Edit: After 59 minutes of being connected to ble and ad2p / hfp at the same time, with light sleep enabled, it still crashes here on v4.4.4. I'll have to wait for the next batch of prototypes. I feel like there could be an esp-idf related bug here but need to debug it further.

Edit: So in fact v4.4.4 cannot run BLE & Classic Bluetooth reliably and crash with RTCWDT_RTC_RESET even with CONFIG_ESP_INT_WDT=n set. It appears this bug is valid in a sense that, a crash with that message can still occur, even when disabled via sdkconfig / menuconfig, but it is not the root cause of the issues the op @MikitaMinau is experiencing with regards to BLE / WiFi. The code posted by @bearsh didn't solve the issues for me, and hence ticket #11315.