espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0
13.91k stars 7.33k forks source link

RTC watchdog set timeout / feed not working / intermittent (IDFGH-2137) #4295

Closed shreyasbharath closed 5 years ago

shreyasbharath commented 5 years ago

Environment

Problem Description

RTC watchdog set timeout or feed is intermittent, i.e. if the RTC watchdog period is set to 120 seconds, it does not extend the RTC watchdog period from the default of 60 seconds which is set by the bootloader based on our project's SDK config.

Note: The RTC watchdog is being fed the internal RC oscillator at 150 KHz.

Expected Behavior

What we are expecting is for the RTC watchdog period to extend as per the configured period.

Actual Behavior

The watchdog time seems to remain at the default of 60 seconds even though the time is extended to 120 seconds when the application starts. This is causing premature watchdog resets during OTA updates.

Steps to reproduce

This has been very difficult to reproduce, it is very intermittent as we are only noticing issues with products in the field.

Code to reproduce this issue

https://gist.github.com/shreyasbharath/d9f91749001b5ddc8ad7f61634eff450

Debug Logs

None

Other items if possible

None

negativekelvin commented 5 years ago

So you have logs showing the reset reason and the timeout period was 60s? Do you check the register update with rtc_wdt_get_timeout?

igrr commented 5 years ago

@shreyasbharath using the functions defined in your code snippet, any advice how should we go about reproducing the issue? I second the suggestion from @negativekelvin about checking the value returned by rtc_wdt_get_timeout.

Logs would indeed be useful, since RTC_WDT reset reason is not necessarily the consequence of the RTC watchdog timing out during your program execution. Other possibilities exist: 1) the application crashes during the OTA, and the panic handler hangs or enters a reset loop; finally, RTC WDT resets the chip 2) the application successfully completes the OTA and then meets an RTC_WDT reset during the subsequent startup due to some flash related issue

shreyasbharath commented 5 years ago

Hi @igrr and @negativekelvin,

I'll try to gather some more logs. Before I do that, how trustworthy is the reset reason stored in the reset register? Can we trust it 100%?

Thanks for your insight.

shreyasbharath commented 5 years ago

@igrr How accurate is the internal RC 150 KHz oscillator?

Could this have any bearing on the RTC watchdog being wildly inaccurate?

igrr commented 5 years ago

As far as I know, it should be within 10% of the nominal frequency (which is somewhere between 150 and 160 kHz).

shreyasbharath commented 5 years ago

@igrr since we are using an older version of ESP IDF (and AWS FreeRTOS), is it possible that it is this bug, where the RTC watchdog was being fed incorrectly until this change? Have a look at aws_ota_pal.c in the below changeset.

Would the watchdog have been fed at all if the wrong bit was being set? According to the ESP datasheet, BIT 31 must be set to feed the watchdog but all this time we were feeding BIT 1.

https://github.com/aws/amazon-freertos/commit/9779f47e1a175997898976af3fc773ce5d77e8e5#diff-2d893ff85827c36ec19e7ecf1c6b52e8

igrr commented 5 years ago

Please note that IDF code doesn't touch the RTC watchdog after the application finishes loading, at which point the watchdog is disabled. Re-enabling the WDT and feeding it would happen either in a:FreeRTOS code (not part of this repository) or in your application code.

shreyasbharath commented 5 years ago

Okay noted, thanks.

There was a change made to the RTC watchdog feed implementation in this commit, which I suspect fixed a bug. Unfortunately, the AWS FreeRTOS guys also inherited this WDT feed bug by the looks of it.

Correct me if I am wrong?

https://github.com/espressif/esp-idf/commit/a8e46775c65f768c56d05a0fefe74a356450481c#diff-d1e2b1141ccb9a1a8ab568b284b8a878

igrr commented 5 years ago

Can you please point to the feed implementation prior to this commit? As far as I can tell, there was no code in IDF prior to this commit which would feed the watchdog. There was only "enable" with a certain timeout and "disable".

negativekelvin commented 5 years ago

I think he is saying because the old rtc_wdt_enable did not correctly feed the wdt the timeout might not be accurate

shreyasbharath commented 5 years ago

In that commit, sleep_modes.c had a function like so -

static void rtc_wdt_enable(int time_ms) 
{   
    WRITE_PERI_REG(RTC_CNTL_WDTWPROTECT_REG, RTC_CNTL_WDT_WKEY_VALUE);  
    WRITE_PERI_REG(RTC_CNTL_WDTFEED_REG, 1);    
    REG_SET_FIELD(RTC_CNTL_WDTCONFIG0_REG, RTC_CNTL_WDT_SYS_RESET_LENGTH, 7);   
    REG_SET_FIELD(RTC_CNTL_WDTCONFIG0_REG, RTC_CNTL_WDT_CPU_RESET_LENGTH, 7);   
    REG_SET_FIELD(RTC_CNTL_WDTCONFIG0_REG, RTC_CNTL_WDT_STG0, RTC_WDT_STG_SEL_RESET_RTC);   
    WRITE_PERI_REG(RTC_CNTL_WDTCONFIG1_REG, rtc_clk_slow_freq_get_hz() * time_ms / 1000);   
    SET_PERI_REG_MASK(RTC_CNTL_WDTCONFIG0_REG, RTC_CNTL_WDT_EN | RTC_CNTL_WDT_PAUSE_IN_SLP);    
    WRITE_PERI_REG(RTC_CNTL_WDTWPROTECT_REG, 0);    
}

The second line was writing to BIT0, but it's been fixed to write to BIT31 in the new code.

KonstantinKondrashov commented 5 years ago

Hi @shreyasbharath! There was an error with wrong using FEED: WRITE_PERI_REG(RTC_CNTL_WDTFEED_REG, 1) - wrong [0 bit] REG_SET_BIT(RTC_CNTL_WDTFEED_REG, RTC_CNTL_WDT_FEED) - right. [31 bit] It was fixed in 3.2 when was added the rtc_wdt API.

shreyasbharath commented 5 years ago

@KonstantinKondrashov Thanks.

It would be good to capture the fix in the changelog, as I couldn't find anything specific to this last time I checked.

negativekelvin commented 5 years ago

@KonstantinKondrashov fix should be added to 3.1? Not whole api but bit change.

https://github.com/espressif/esp-idf/blob/0fc8a5214f09a3ba9b5e7ed9fbffe38b4e0ca877/components/esp32/sleep_modes.c#L244

https://github.com/espressif/esp-idf/blob/0fc8a5214f09a3ba9b5e7ed9fbffe38b4e0ca877/components/esp32/sleep_modes.c#L256

shreyasbharath commented 5 years ago

Thanks for your help everyone, I'll report back if we see this issue repeating again.

shreyasbharath commented 5 years ago

I'll re-open issue if it crops up again, thanks again.