Closed shreyasbharath closed 5 years ago
So you have logs showing the reset reason and the timeout period was 60s? Do you check the register update with rtc_wdt_get_timeout?
@shreyasbharath using the functions defined in your code snippet, any advice how should we go about reproducing the issue? I second the suggestion from @negativekelvin about checking the value returned by rtc_wdt_get_timeout
.
Logs would indeed be useful, since RTC_WDT reset reason is not necessarily the consequence of the RTC watchdog timing out during your program execution. Other possibilities exist: 1) the application crashes during the OTA, and the panic handler hangs or enters a reset loop; finally, RTC WDT resets the chip 2) the application successfully completes the OTA and then meets an RTC_WDT reset during the subsequent startup due to some flash related issue
Hi @igrr and @negativekelvin,
I'll try to gather some more logs. Before I do that, how trustworthy is the reset reason stored in the reset register? Can we trust it 100%?
Thanks for your insight.
@igrr How accurate is the internal RC 150 KHz oscillator?
Could this have any bearing on the RTC watchdog being wildly inaccurate?
As far as I know, it should be within 10% of the nominal frequency (which is somewhere between 150 and 160 kHz).
@igrr since we are using an older version of ESP IDF (and AWS FreeRTOS), is it possible that it is this bug, where the RTC watchdog was being fed
incorrectly until this change? Have a look at aws_ota_pal.c
in the below changeset.
Would the watchdog have been fed at all if the wrong bit was being set? According to the ESP datasheet, BIT 31 must be set to feed
the watchdog but all this time we were feeding BIT 1.
Please note that IDF code doesn't touch the RTC watchdog after the application finishes loading, at which point the watchdog is disabled. Re-enabling the WDT and feeding it would happen either in a:FreeRTOS code (not part of this repository) or in your application code.
Okay noted, thanks.
There was a change made to the RTC watchdog feed implementation in this commit, which I suspect fixed a bug. Unfortunately, the AWS FreeRTOS guys also inherited this WDT feed bug by the looks of it.
Correct me if I am wrong?
Can you please point to the feed implementation prior to this commit? As far as I can tell, there was no code in IDF prior to this commit which would feed the watchdog. There was only "enable" with a certain timeout and "disable".
I think he is saying because the old rtc_wdt_enable did not correctly feed the wdt the timeout might not be accurate
In that commit, sleep_modes.c
had a function like so -
static void rtc_wdt_enable(int time_ms)
{
WRITE_PERI_REG(RTC_CNTL_WDTWPROTECT_REG, RTC_CNTL_WDT_WKEY_VALUE);
WRITE_PERI_REG(RTC_CNTL_WDTFEED_REG, 1);
REG_SET_FIELD(RTC_CNTL_WDTCONFIG0_REG, RTC_CNTL_WDT_SYS_RESET_LENGTH, 7);
REG_SET_FIELD(RTC_CNTL_WDTCONFIG0_REG, RTC_CNTL_WDT_CPU_RESET_LENGTH, 7);
REG_SET_FIELD(RTC_CNTL_WDTCONFIG0_REG, RTC_CNTL_WDT_STG0, RTC_WDT_STG_SEL_RESET_RTC);
WRITE_PERI_REG(RTC_CNTL_WDTCONFIG1_REG, rtc_clk_slow_freq_get_hz() * time_ms / 1000);
SET_PERI_REG_MASK(RTC_CNTL_WDTCONFIG0_REG, RTC_CNTL_WDT_EN | RTC_CNTL_WDT_PAUSE_IN_SLP);
WRITE_PERI_REG(RTC_CNTL_WDTWPROTECT_REG, 0);
}
The second line was writing to BIT0, but it's been fixed to write to BIT31 in the new code.
Hi @shreyasbharath! There was an error with wrong using FEED: WRITE_PERI_REG(RTC_CNTL_WDTFEED_REG, 1) - wrong [0 bit] REG_SET_BIT(RTC_CNTL_WDTFEED_REG, RTC_CNTL_WDT_FEED) - right. [31 bit] It was fixed in 3.2 when was added the rtc_wdt API.
@KonstantinKondrashov Thanks.
It would be good to capture the fix in the changelog, as I couldn't find anything specific to this last time I checked.
@KonstantinKondrashov fix should be added to 3.1? Not whole api but bit change.
Thanks for your help everyone, I'll report back if we see this issue repeating again.
I'll re-open issue if it crops up again, thanks again.
Environment
Problem Description
RTC watchdog set timeout or feed is intermittent, i.e. if the RTC watchdog period is set to 120 seconds, it does not extend the RTC watchdog period from the default of 60 seconds which is set by the bootloader based on our project's SDK config.
Note: The RTC watchdog is being fed the internal RC oscillator at 150 KHz.
Expected Behavior
What we are expecting is for the RTC watchdog period to extend as per the configured period.
Actual Behavior
The watchdog time seems to remain at the default of 60 seconds even though the time is extended to 120 seconds when the application starts. This is causing premature watchdog resets during OTA updates.
Steps to reproduce
This has been very difficult to reproduce, it is very intermittent as we are only noticing issues with products in the field.
Code to reproduce this issue
https://gist.github.com/shreyasbharath/d9f91749001b5ddc8ad7f61634eff450
Debug Logs
None
Other items if possible
None