Clarification on watchdog behavior

jjduhamel commented 3 years ago

After merging my code with the upstream changes since early August, I'm again facing an issue with my devices rebooting due to watchdog timeouts. Previously, I had configured the SDK as follows:

$ git grep -i watchdog config/
config/Config.h:// ########### Watchdog ##########################################
config/Config.h://The watchdog will trigger a system reset if it is not feed in time
config/Config.h:#ifndef ACTIVATE_WATCHDOG
config/Config.h:#define ACTIVATE_WATCHDOG 1
config/Config.h:#ifndef FM_WATCHDOG_TIMEOUT
config/Config.h:#define FM_WATCHDOG_TIMEOUT (32768UL * 10)
config/Config.h:#ifndef FM_WATCHDOG_TIMEOUT_SAFE_BOOT
config/Config.h:#define FM_WATCHDOG_TIMEOUT_SAFE_BOOT (32768UL * 20) // Timeout in safe boot mode
config/Config.h:#ifndef ACTIVATE_WATCHDOG_SAFE_BOOT_MODE
config/Config.h:#define ACTIVATE_WATCHDOG_SAFE_BOOT_MODE 0

Now I see that these variables have been removed from the config at sdk/config_nrf52/sdk_config.h, and I'm not finding much in the documentation about configuring the watchdog. I remember reading earlier that the mesh requires keep-alive messages if the watchdog interval is set above some value, which was the cause of our issue, but I can't find that passage now. Also, what's the current way to disable rebooting into safe mode?

mariusheil commented 3 years ago

Hi,

Take a look at the current github_nrf52.cpp file; it contains two functions that return the watchdog timeout. If you do a full-text search through all project files, you will see that the watchdog is activated in the FruityHal, in FruityHal::StartWatchdog.

We are now using a macro that is replaced by the function from the featureset. We had to do this because we are now also simulating the watchdog behavior in the cherrysim. And since the watchdog settings can be different for each featureset, we cannot work with the previous #define anymore.

The two functions are:

u32 GetWatchdogTimeout_github_nrf52()
{
    return 32768UL * 60 * 60 * 2;
}

u32 GetWatchdogTimeoutSafeBoot_github_nrf52()
{
    return 32768UL * 20UL;
}

and you should return 0 if you do not want the watchdog or safe boot mode enabled.
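
As a rough sketch of how a custom featureset might express these values: the snippet below assumes the watchdog counts 32768 ticks per second (which matches the 32768UL factor used above) and uses a hypothetical helper, SecondsToWatchdogTicks, plus a hypothetical featureset name, my_featureset. Neither name comes from the SDK; only the GetWatchdogTimeout_/GetWatchdogTimeoutSafeBoot_ naming pattern is taken from the functions shown above.

```cpp
#include <cstdint>

typedef std::uint32_t u32; // same alias name as in the snippet above

// Assumption: the watchdog counter runs at 32768 ticks per second.
constexpr u32 WATCHDOG_TICKS_PER_SECOND = 32768UL;

// Hypothetical helper (not part of the SDK): converts seconds to watchdog ticks.
constexpr u32 SecondsToWatchdogTicks(u32 seconds)
{
    return WATCHDOG_TICKS_PER_SECOND * seconds;
}

// The two-hour default shown above still fits into a u32.
static_assert(SecondsToWatchdogTicks(2 * 60 * 60) == 235929600UL,
              "2 hours expressed in watchdog ticks");

// Hypothetical featureset "my_featureset": a 10 second watchdog timeout.
u32 GetWatchdogTimeout_my_featureset()
{
    return SecondsToWatchdogTicks(10);
}

// Returning 0 means the safe boot mode watchdog is not wanted.
u32 GetWatchdogTimeoutSafeBoot_my_featureset()
{
    return 0;
}
```

With these values, the normal watchdog timeout is 327680 ticks (10 seconds) and the safe boot mode timeout is switched off.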

Also take a look at FruityHalNrf::ProcessAppEvents(). If the watchdog timeout is smaller than 60 seconds, it is constantly fed in the event loop, so the device will only reboot in case the event loop gets stuck for some reason. If the timeout is higher, the node expects to receive "keep_alive" messages.
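
The rule described here can be sketched as a small decision function. This is only an illustration of the description, not the code in FruityHalNrf::ProcessAppEvents(): the enum, the function name GetFeedSource and the constants are hypothetical, the tick rate of 32768 per second is an assumption, and the handling of a timeout of exactly 60 seconds is a guess, since only "smaller" and "higher" are mentioned.

```cpp
#include <cstdint>

typedef std::uint32_t u32;

// Assumption: the watchdog counter runs at 32768 ticks per second.
constexpr u32 WATCHDOG_TICKS_PER_SECOND = 32768UL;

// Timeouts below this limit (60 seconds) are fed from the event loop.
constexpr u32 EVENT_LOOP_FEED_LIMIT_TICKS = WATCHDOG_TICKS_PER_SECOND * 60;

// Hypothetical names, used only for this illustration.
enum class WatchdogFeedSource
{
    DISABLED,           // timeout of 0: watchdog not wanted
    EVENT_LOOP,         // fed constantly while the event loop is running
    KEEP_ALIVE_MESSAGE  // fed only when "keep_alive" messages arrive
};

constexpr WatchdogFeedSource GetFeedSource(u32 timeoutTicks)
{
    return timeoutTicks == 0
               ? WatchdogFeedSource::DISABLED
               : timeoutTicks < EVENT_LOOP_FEED_LIMIT_TICKS
                     ? WatchdogFeedSource::EVENT_LOOP
                     : WatchdogFeedSource::KEEP_ALIVE_MESSAGE; // exactly 60 s lands here by assumption
}
```

Under this reading, the old 10 second setting (32768UL * 10) falls into the event loop case, while the two-hour value returned by GetWatchdogTimeout_github_nrf52() falls into the keep_alive case, so a node that never receives such messages would eventually reboot.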

Marius

mariusheil commented 3 years ago

Hi, I will close this as it has been inactive for a while.

bluerange-io / bluerange-mesh

Clarification on watchdog behavior #152