bluerange-io / bluerange-mesh

BlueRange Mesh (formerly FruityMesh) - The first completely connection-based open source mesh on top of Bluetooth Low Energy (4.1/5.0 or higher)
https://bluerange.io/

Clarification about watchdog behavior #142

Closed: jjduhamel closed this issue 4 years ago

jjduhamel commented 4 years ago

I'm debugging an issue in my mesh where the nodes are periodically rebooting and also wiping the flash memory. Checking the reboot reason for a node points at the watchdog. Thus, I'm inferring that the watchdog is causing the devices to reboot in safe mode, which is erasing the flash.

This documentation makes it seem that I need to have a mesh gateway present to feed the watchdog: https://github.com/mwaylabs/fruitymesh/blob/master/docs/opensource/modules/ROOT/pages/Features.adoc#watchdog-with-safe-boot-mode

I'm not sure whether one of the nodes is automatically configured as a gateway, whether that is an extra step I need to implement, or whether my application doesn't require one at all. In other words: should I disable the watchdog, or, if not, should I enable safe boot?

mariusheil commented 4 years ago

Hi,

we have two different watchdog behaviours implemented. If you set the interval to 10 seconds or lower (I think; please check the exact threshold first), the watchdog is fed in the Eventlooper, so it only checks that the node is still processing all incoming events. If a longer interval is configured, the node is only fed if a mesh gateway repeatedly sends a keep_alive message. This is not done automatically. The idea is to guarantee that a node will always be able to connect to the mesh, even if there is an implementation fault. This is probably not suitable for your use case if you do not have a gateway.

Next, there is safe boot mode, which boots the node in a state where no settings are loaded from flash. I believe it is only active for about 20 seconds before the node reboots into normal mode again. Flash is not erased by this mode! It can be used if a faulty setting in flash prevents the node from booting.

If you do not need the functionality described above, I suggest that you set the watchdog timeout to, e.g., 10 seconds and disable safe boot mode.

Marius

jjduhamel commented 4 years ago

Ok thanks. Just to clarify, I should change this line in config/Config.h:

#define FM_WATCHDOG_TIMEOUT (32768UL * 60 * 60 * 2)

to:

#define FM_WATCHDOG_TIMEOUT (32768UL * 10)

mariusheil commented 4 years ago

Yes. If you keep encountering issues, check the reboot reason that is printed at startup. You can also request the error log entries from the node and check the reboot reason there. This should tell you the cause of the reboot and would show, e.g., whether it was a watchdog reboot.

jjduhamel commented 4 years ago

Thanks