Prevent Node-RED from Entering Safe Mode After Multiple Restarts

muenir commented 2 months ago

Description

Currently, after multiple restarts, Node-RED automatically enters safe mode with the message: “Node-RED restart loop detected. Restarting in safe mode.” This can result in extended downtime during which flows are not running, particularly during off-peak hours such as nighttime. This is a problem because critical services can stay offline for several hours.

Request: As an administrator, I would like the ability to configure a parameter (preferably as an environment variable) to prevent Node-RED instances from starting in safe mode after restarts. This will ensure that after a restart, flows remain active and the instance continues functioning without requiring manual intervention.
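As an illustration of the requested behaviour, here is a minimal sketch of how an opt-out flag read from the environment could gate the safe-mode decision. The variable name `DISABLE_SAFE_MODE` and the functions `safeModeDisabled` and `chooseStartMode` are hypothetical and are not part of Node-RED or FlowFuse; the sketch only shows the shape of the check, assuming the launcher already knows whether a restart loop was detected.

```typescript
// Hypothetical sketch: DISABLE_SAFE_MODE is an assumed variable name,
// not an existing Node-RED or FlowFuse setting.
type Env = Record<string, string | undefined>;

// Treat "true", "1" or "yes" (case-insensitive) as enabling the opt-out.
function safeModeDisabled(env: Env): boolean {
  const raw = (env["DISABLE_SAFE_MODE"] ?? "").trim().toLowerCase();
  return raw === "true" || raw === "1" || raw === "yes";
}

// Decide how to start once restart-loop detection has run.
function chooseStartMode(restartLoopDetected: boolean, env: Env): "normal" | "safe" {
  if (!restartLoopDetected) return "normal";
  // With the opt-out set, keep flows running instead of switching to safe mode.
  return safeModeDisabled(env) ? "normal" : "safe";
}

console.log(chooseStartMode(true, { DISABLE_SAFE_MODE: "true" })); // normal
console.log(chooseStartMode(true, {})); // safe
```

In a real launcher the `env` argument would be `process.env`. Keeping it as a parameter makes the decision easy to test without touching the process environment.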

Expected Benefit: By introducing this configuration option, I will be able to minimize downtime and avoid lengthy periods where flows are not running, especially during unattended periods like overnight restarts. This ensures continuous service availability and improves overall system resilience.

Which customers would this be available to

Team + Enterprise Tiers (EE)

Have you provided an initial effort estimate for this issue?

I am not a FlowFuse team member

joepavitt commented 2 months ago

@muenir thanks for raising this. Just to check the details here: we put the instance into safe mode when we've detected multiple hangs/restart loops, to prevent this from continuing indefinitely.

Whilst we could offer a configuration option to disable that (or to configure in more detail when safe mode is enabled), I'm struggling to see the value in turning it off entirely, as your application would just continue to crash/loop. Or do you expect it to auto-recover at some point?
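The "configure more detail" option could take the form of a tunable threshold: enter safe mode only when more than N restarts occur within a sliding time window. The sketch below assumes that kind of policy. FlowFuse's actual detection logic is not described in this thread, and the variable names `SAFE_MODE_MAX_RESTARTS` and `SAFE_MODE_WINDOW_MS` and their defaults are hypothetical.

```typescript
// Hypothetical sketch of a configurable restart-loop policy; the env var
// names and defaults are assumptions, not FlowFuse settings.
interface LoopPolicy {
  maxRestarts: number; // restarts tolerated inside the window
  windowMs: number; // length of the sliding window in milliseconds
}

// Fall back to the defaults when a value is missing, non-numeric or non-positive.
function parsePolicy(env: Record<string, string | undefined>): LoopPolicy {
  const maxRestarts = Number(env["SAFE_MODE_MAX_RESTARTS"] ?? "3");
  const windowMs = Number(env["SAFE_MODE_WINDOW_MS"] ?? "300000");
  return {
    maxRestarts: Number.isFinite(maxRestarts) && maxRestarts > 0 ? maxRestarts : 3,
    windowMs: Number.isFinite(windowMs) && windowMs > 0 ? windowMs : 300000,
  };
}

// A loop is detected when more than maxRestarts restarts fall within windowMs of now.
function isRestartLoop(restartTimes: number[], now: number, policy: LoopPolicy): boolean {
  const recent = restartTimes.filter((t) => now - t <= policy.windowMs);
  return recent.length > policy.maxRestarts;
}

const restarts = [0, 60_000, 120_000, 180_000]; // four restarts, one minute apart
console.log(isRestartLoop(restarts, 200_000, parsePolicy({}))); // true
console.log(isRestartLoop(restarts, 200_000, parsePolicy({ SAFE_MODE_MAX_RESTARTS: "10" }))); // false
```

A threshold like this addresses both positions in the thread. Genuine crash loops still end in safe mode, while an administrator who expects transient failures can raise the limit instead of disabling the safeguard entirely.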

muenir commented 2 months ago

I understand your point. In some cases, multiple restarts may happen due to temporary issues, such as memory issues/leaks. It's also possible that a specific part of the flow is triggered at certain times and causes a restart (e.g., a buggy custom or function node). Since we're already detecting restarts, we can respond to them more effectively. However, the extended downtime of flows, especially overnight, is a significant concern. Again, this would just be an optional, possibly even temporary, flag that would be set ...
