Overview of new functionality
Implement a watchdog process that monitors the critical runtime components of BGW and ensures a consistent inactive state for a failed BGW instance.
Context of new functionality
The L2GW Agent is responsible for switching to a substitute BGW instance in case of failure. However, it detects failures only through disconnection from OVSDB. The responsibility of the watchdog is to ensure that the L2GW Agent experiences an OVSDB disconnection even when the failed component is not OVSDB.
Design Guideline
The general direction is to use a STONITH technique: on a partial failure, kill all surviving processes, then reboot the host so that it is cleaned up and becomes available as a substitute for the next failing BGW.
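
The guideline above could be sketched roughly as follows. This is a minimal illustration, not the actual design: the component names, the PID-based liveness probe, the use of SIGKILL, and the `systemctl reboot` call (which assumes a systemd host) are all assumptions made for the example. The source does not list the monitored components or say how they are probed. The point it shows is that a single dead component causes every survivor, including OVSDB, to be killed, which would produce the OVSDB disconnection the L2GW Agent watches for.

```python
import os
import signal
import subprocess
import time
from typing import Callable, Dict


def pid_alive(pid: int) -> bool:
    """Probe a PID with signal 0, which checks existence without killing."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def kill_pid(pid: int) -> None:
    """Forcefully stop a process; ignore it if it is already gone."""
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        pass


def reboot_host() -> None:
    """Assumption: the host runs systemd, so `systemctl reboot` is available."""
    subprocess.run(["systemctl", "reboot"], check=False)


def check_once(
    components: Dict[str, int],
    is_alive: Callable[[int], bool] = pid_alive,
    kill: Callable[[int], None] = kill_pid,
    reboot: Callable[[], None] = reboot_host,
) -> bool:
    """Run one watchdog pass over {component name: PID}.

    Returns True if the instance was fenced (survivors killed, reboot requested).
    """
    failed = [name for name, pid in components.items() if not is_alive(pid)]
    if not failed:
        return False
    # Partial failure: kill every surviving component so the instance is
    # consistently inactive, then reboot to clean up the host.
    for name, pid in components.items():
        if name not in failed:
            kill(pid)
    reboot()
    return True


def watch(components: Dict[str, int], interval_s: float = 1.0) -> None:
    """Poll until a failure is detected and the instance has been fenced."""
    while not check_once(components):
        time.sleep(interval_s)
```

A caller would pass a mapping such as {"ovsdb-server": 1234, "ovs-vswitchd": 1240} (hypothetical names and PIDs) to `watch`. The probe, kill, and reboot actions are injectable so the fencing decision can be exercised without touching real processes.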