Open smarcosmarco opened 4 years ago
any news about it? tnx
Not implemented, but I'll note that by default FRR's VRRP will wait for `Master_Down_Interval` before coming up after first starting.
[...] It would be important to have this function in FRR
Hi all, sorry for joining an old conversation, seems to be an interesting topic.
I think we don't need a `preempt_delay` option; it would be enough to ensure that a module meets some requirements before joining the cluster (that is, before exiting the FAULT state). The `preempt_delay` option just waits an arbitrary amount of time before moving from BACKUP to MASTER state after the `Master_Down_Interval` timeout (as per the RFC, `Master_Down_Interval` is calculated as follows: `(3 * Master_Adver_Interval) + Skew_time`), and this does not ensure that other modules will work correctly once MASTER state is gained; in some cases, this could result in a broken router joining the cluster. Also note that `preempt_delay` is not an RFC standard, and its use is limited when interoperating with other manufacturers because it (re)calculates the `Master_Down_Interval` (as shown above) with the following algorithm: `((3 * Master_Adver_Interval) + Skew_time) + preempt_delay`.
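To make the arithmetic above concrete, here is a minimal Python sketch. It assumes the VRRPv3 (RFC 5798) definition of `Skew_Time`, `((256 - Priority) * Master_Adver_Interval) / 256`, with all intervals in centiseconds and floating-point division used for simplicity. The function names and sample values (priority 100, 1 s advertisements, 30 s delay) are illustrative only, and the last function simply mirrors the formula quoted above; it is not taken from Keepalived's or FRR's code.

```python
def skew_time(priority: int, master_adver_interval_cs: float) -> float:
    """Skew_Time as defined in RFC 5798, in centiseconds."""
    return ((256 - priority) * master_adver_interval_cs) / 256


def master_down_interval(priority: int, master_adver_interval_cs: float) -> float:
    """Master_Down_Interval = (3 * Master_Adver_Interval) + Skew_time."""
    return 3 * master_adver_interval_cs + skew_time(priority, master_adver_interval_cs)


def master_down_interval_with_delay(
    priority: int, master_adver_interval_cs: float, preempt_delay_cs: float
) -> float:
    """The extended formula from the comment above (hypothetical helper)."""
    return master_down_interval(priority, master_adver_interval_cs) + preempt_delay_cs


# Priority 100, advertisements every 1 s (100 cs), preempt_delay of 30 s (3000 cs).
plain = master_down_interval(100, 100)                      # 360.9375 cs, about 3.6 s
delayed = master_down_interval_with_delay(100, 100, 3000)   # 3360.9375 cs, about 33.6 s
print(plain, delayed)
```

With these sample values the delay dominates the timer, which illustrates the point above: the extra wait is a fixed amount of time and says nothing about whether the other daemons are actually ready.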
I agree that in FRR it might be useful to implement some internal checks between its modules, for example with NETLINK (`NETLINK_USERSOCK`). In that case, we could use it both to listen for kernel events (like interface up/down) and as a user-space-to-user-space IPC mechanism, with multicast. We could try to define a (standard) messaging subsystem in the FRR framework to allow more granular internal controls, together with a hook system where modules update VRRP instances when an event occurs. FRR module developers would implement this internal NETLINK messaging subsystem to ensure minimal service availability for FRR VRRP, which might be difficult to achieve with external scripts. External scripts would still be a great improvement because of their flexibility for specific controls, but everyone would have to write their own code/script, and that can be difficult to maintain in an OS project.
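As a rough illustration of the hook idea, the following Python sketch models it in-process. Everything here is hypothetical: the `EventBus` and `VrrpInstance` classes, the module names, and the rule that an instance stays in FAULT until every module it depends on reports ready. It uses plain callbacks rather than NETLINK and does not reflect any existing FRR API.

```python
from enum import Enum
from typing import Callable, Dict, Iterable, List


class State(Enum):
    FAULT = "FAULT"
    BACKUP = "BACKUP"


class VrrpInstance:
    """Hypothetical VRRP instance gated on the readiness of other modules."""

    def __init__(self, required_modules: Iterable[str]) -> None:
        # Every required module starts as "not ready".
        self.ready: Dict[str, bool] = {name: False for name in required_modules}
        self.state = State.FAULT
        self._recompute()

    def _recompute(self) -> None:
        # Leave FAULT only when all required modules report ready.
        self.state = State.BACKUP if all(self.ready.values()) else State.FAULT

    def on_module_event(self, module: str, ready: bool) -> None:
        """Hook called by the bus when a module changes readiness."""
        if module not in self.ready:
            return  # this instance does not depend on that module
        self.ready[module] = ready
        self._recompute()


class EventBus:
    """Stand-in for the proposed messaging subsystem (assumption)."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str, bool], None]] = []

    def subscribe(self, callback: Callable[[str, bool], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, module: str, ready: bool) -> None:
        for callback in self._subscribers:
            callback(module, ready)


bus = EventBus()
vrrp = VrrpInstance(["bgpd", "ospfd"])
bus.subscribe(vrrp.on_module_event)

bus.publish("bgpd", True)    # ospfd still not ready, so the instance stays in FAULT
bus.publish("ospfd", True)   # all dependencies ready, instance may become BACKUP
print(vrrp.state)
```

From BACKUP the normal VRRP election would apply; the point of the sketch is only that readiness is driven by events from the other modules rather than by a fixed timer.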
Finally, maybe other applications implement external controls (scripts) because they are not full-stack routing software but a protocol implementation, so it would be quite difficult for them to define such a "standard"; on the other hand, full-stack routing software could implement internal IPC between its core modules to ensure a more robust VRRP (high availability) service.
After a restart, it is a good idea to wait for a defined time before VRRP returns to the "master" state. This gives the BGP, OSPF, etc. sessions time to come back up.
Keepalived uses a setting called "preempt_delay".
It would be important to have this function in FRR