FRRouting / frr

The FRRouting Protocol Suite
https://frrouting.org/

VRRP "preempt_delay" implementation #5526

Open smarcosmarco opened 4 years ago

smarcosmarco commented 4 years ago

After a restart, it is a good idea to wait for a defined time before VRRP returns to the "master" state, so that the BGP, OSPF, and other sessions have time to come back up.

Keepalived uses a setting called "preempt_delay":

# Seconds after startup or seeing a lower priority master until preemption
# (if not disabled by "nopreempt").
# Range: 0 (default) to 1000 (e.g. 4.12)
# NOTE: For this to work, the initial state of this
# entry must be BACKUP.
preempt_delay 300    # waits 5 minutes

It would be important to have this function in FRR

ctosae commented 3 years ago

any news about it? tnx

qlyoung commented 3 years ago

Not implemented, but I'll note that by default FRR's VRRP waits for Master_Down_Interval before coming up after first starting.

nser77 commented 1 year ago

[...] It would be important to have this function in FRR

Hi all, sorry for joining an old conversation, seems to be an interesting topic.

I think we don't need a preempt_delay option; we just need to ensure that a module meets some requirements before joining the cluster (exiting the FAULT state). The preempt_delay option only waits an arbitrary amount of time before moving from BACKUP to MASTER state after the Master_Down_Interval timeout (per the RFC, Master_Down_Interval is calculated as follows: (3 * Master_Adver_Interval) + Skew_time). This does not ensure that other modules will work correctly once MASTER state is gained; in some cases, it could result in a broken router joining the cluster. Also note that preempt_delay is not part of the RFC standard, and its use is limited when interoperating with other manufacturers because it (re)calculates the Master_Down_Interval (as shown above) with the following algorithm: ((3 * Master_Adver_Interval) + Skew_time) + preempt_delay.
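To make the timer arithmetic above concrete, here is a minimal Python sketch. It assumes the RFC 5798 (VRRPv3) definitions, where intervals are in centiseconds and Skew_time = ((256 - Priority) * Master_Adver_Interval) / 256. The function names are hypothetical, and the preempt_delay term simply follows the formula given in this comment; it is not taken from Keepalived's or FRR's source.

```python
# Sketch of the VRRP timer formulas discussed above.
# Assumption: RFC 5798 definitions, all intervals in centiseconds.
# Function names are illustrative, not FRR or Keepalived APIs.

def skew_time(priority: int, master_adver_interval_cs: int) -> float:
    """Skew_time = ((256 - Priority) * Master_Adver_Interval) / 256."""
    return ((256 - priority) * master_adver_interval_cs) / 256


def master_down_interval(priority: int, master_adver_interval_cs: int) -> float:
    """Master_Down_Interval = (3 * Master_Adver_Interval) + Skew_time."""
    return 3 * master_adver_interval_cs + skew_time(priority, master_adver_interval_cs)


def takeover_wait_with_preempt_delay(
    priority: int, master_adver_interval_cs: int, preempt_delay_s: int
) -> float:
    """Wait described in this comment: Master_Down_Interval + preempt_delay.

    preempt_delay is given in seconds (as in the Keepalived setting) and
    converted to centiseconds here.
    """
    return master_down_interval(priority, master_adver_interval_cs) + preempt_delay_s * 100


if __name__ == "__main__":
    # Priority 100, 1-second advertisements (100 cs), preempt_delay of 300 s.
    print(master_down_interval(100, 100))
    print(takeover_wait_with_preempt_delay(100, 100, 300))
```

With a 1-second advertisement interval, Master_Down_Interval is under 4 seconds for any priority, so a preempt_delay of 300 seconds dominates the total wait entirely.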

I agree that in FRR it might be useful to implement some internal checks between its modules, perhaps with NETLINK (NETLINK_USERSOCK). We could use it to listen for kernel events (like interface up/down) and also as a user-space-to-user-space IPC mechanism, with multicast. We could try to define a (standard) messaging subsystem in the FRR framework to give finer granularity to those internal controls, along with a hook system through which modules update VRRP instances when an event occurs. FRR module developers would implement this internal NETLINK messaging subsystem to ensure minimal service availability for FRR VRRP, which might be difficult to achieve with external scripts. External scripts could still be a great improvement because of their flexibility for specific checks, but everyone must write their own code/script, and that can be difficult to maintain in an OS project.
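As a rough illustration of the hook idea above, the following Python sketch models a readiness gate: a VRRP instance stays in FAULT until every required module has reported itself ready, and drops back to FAULT if one of them later reports a failure. Everything here is hypothetical (the class name, the module names, the state strings), and plain method calls stand in for whatever IPC transport (NETLINK or otherwise) would carry the messages; it is not FRR's API.

```python
# Hypothetical readiness gate for a VRRP instance.
# Assumption: modules report readiness through some IPC mechanism; here a
# direct method call stands in for that transport.
from dataclasses import dataclass, field


@dataclass
class ReadinessGate:
    required: set[str]                      # modules that must be ready
    ready: set[str] = field(default_factory=set)
    state: str = "FAULT"                    # start outside the cluster

    def report(self, module: str, is_ready: bool) -> str:
        """Record a module's status and return the resulting VRRP state."""
        if is_ready:
            self.ready.add(module)
        else:
            self.ready.discard(module)

        if self.required <= self.ready:
            # All requirements met: leave FAULT and join as BACKUP.
            if self.state == "FAULT":
                self.state = "BACKUP"
        else:
            # A required module is missing or failed: withdraw.
            self.state = "FAULT"
        return self.state


if __name__ == "__main__":
    gate = ReadinessGate(required={"bgpd", "ospfd"})
    print(gate.report("bgpd", True))    # still waiting for ospfd
    print(gate.report("ospfd", True))   # all required modules ready
    print(gate.report("bgpd", False))   # a required module failed
```

Unlike a fixed delay, the gate opens as soon as the dependencies are actually up and never opens if they are not, which is the behaviour argued for above.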

Finally, maybe other applications implement external controls (scripts) because they are not full-stack routing software but a protocol implementation, so it would be quite difficult for them to define such a "standard". On the other hand, couldn't full-stack routing software implement an internal IPC between its core modules to ensure a more robust VRRP (high availability) service?