Open pqarmitage opened 3 years ago
Hi Quentin - Could this issue also be the reason that the track_src_ip option is not working as expected?
Having the option set in my VRRP instance and removing the src IP using ip addr del 10.0.0.9/32 dev ens10 makes the node transition to MASTER state, as you also point out in point 3, whereas I would expect the instance to go into FAULT state.
@jeppech Could you please provide a copy of your configuration and the output of ip addr show ens10.
@pqarmitage Sure! - Also, I might as well give some info about my setup:
Keepalived is installed on 2 VPSs on Hetzner. One node is MASTER (node1), the other is BACKUP (node2). They communicate over Hetzner's private cloud network.
Both nodes are configured with a notify_master and a notify_backup script. The notify_master script assigns a floating IP to the node using the Hetzner API. The notify_backup script simply sends an email.
If I log on to node2 and delete the LAN IP 10.0.0.9, it logs:
Mar 09 14:53:32 chaos-assistant Keepalived_vrrp[745]: Deassigned address 10.0.0.9 from interface ens10
Mar 09 14:53:35 chaos-assistant Keepalived_vrrp[745]: (HAPROXY_LB) Receive advertisement timeout
Mar 09 14:53:35 chaos-assistant Keepalived_vrrp[745]: (HAPROXY_LB) Entering MASTER STATE
node1 is seemingly unaware, and logs nothing.
When I re-assign the LAN IP on node2, it logs:
Mar 09 15:03:12 chaos-assistant Keepalived_vrrp[745]: Assigned address 10.0.0.9 for interface ens10
Mar 09 15:03:13 chaos-assistant Keepalived_vrrp[745]: (HAPROXY_LB) Master received advert from 10.0.0.8 with higher priority 102, ours 101
Mar 09 15:03:13 chaos-assistant Keepalived_vrrp[745]: (HAPROXY_LB) Entering BACKUP STATE
This leaves the nodes in a kind of split-brain scenario. node2 has been assigned the floating IP, as it invoked the failover script. node1 is not aware that node2 was briefly master, so it's still in MASTER state, but does not have the floating IP.
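To make a brief MASTER/BACKUP flap like the one above easier to spot, the state transitions can be pulled out of the journal. This is only a sketch: vrrp_transitions is a hypothetical helper name, and it assumes the syslog-style line format shown in the log excerpts above.

```shell
#!/bin/sh
# Hypothetical helper (not part of keepalived): read syslog-style lines on
# stdin and print "<timestamp> <instance> <state>" for every
# "(INSTANCE) Entering <STATE> STATE" message logged by Keepalived_vrrp.
# Assumes the "Mon DD HH:MM:SS host Keepalived_vrrp[pid]: ..." layout
# seen in the excerpts above; other log formats will not match.
vrrp_transitions() {
    sed -n -E 's/^(.*) [^ ]+ Keepalived_vrrp\[[0-9]+\]: \(([^)]+)\) Entering ([A-Z]+) STATE$/\1 \2 \3/p'
}

# Possible use (the unit name "keepalived" is an assumption):
#   journalctl -u keepalived | vrrp_transitions
```

Run on both nodes, this shows whether node2 logged a MASTER transition for which node1 has no matching BACKUP transition.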
keepalived version
Keepalived v2.2.4 (08/21,2021)
Copyright(C) 2001-2021 Alexandre Cassen, <acassen@gmail.com>
Built with kernel headers for Linux 5.10.70
Running on Linux 5.10.0-11-amd64 #1 SMP Debian 5.10.92-2 (2022-02-28)
Distro: Debian GNU/Linux 11 (bullseye)
configure options: --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-option-checking --disable-silent-rules --libdir=${prefix}/lib/x86_64-linux-gnu --runstatedir=/run --disable-maintainer-mode --disable-dependency-tracking --enable-snmp --enable-sha1 --enable-snmp-rfcv2 --enable-snmp-rfcv3 --enable-dbus --enable-json --enable-bfd --enable-regex --with-init=systemd build_alias=x86_64-linux-gnu CFLAGS=-g -O2 -ffile-prefix-map=/build/keepalived-2.2.4=. -fstack-protector-strong -Wformat -Werror=format-security LDFLAGS=-Wl,-z,relro CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2
Config options: NFTABLES LVS REGEX VRRP VRRP_AUTH VRRP_VMAC JSON BFD OLD_CHKSUM_COMPAT SNMP_V3_FOR_V2 SNMP_VRRP SNMP_CHECKER SNMP_RFCV2 SNMP_RFCV3 DBUS INIT=systemd SYSTEMD_NOTIFY
System options: VSYSLOG MEMFD_CREATE IPV4_DEVCONF LIBNL3 RTA_ENCAP RTA_EXPIRES RTA_NEWDST RTA_PREF FRA_SUPPRESS_PREFIXLEN FRA_SUPPRESS_IFGROUP FRA_TUN_ID RTAX_CC_ALGO RTAX_QUICKACK RTEXT_FILTER_SKIP_STATS FRA_L3MDEV FRA_UID_RANGE RTAX_FASTOPEN_NO_COOKIE RTA_VIA FRA_PROTOCOL FRA_IP_PROTO FRA_SPORT_RANGE FRA_DPORT_RANGE RTA_TTL_PROPAGATE IFA_FLAGS LWTUNNEL_ENCAP_MPLS LWTUNNEL_ENCAP_ILA NET_LINUX_IF_H_COLLISION LIBIPVS_NETLINK IPVS_DEST_ATTR_ADDR_FAMILY IPVS_SYNCD_ATTRIBUTES IPVS_64BIT_STATS IPVS_TUN_TYPE IPVS_TUN_CSUM IPVS_TUN_GRE VRRP_IPVLAN IFLA_LINK_NETNSID GLOB_BRACE GLOB_ALTDIRFUNC INET6_ADDR_GEN_MODE VRF SO_MARK
ip addr show [node2]
3: ens10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UP group default qlen 1000
link/ether 86:00:00:01:d9:4e brd ff:ff:ff:ff:ff:ff
altname enp0s10
inet 10.0.0.9/32 scope global ens10
valid_lft forever preferred_lft forever
inet6 fe80::8400:ff:fe01:d94e/64 scope link
valid_lft forever preferred_lft forever
/etc/keepalived/keepalived.conf [node1]
global_defs {
enable_script_security
script_user hcloud-script
vrrp_version 2
dynamic_interfaces
# Delay the vrrp startup, so the private cloud network has time to be configured
vrrp_startup_delay 3
# Following settings are based on this topic https://groups.io/g/keepalived-users/topic/84312332#442
max_auto_priority 99
# Enable VRRP/Checker realtime scheduling, when priority is 99
vrrp_rt_priority 99
checker_rt_priority 99
vrrp_no_swap
checker_no_swap
# Keepalived is only allowed to block for 100ms
vrrp_rlimit_rttime 100000
}
interface_up_down_delays {
ens10 1
}
vrrp_track_process chk_haproxy {
process haproxy
quorum 1
weight 2
}
vrrp_instance HAPROXY_LB {
interface ens10
state MASTER
priority 100
advert_int 1
virtual_router_id 42
unicast_src_ip 10.0.0.8
unicast_peer {
10.0.0.9
}
authentication {
auth_type PASS
auth_pass 10hif9z
}
track_process {
chk_haproxy
}
# Assign floating ip, enable services, send email
notify_master /etc/keepalived/failover.sh root
# Disable services, send email
notify_backup /etc/keepalived/failback.sh root
}
/etc/keepalived/keepalived.conf [node2]
global_defs {
enable_script_security
script_user hcloud-script
vrrp_version 2
dynamic_interfaces
# Delay the vrrp startup, so the private cloud network has time to be configured
vrrp_startup_delay 3
# Following settings are based on this topic https://groups.io/g/keepalived-users/topic/84312332#442
max_auto_priority 99
# Enable VRRP/Checker realtime scheduling, when priority is 99
vrrp_rt_priority 99
checker_rt_priority 99
vrrp_no_swap
checker_no_swap
# Keepalived is only allowed to block for 100ms
vrrp_rlimit_rttime 100000
}
interface_up_down_delays {
ens10 1
}
vrrp_track_process chk_haproxy {
process haproxy
quorum 1
weight 2
}
vrrp_instance HAPROXY_LB {
interface ens10
state BACKUP
priority 99
advert_int 1
virtual_router_id 42
unicast_src_ip 10.0.0.9
unicast_peer {
10.0.0.8
}
authentication {
auth_type PASS
auth_pass 10hif9z
}
track_src_ip
track_process {
chk_haproxy
}
# Assign floating ip, enable services, send email
notify_master /etc/keepalived/failover.sh root
# Disable services, send email
notify_backup /etc/keepalived/failback.sh root
}
Describe the bug
1. Setting an interface up without the unicast_src configured on the interface means that keepalived cannot bind to the unicast_src and so doesn't receive adverts from other nodes; however, the VRRP instance is not put into fault state, and so it transitions to master, even if there is a higher priority master. Subsequently adding the unicast_src IP makes it work again.
2. Deleting an interface and subsequently recreating it causes a bind error which is not detected. The VRRP instance then does not work again.
3. Deleting the unicast_src IP address does not cause the VRRP instance to go to fault state.
To Reproduce
Expected behavior
The VRRP instance stays in fault state if the unicast_src is not configured.
It doesn't attempt to bind to the unicast_src address until it is configured.
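As an illustration of the condition the expected behavior depends on (is the unicast source address currently configured on the interface?), here is a minimal shell sketch. has_addr is a hypothetical helper, not something keepalived provides; the interface and address in the usage comment are taken from the node2 configuration in this thread.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 only if the IPv4 address given as $1 appears
# in `ip -o -4 addr show` output supplied on stdin, otherwise exit 1.
# In one-line (-o) output the third field is "inet" and the fourth is
# "ADDRESS/PREFIXLEN".
has_addr() {
    awk -v want="$1" '
        $3 == "inet" { split($4, a, "/"); if (a[1] == want) found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Possible use, with values from node2 in this thread:
#   ip -o -4 addr show dev ens10 | has_addr 10.0.0.9
```

A check like this could be run from a vrrp_script referenced in track_script, so that a missing source address fails the script. That is an untested workaround idea, not a fix confirmed in this issue.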
Keepalived version
Distro (please complete the following information):
The problem occurs regardless of distro, version or architecture
Details of any containerisation or hosted service (e.g. AWS)
None, but it wouldn't make any difference.
Configuration file:
Notify and track scripts n/a
System Log entries 1.
Other node starts logging:
Tue Feb 11:57:53.638253974 2021: (VI_1) Received advert from 192.168.54.4 with lower priority 100, ours 101, forcing new election
2.
3.
and MASTER start logging
Did keepalived coredump?
No coredump
Additional context
For point 1., the following works without an error: