acassen / keepalived

Keepalived
https://www.keepalived.org
GNU General Public License v2.0
4.01k stars 736 forks source link

unable to recover from split brain problem #2387

Closed stanluk closed 5 months ago

stanluk commented 9 months ago

Describe the bug Rarely we are hitting issue on our setup cluster, when two keepalived instances cannot recover from split brain. The first guess was that network setup do not work correctly, however tcpdump shows the vrrp packets are sent/recieve on both machines. Even though one keepalived doesn't transition from MASTER to BACKUP.

To Reproduce The issue is mostly reproducible when we combine two factors that may happen on our setup:

The typical reproduction scenario contains:

  1. Two machines A and B start booting keepalived at the same time
  2. Keepalived A and B both enters backup state
  3. Keepalived B gets configuration update and reloads
  4. A and B timeout fires, both enter MASTER state
  5. A receives B advert, since it has higher priority is forces election
  6. B seems to not receive anything, however the tcpdump shows advertisement from A

Any ideas how to further debug this issue will be appreciated.

Expected behavior Lower priority instance should transition to backup

Keepalived version

Keepalived v2.2.8 (04/04,2023), git commit v2.2.7-154-g292b299e+

Details of any containerisation or hosted service (e.g. AWS) Self-hosted k8s.

Configuration file:

global_defs {
  vrrp_startup_delay 10
  vrrp_garp_interval 0.001
  vrrp_version 3
  vrrp_garp_master_refresh 30
  vrrp_garp_lower_prio_repeat 5
  vrrp_higher_prio_send_advert true
  script_user root root
  notify_fifo /tmp/notify_fifo
  notify_fifo_script /tmp/notify.sh
}

vrrp_script check_script_one {
  script "/myscript.sh"
  interval 1
  timeout 1
  rise 2
  fall 3
}

vrrp_instance VI_1 {
  advert_int 0.4
  interface eth0
  state BACKUP
  unicast_src_ip 10.131.0.178
  unicast_peer {
     10.128.2.123
  }
  virtual_router_id 1
  priority 200
  nopreempt
  virtual_ipaddress {
    192.168.120.2/24 dev net2
    10.10.10.2/24 dev net1
  }
  track_script {
    check_script_one
  }
  track_interface {
      net1
      net2
  }
}
===========
vrrp_instance VI_1 {
  advert_int 0.4
  interface eth0
  state BACKUP
  unicast_src_ip 10.128.2.123
  unicast_peer {
     10.131.0.178
  }
  virtual_router_id 1
  priority 50
  nopreempt
  virtual_ipaddress {
    192.168.120.2/24 dev net2
    10.10.10.2/24 dev net1
  }
  track_script {
    check_script_one
  }
  track_interface {
      net1
      net2
  }
}

Notify and track scripts

If any notify or track scripts are in use, please provide copies of them

System Log entries $hostname0:

Wed Feb 21 21:57:11 2024: Starting Keepalived v2.2.8 (04/04,2023), git commit v2.2.7-154-g292b299e+
Wed Feb 21 21:57:11 2024: WARNING - keepalived was built for newer Linux 5.14.0, running on Linux 4.18.0-305.28.1.el8_4.x86_64 #1 SMP Mon Nov 8 07:45:47 EST 2021
Wed Feb 21 21:57:11 2024: Command line: '/usr/sbin/keepalived' '-f' '/tmp/keepalived.conf' '--dont-fork' '--log-console' '-D'
Wed Feb 21 21:57:11 2024: Opening file '/tmp/keepalived.conf'.
Wed Feb 21 21:57:11 2024: Configuration file /tmp/keepalived.conf
Wed Feb 21 21:57:11 2024: NOTICE: setting config option max_auto_priority should result in better keepalived performance
Wed Feb 21 21:57:11 2024: Starting VRRP child process, pid=154
Wed Feb 21 21:57:11 2024: Registering Kernel netlink reflector
Wed Feb 21 21:57:11 2024: Registering Kernel netlink command channel
Wed Feb 21 21:57:11 2024: Delaying startup for 10 seconds
Wed Feb 21 21:57:11 2024: SECURITY VIOLATION - scripts are being executed but script_security not enabled.
Wed Feb 21 21:57:11 2024: Assigned address 10.131.0.178 for interface eth0
Wed Feb 21 21:57:11 2024: Assigned address fe80::858:aff:fe83:b2 for interface eth0
Wed Feb 21 21:57:11 2024: Registering gratuitous ARP shared channel
Wed Feb 21 21:57:11 2024: (VI_1) removing VIPs.
Wed Feb 21 21:57:11 2024: Startup complete
Wed Feb 21 21:57:11 2024: VRRP sockpool: [ifindex(  3), family(IPv4), proto(112), fd(13,14) , unicast, address(10.131.0.178)]
Wed Feb 21 21:57:11 2024: VRRP_Script(check_script_one) succeeded
Wed Feb 21 21:57:11 2024: (VI_1) Entering BACKUP STATE
Wed Feb 21 21:57:11 2024: VI_1: sending gratuitous ARP for 10.131.0.178
Wed Feb 21 21:57:11 2024: Sending gratuitous ARP on eth0 for 10.131.0.178
debug state: BACKUP
Wed Feb 21 21:57:21 2024: Delayed start completed
Wed Feb 21 21:57:23 2024: (VI_1) Receive advertisement timeout
Wed Feb 21 21:57:23 2024: (VI_1) Entering MASTER STATE
Wed Feb 21 21:57:23 2024: (VI_1) setting VIPs.
Wed Feb 21 21:57:23 2024: (VI_1) Sending/queueing gratuitous ARPs on net2 for 192.168.120.2
Wed Feb 21 21:57:23 2024: Sending gratuitous ARP on net2 for 192.168.120.2
Wed Feb 21 21:57:23 2024: (VI_1) Sending/queueing gratuitous ARPs on net1 for 10.10.10.2
Wed Feb 21 21:57:23 2024: Sending gratuitous ARP on net1 for 10.10.10.2
Wed Feb 21 21:57:23 2024: Sending gratuitous ARP on net2 for 192.168.120.2
Wed Feb 21 21:57:23 2024: Sending gratuitous ARP on net1 for 10.10.10.2
debug state: MASTER
Wed Feb 21 21:57:23 2024: (VI_1) Received advert from 10.128.2.123 with lower priority 50, ours 200, forcing new election
Wed Feb 21 21:57:23 2024: (VI_1) Sending/queueing gratuitous ARPs on net2 for 192.168.120.2
Wed Feb 21 21:57:23 2024: Sending gratuitous ARP on net2 for 192.168.120.2
Wed Feb 21 21:57:23 2024: (VI_1) Sending/queueing gratuitous ARPs on net1 for 10.10.10.2
Wed Feb 21 21:57:23 2024: Sending gratuitous ARP on net1 for 10.10.10.2
Wed Feb 21 21:57:23 2024: Sending gratuitous ARP on net2 for 192.168.120.2
Wed Feb 21 21:57:23 2024: Sending gratuitous ARP on net1 for 10.10.10.2
debug state: MASTER_RX_LOWER_PRI
Wed Feb 21 21:57:23 2024: (VI_1) Received advert from 10.128.2.123 with lower priority 50, ours 200, forcing new election
Wed Feb 21 21:57:23 2024: (VI_1) Sending/queueing gratuitous ARPs on net2 for 192.168.120.2
Wed Feb 21 21:57:23 2024: Sending gratuitous ARP on net2 for 192.168.120.2
Wed Feb 21 21:57:23 2024: (VI_1) Sending/queueing gratuitous ARPs on net1 for 10.10.10.2
Wed Feb 21 21:57:23 2024: Sending gratuitous ARP on net1 for 10.10.10.2
Wed Feb 21 21:57:23 2024: Sending gratuitous ARP on net2 for 192.168.120.2
Wed Feb 21 21:57:23 2024: Sending gratuitous ARP on net1 for 10.10.10.2
debug state: MASTER_RX_LOWER_PRI
Wed Feb 21 21:57:24 2024: (VI_1) Received advert from 10.128.2.123 with lower priority 50, ours 200, forcing new election
Wed Feb 21 21:57:24 2024: (VI_1) Sending/queueing gratuitous ARPs on net2 for 192.168.120.2
Wed Feb 21 21:57:24 2024: Sending gratuitous ARP on net2 for 192.168.120.2
Wed Feb 21 21:57:24 2024: (VI_1) Sending/queueing gratuitous ARPs on net1 for 10.10.10.2
Wed Feb 21 21:57:24 2024: Sending gratuitous ARP on net1 for 10.10.10.2
Wed Feb 21 21:57:24 2024: Sending gratuitous ARP on net2 for 192.168.120.2
Wed Feb 21 21:57:24 2024: Sending gratuitous ARP on net1 for 10.10.10.2
debug state: MASTER_RX_LOWER_PRI
Wed Feb 21 21:57:24 2024: (VI_1) Received advert from 10.128.2.123 with lower priority 50, ours 200, forcing new election
Wed Feb 21 21:57:24 2024: (VI_1) Sending/queueing gratuitous ARPs on net2 for 192.168.120.2
Wed Feb 21 21:57:24 2024: Sending gratuitous ARP on net2 for 192.168.120.2
Wed Feb 21 21:57:24 2024: (VI_1) Sending/queueing gratuitous ARPs on net1 for 10.10.10.2
Wed Feb 21 21:57:24 2024: Sending gratuitous ARP on net1 for 10.10.10.2
Wed Feb 21 21:57:24 2024: Sending gratuitous ARP on net2 for 192.168.120.2
Wed Feb 21 21:57:24 2024: Sending gratuitous ARP on net1 for 10.10.10.2
debug state: MASTER_RX_LOWER_PRI
Wed Feb 21 21:57:24 2024: (VI_1) Received advert from 10.128.2.123 with lower priority 50, ours 200, forcing new election
Wed Feb 21 21:57:24 2024: (VI_1) Sending/queueing gratuitous ARPs on net2 for 192.168.120.2
Wed Feb 21 21:57:24 2024: Sending gratuitous ARP on net2 for 192.168.120.2
Wed Feb 21 21:57:24 2024: (VI_1) Sending/queueing gratuitous ARPs on net1 for 10.10.10.2
Wed Feb 21 21:57:24 2024: Sending gratuitous ARP on net1 for 10.10.10.2
Wed Feb 21 21:57:24 2024: Sending gratuitous ARP on net2 for 192.168.120.2
Wed Feb 21 21:57:24 2024: Sending gratuitous ARP on net1 for 10.10.10.2
debug state: MASTER_RX_LOWER_PRI
Wed Feb 21 21:57:25 2024: (VI_1) Received advert from 10.128.2.123 with lower priority 50, ours 200, forcing new election
Wed Feb 21 21:57:25 2024: (VI_1) Sending/queueing gratuitous ARPs on net2 for 192.168.120.2
Wed Feb 21 21:57:25 2024: Sending gratuitous ARP on net2 for 192.168.120.2
Wed Feb 21 21:57:25 2024: (VI_1) Sending/queueing gratuitous ARPs on net1 for 10.10.10.2
Wed Feb 21 21:57:25 2024: Sending gratuitous ARP on net1 for 10.10.10.2
Wed Feb 21 21:57:25 2024: Sending gratuitous ARP on net2 for 192.168.120.2
Wed Feb 21 21:57:25 2024: Sending gratuitous ARP on net1 for 10.10.10.2
debug state: MASTER_RX_LOWER_PRI
.....
Wed Feb 21 22:01:25 2024: (VI_1) Sending/queueing gratuitous ARPs on net2 for 192.168.120.2
Wed Feb 21 22:01:25 2024: Sending gratuitous ARP on net2 for 192.168.120.2
Wed Feb 21 22:01:25 2024: (VI_1) Sending/queueing gratuitous ARPs on net1 for 10.10.10.2
Wed Feb 21 22:01:25 2024: Sending gratuitous ARP on net1 for 10.10.10.2
Wed Feb 21 22:01:25 2024: Sending gratuitous ARP on net2 for 192.168.120.2
Wed Feb 21 22:01:25 2024: Sending gratuitous ARP on net1 for 10.10.10.2
Wed Feb 21 22:01:25 2024: (VI_1) Received advert from 10.128.2.123 with lower priority 50, ours 200, forcing new election
Wed Feb 21 22:01:25 2024: (VI_1) Sending/queueing gratuitous ARPs on net2 for 192.168.120.2
Wed Feb 21 22:01:25 2024: Sending gratuitous ARP on net2 for 192.168.120.2
Wed Feb 21 22:01:25 2024: (VI_1) Sending/queueing gratuitous ARPs on net1 for 10.10.10.2
Wed Feb 21 22:01:25 2024: Sending gratuitous ARP on net1 for 10.10.10.2
Wed Feb 21 22:01:25 2024: Sending gratuitous ARP on net2 for 192.168.120.2
Wed Feb 21 22:01:25 2024: Sending gratuitous ARP on net1 for 10.10.10.2
debug state: MASTER_RX_LOWER_PRI
Wed Feb 21 22:01:25 2024: (VI_1) Sending/queueing gratuitous ARPs on net2 for 192.168.120.2
Wed Feb 21 22:01:25 2024: Sending gratuitous ARP on net2 for 192.168.120.2
Wed Feb 21 22:01:25 2024: (VI_1) Sending/queueing gratuitous ARPs on net1 for 10.10.10.2
Wed Feb 21 22:01:25 2024: Sending gratuitous ARP on net1 for 10.10.10.2
Wed Feb 21 22:01:25 2024: Sending gratuitous ARP on net2 for 192.168.120.2
Wed Feb 21 22:01:25 2024: Sending gratuitous ARP on net1 for 10.10.10.2

$hostname1

Wed Feb 21 21:57:11 2024: Starting Keepalived v2.2.8 (04/04,2023), git commit v2.2.7-154-g292b299e+
Wed Feb 21 21:57:11 2024: WARNING - keepalived was built for newer Linux 5.14.0, running on Linux 4.18.0-305.28.1.el8_4.x86_64 #1 SMP Mon Nov 8 07:45:47 EST 2021
Wed Feb 21 21:57:11 2024: Command line: '/usr/sbin/keepalived' '-f' '/tmp/keepalived.conf' '--dont-fork' '--log-console' '-D'
Wed Feb 21 21:57:11 2024: Opening file '/tmp/keepalived.conf'.
Wed Feb 21 21:57:11 2024: Configuration file /tmp/keepalived.conf
Wed Feb 21 21:57:11 2024: NOTICE: setting config option max_auto_priority should result in better keepalived performance
Wed Feb 21 21:57:11 2024: Starting VRRP child process, pid=101
Wed Feb 21 21:57:11 2024: Registering Kernel netlink reflector
Wed Feb 21 21:57:11 2024: Registering Kernel netlink command channel
Wed Feb 21 21:57:11 2024: Delaying startup for 10 seconds
Wed Feb 21 21:57:11 2024: SECURITY VIOLATION - scripts are being executed but script_security not enabled.
Wed Feb 21 21:57:11 2024: Assigned address 10.128.2.123 for interface eth0
Wed Feb 21 21:57:11 2024: Assigned address fe80::858:aff:fe80:27b for interface eth0
Wed Feb 21 21:57:11 2024: Registering gratuitous ARP shared channel
Wed Feb 21 21:57:11 2024: (VI_1) removing VIPs.
Wed Feb 21 21:57:11 2024: Startup complete
Wed Feb 21 21:57:11 2024: VRRP sockpool: [ifindex(  3), family(IPv4), proto(112), fd(13,14) , unicast, address(10.128.2.123)]
Wed Feb 21 21:57:11 2024: VRRP_Script(check_script_one) succeeded
Wed Feb 21 21:57:11 2024: (VI_1) Entering BACKUP STATE
Wed Feb 21 21:57:11 2024: VI_1: sending gratuitous ARP for 10.128.2.123
Wed Feb 21 21:57:11 2024: Sending gratuitous ARP on eth0 for 10.128.2.123
debug state: BACKUP
Wed Feb 21 21:57:12 2024: Reloading ...
Wed Feb 21 21:57:12 2024: Opening file '/tmp/keepalived.conf'.
Wed Feb 21 21:57:12 2024: Configuration file /tmp/keepalived.conf
Wed Feb 21 21:57:12 2024: Reloading
Wed Feb 21 21:57:12 2024: VRRP_Script(check_script_one) considered successful on reload
Wed Feb 21 21:57:12 2024: SECURITY VIOLATION - scripts are being executed but script_security not enabled.
Wed Feb 21 21:57:12 2024: Assigned address 10.128.2.123 for interface eth0
Wed Feb 21 21:57:12 2024: Assigned address fe80::858:aff:fe80:27b for interface eth0
Wed Feb 21 21:57:12 2024: Reload complete
Wed Feb 21 21:57:12 2024: VRRP sockpool: [ifindex(  3), family(IPv4), proto(112), fd(13,14) , unicast, address(10.128.2.123)]
Wed Feb 21 21:57:23 2024: (VI_1) Receive advertisement timeout
Wed Feb 21 21:57:23 2024: (VI_1) Entering MASTER STATE
Wed Feb 21 21:57:23 2024: (VI_1) setting VIPs.
Wed Feb 21 21:57:23 2024: (VI_1) Sending/queueing gratuitous ARPs on net2 for 192.168.120.2
Wed Feb 21 21:57:23 2024: Sending gratuitous ARP on net2 for 192.168.120.2
Wed Feb 21 21:57:23 2024: (VI_1) Sending/queueing gratuitous ARPs on net1 for 10.10.10.2
Wed Feb 21 21:57:23 2024: Sending gratuitous ARP on net1 for 10.10.10.2
Wed Feb 21 21:57:23 2024: Sending gratuitous ARP on net2 for 192.168.120.2
Wed Feb 21 21:57:23 2024: Sending gratuitous ARP on net1 for 10.10.10.2
debug state: MASTER
Wed Feb 21 21:57:28 2024: (VI_1) Sending/queueing gratuitous ARPs on net2 for 192.168.120.2
Wed Feb 21 21:57:28 2024: Sending gratuitous ARP on net2 for 192.168.120.2
Wed Feb 21 21:57:28 2024: (VI_1) Sending/queueing gratuitous ARPs on net1 for 10.10.10.2
Wed Feb 21 21:57:28 2024: Sending gratuitous ARP on net1 for 10.10.10.2
Wed Feb 21 21:57:28 2024: Sending gratuitous ARP on net2 for 192.168.120.2
Wed Feb 21 21:57:28 2024: Sending gratuitous ARP on net1 for 10.10.10.2
Wed Feb 21 21:57:53 2024: (VI_1) Sending/queueing gratuitous ARPs on net2 for 192.168.120.2
Wed Feb 21 21:57:53 2024: Sending gratuitous ARP on net2 for 192.168.120.2
Wed Feb 21 21:57:53 2024: (VI_1) Sending/queueing gratuitous ARPs on net1 for 10.10.10.2
Wed Feb 21 21:57:53 2024: Sending gratuitous ARP on net1 for 10.10.10.2
Wed Feb 21 21:58:23 2024: (VI_1) Sending/queueing gratuitous ARPs on net2 for 192.168.120.2
Wed Feb 21 21:58:23 2024: Sending gratuitous ARP on net2 for 192.168.120.2
Wed Feb 21 21:58:23 2024: (VI_1) Sending/queueing gratuitous ARPs on net1 for 10.10.10.2
Wed Feb 21 21:58:23 2024: Sending gratuitous ARP on net1 for 10.10.10.2
Wed Feb 21 21:58:53 2024: (VI_1) Sending/queueing gratuitous ARPs on net2 for 192.168.120.2
Wed Feb 21 21:58:53 2024: Sending gratuitous ARP on net2 for 192.168.120.2
Wed Feb 21 21:58:53 2024: (VI_1) Sending/queueing gratuitous ARPs on net1 for 10.10.10.2
Wed Feb 21 21:58:53 2024: Sending gratuitous ARP on net1 for 10.10.10.2
Wed Feb 21 21:59:23 2024: (VI_1) Sending/queueing gratuitous ARPs on net2 for 192.168.120.2
Wed Feb 21 21:59:23 2024: Sending gratuitous ARP on net2 for 192.168.120.2
Wed Feb 21 21:59:23 2024: (VI_1) Sending/queueing gratuitous ARPs on net1 for 10.10.10.2
Wed Feb 21 21:59:23 2024: Sending gratuitous ARP on net1 for 10.10.10.2
Wed Feb 21 21:59:53 2024: (VI_1) Sending/queueing gratuitous ARPs on net2 for 192.168.120.2
Wed Feb 21 21:59:53 2024: Sending gratuitous ARP on net2 for 192.168.120.2
Wed Feb 21 21:59:53 2024: (VI_1) Sending/queueing gratuitous ARPs on net1 for 10.10.10.2
Wed Feb 21 21:59:53 2024: Sending gratuitous ARP on net1 for 10.10.10.2
Wed Feb 21 22:00:23 2024: (VI_1) Sending/queueing gratuitous ARPs on net2 for 192.168.120.2
Wed Feb 21 22:00:23 2024: Sending gratuitous ARP on net2 for 192.168.120.2
Wed Feb 21 22:00:23 2024: (VI_1) Sending/queueing gratuitous ARPs on net1 for 10.10.10.2
Wed Feb 21 22:00:23 2024: Sending gratuitous ARP on net1 for 10.10.10.2
Wed Feb 21 22:00:53 2024: (VI_1) Sending/queueing gratuitous ARPs on net2 for 192.168.120.2
Wed Feb 21 22:00:53 2024: Sending gratuitous ARP on net2 for 192.168.120.2
Wed Feb 21 22:00:53 2024: (VI_1) Sending/queueing gratuitous ARPs on net1 for 10.10.10.2
Wed Feb 21 22:00:53 2024: Sending gratuitous ARP on net1 for 10.10.10.2

$hostname0: tcpdump -i eth0 proto 112

21:57:23.063864 IP hostname0 > 10.128.2.123: VRRPv3, Advertisement, vrid 1, prio 200, intvl 40cs, length 16
21:57:23.280796 IP 10.128.2.123 > hostname0: VRRPv3, Advertisement, (ttl 254), vrid 1, prio 50, intvl 40cs, length 16
21:57:23.281216 IP hostname0 > 10.128.2.123: VRRPv3, Advertisement, vrid 1, prio 200, intvl 40cs, length 16
21:57:23.681135 IP 10.128.2.123 > hostname0: VRRPv3, Advertisement, (ttl 254), vrid 1, prio 50, intvl 40cs, length 16
21:57:23.681353 IP hostname0 > 10.128.2.123: VRRPv3, Advertisement, vrid 1, prio 200, intvl 40cs, length 16
21:57:24.081470 IP 10.128.2.123 > hostname0: VRRPv3, Advertisement, (ttl 254), vrid 1, prio 50, intvl 40cs, length 16
21:57:24.081820 IP hostname0 > 10.128.2.123: VRRPv3, Advertisement, vrid 1, prio 200, intvl 40cs, length 16
21:57:24.481574 IP 10.128.2.123 > hostname0: VRRPv3, Advertisement, (ttl 254), vrid 1, prio 50, intvl 40cs, length 16
...repeats..
22:01:43.852938 IP hostname0 > 10.128.2.123: VRRPv3, Advertisement, vrid 1, prio 200, intvl 40cs, length 16
22:01:44.252883 IP 10.128.2.123 > hostname0: VRRPv3, Advertisement, (ttl 254), vrid 1, prio 50, intvl 40cs, length 16
22:01:44.253333 IP hostname0 > 10.128.2.123: VRRPv3, Advertisement, vrid 1, prio 200, intvl 40cs, length 16
22:01:44.653035 IP 10.128.2.123 > hostname0: VRRPv3, Advertisement, (ttl 254), vrid 1, prio 50, intvl 40cs, length 16
22:01:44.653486 IP hostname0 > 10.128.2.123: VRRPv3, Advertisement, vrid 1, prio 200, intvl 40cs, length 16
22:01:45.053311 IP 10.128.2.123 > hostname0: VRRPv3, Advertisement, (ttl 254), vrid 1, prio 50, intvl 40cs, length 16
22:01:45.053784 IP hostname0 > 10.128.2.123: VRRPv3, Advertisement, vrid 1, prio 200, intvl 40cs, length 16

$hostname1: tcpdump -i eth0 proto 112

21:57:23.065412 IP 10.131.0.178 > hostname1: VRRPv3, Advertisement, (ttl 254), vrid 1, prio 200, intvl 40cs, length 16
21:57:23.279239 IP hostname1 > 10.131.0.178: VRRPv3, Advertisement, vrid 1, prio 50, intvl 40cs, length 16
21:57:23.282069 IP 10.131.0.178 > hostname1: VRRPv3, Advertisement, (ttl 254), vrid 1, prio 200, intvl 40cs, length 16
21:57:23.680859 IP hostname1 > 10.131.0.178: VRRPv3, Advertisement, vrid 1, prio 50, intvl 40cs, length 16
21:57:23.681620 IP 10.131.0.178 > hostname1: VRRPv3, Advertisement, (ttl 254), vrid 1, prio 200, intvl 40cs, length 16
21:57:24.081163 IP hostname1 > 10.131.0.178: VRRPv3, Advertisement, vrid 1, prio 50, intvl 40cs, length 16
21:57:24.082040 IP 10.131.0.178 > hostname1: VRRPv3, Advertisement, (ttl 254), vrid 1, prio 200, intvl 40cs, length 16
21:57:24.481366 IP hostname1 > 10.131.0.178: VRRPv3, Advertisement, vrid 1, prio 50, intvl 40cs, length 16
...repeats...
22:02:03.465744 IP hostname1 > 10.131.0.178: VRRPv3, Advertisement, vrid 1, prio 50, intvl 40cs, length 16
22:02:03.466454 IP 10.131.0.178 > hostname1: VRRPv3, Advertisement, (ttl 254), vrid 1, prio 200, intvl 40cs, length 16
22:02:03.865954 IP hostname1 > 10.131.0.178: VRRPv3, Advertisement, vrid 1, prio 50, intvl 40cs, length 16
22:02:03.866723 IP 10.131.0.178 > hostname1: VRRPv3, Advertisement, (ttl 254), vrid 1, prio 200, intvl 40cs, length 16
22:02:04.266171 IP hostname1 > 10.131.0.178: VRRPv3, Advertisement, vrid 1, prio 50, intvl 40cs, length 16
22:02:04.267093 IP 10.131.0.178 > hostname1: VRRPv3, Advertisement, (ttl 254), vrid 1, prio 200, intvl 40cs, length 16
22:02:04.666428 IP hostname1 > 10.131.0.178: VRRPv3, Advertisement, vrid 1, prio 50

Did keepalived coredump?

no

Additional context Add any other context about the problem here.

pqarmitage commented 9 months ago

Can you please provide the output of ip addr show eth0 on both systems.

If this doesn't help identify the cause of the problem I'll provide details of how to enable the various debug options within keepalived.

stanluk commented 8 months ago

Sorry for late answer, recently we had some problems with reproducing the issue. The below same issue with slightly different config then previously attached - like double initial master with same priority, which I know is anti-pattern, however the keepalived, as far as I tested, locally was able to recover from such wrong config. Moreover the number of reload was forced to be bigger then previous run.

The full logs from system run:

$hostname0: ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
3: eth0@if22956: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default 
    link/ether 0a:58:0a:81:02:64 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.129.2.100/23 brd 10.129.3.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fd01:0:0:3::598b/64 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::858:aff:fe81:264/64 scope link 
       valid_lft forever preferred_lft forever
4: net1@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 2a:90:6d:9a:a4:1b brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.168.1.36/24 brd 172.168.1.255 scope global net1
       valid_lft forever preferred_lft forever
    inet 10.10.10.2/24 scope global net1
       valid_lft forever preferred_lft forever
    inet6 fe80::2890:6dff:fe9a:a41b/64 scope link 
       valid_lft forever preferred_lft forever
5: net2@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 6e:21:6f:cf:da:91 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.168.0.4/24 scope global net2
       valid_lft forever preferred_lft forever
    inet 192.168.120.2/24 scope global net2
       valid_lft forever preferred_lft forever
    inet6 fe80::6c21:6fff:fecf:da91/64 scope link 
       valid_lft forever preferred_lft forever

$hostname0: ss
Netid State  Recv-Q Send-Q     Local Address:Port Peer Address:PortProcess
???   UNCONN 0      0           0.0.0.0%eth0:vrrp      0.0.0.0:*          
???   UNCONN 0      0      10.129.2.100%eth0:vrrp      0.0.0.0:*          

$hostname0: cat /tmp/keepalived.conf
global_defs {
  vrrp_startup_delay 10.0
  vrrp_garp_interval 0.001
  vrrp_version 3
  vrrp_garp_master_refresh 30
  vrrp_garp_lower_prio_repeat 5
  vrrp_higher_prio_send_advert true
  script_user root root
  notify_fifo /tmp/notify_fifo
  notify_fifo_script /tmp/notify.sh
}
vrrp_script check_masterability {
  script "/cmds -run check-master"
  interval 1
  timeout 1
  rise 1
  fall 1
}
vrrp_script check_masterability_on_active {
  script "/cmds -run check-master-on-active"
  interval 1
  timeout 1
  rise 2
  fall 3
}

track_file drop_master {
  file "/config/drop_master"
  weight 0
  init_file 0
}

vrrp_instance VI_1 {
  advert_int 0.4
  interface eth0
  state MASTER
  unicast_src_ip 10.129.2.100
  unicast_peer {
     10.131.0.83
  }
  virtual_router_id 1
  priority 255
  virtual_ipaddress {
    192.168.120.2/24 dev net2
    10.10.10.2/24 dev net1
  }
  virtual_routes {
  }
  track_script {
    check_masterability
    check_masterability_on_active
  }
  track_interface {
    net1
    net2
  }
  track_file {
    drop_master
  }
  notify_master "/cmds -run on-master"
}

$hostname0: tcpdump proto 112
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
07:31:50.435733 IP 10.131.0.83 > svc-tcp-service-01-0: VRRPv3, Advertisement, (ttl 254), vrid 1, prio 255, intvl 40cs, length 16
07:31:50.531809 IP svc-tcp-service-01-0 > 10.131.0.83: VRRPv3, Advertisement, vrid 1, prio 255, intvl 40cs, length 16
07:31:50.835950 IP 10.131.0.83 > svc-tcp-service-01-0: VRRPv3, Advertisement, (ttl 254), vrid 1, prio 255, intvl 40cs, length 16
07:31:50.931995 IP svc-tcp-service-01-0 > 10.131.0.83: VRRPv3, Advertisement, vrid 1, prio 255, intvl 40cs, length 16
07:31:51.236236 IP 10.131.0.83 > svc-tcp-service-01-0: VRRPv3, Advertisement, (ttl 254), vrid 1, prio 255, intvl 40cs, length 
$hostname1: ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
3: eth0@if18234: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default 
    link/ether 0a:58:0a:83:00:53 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.131.0.83/23 brd 10.131.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fd01:0:0:5::4714/64 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::858:aff:fe83:53/64 scope link 
       valid_lft forever preferred_lft forever
4: net1@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether d6:cc:d7:68:3a:f3 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.168.1.48/24 brd 172.168.1.255 scope global net1
       valid_lft forever preferred_lft forever
    inet 10.10.10.2/24 scope global net1
       valid_lft forever preferred_lft forever
    inet6 fe80::d4cc:d7ff:fe68:3af3/64 scope link 
       valid_lft forever preferred_lft forever
5: net2@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 96:6c:04:e2:d5:28 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.168.0.3/24 scope global net2
       valid_lft forever preferred_lft forever
    inet 192.168.120.2/24 scope global net2
       valid_lft forever preferred_lft forever
    inet6 fe80::946c:4ff:fee2:d528/64 scope link 
       valid_lft forever preferred_lft forever

$hostname1: ss
Netid State  Recv-Q Send-Q    Local Address:Port Peer Address:PortProcess
???   UNCONN 0      0          0.0.0.0%eth0:vrrp      0.0.0.0:*          
???   UNCONN 0      0      10.131.0.83%eth0:vrrp      0.0.0.0:*          

$hostname1: cat /tmp/keepalived.conf
global_defs {
  vrrp_startup_delay 10.0
  vrrp_garp_interval 0.001
  vrrp_version 3
  vrrp_garp_master_refresh 30
  vrrp_garp_lower_prio_repeat 5
  vrrp_higher_prio_send_advert true
  script_user root root
  notify_fifo /tmp/notify_fifo
  notify_fifo_script /tmp/notify.sh
}
vrrp_script check_masterability {
  script "/cmds -run check-master"
  interval 1
  timeout 1
  rise 1
  fall 1
}
vrrp_script check_masterability_on_active {
  script "/cmds -run check-master-on-active"
  interval 1
  timeout 1
  rise 2
  fall 3
}

track_file drop_master {
  file "/config/drop_master"
  weight 0
  init_file 0
}

vrrp_instance VI_1 {
  advert_int 0.4
  interface eth0
  state MASTER
  unicast_src_ip 10.131.0.83
  unicast_peer {
     10.129.2.100
  }
  virtual_router_id 1
  priority 255
  virtual_ipaddress {
    192.168.120.2/24 dev net2
    10.10.10.2/24 dev net1
  }
  virtual_routes {
  }
  track_script {
    check_masterability
    check_masterability_on_active
  }
  track_interface {
    net1
    net2
  }
  track_file {
    drop_master
  }
  notify_master "/cmds -run on-master"
}

$hostname1: tcpdump proto 112
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
07:31:12.814505 IP svc-tcp-service-01-1 > 10.129.2.100: VRRPv3, Advertisement, vrid 1, prio 255, intvl 40cs, length 16
07:31:12.910542 IP 10.129.2.100 > svc-tcp-service-01-1: VRRPv3, Advertisement, (ttl 254), vrid 1, prio 255, intvl 40cs, length 16
07:31:13.214688 IP svc-tcp-service-01-1 > 10.129.2.100: VRRPv3, Advertisement, vrid 1, prio 255, intvl 40cs, length 16
07:31:13.310866 IP 10.129.2.100 > svc-tcp-service-01-1: VRRPv3, Advertisement, (ttl 254), vrid 1, prio 255, intvl 40cs, length 16
07:31:13.615045 IP svc-tcp-service-01-1 > 10.129.2.100: VRRPv3, Advertisement, vrid 1, prio 255, intvl 40cs, length 16
07:31:13.711201 IP 10.129.2.100 > svc-tcp-service-01-1: VRRPv3, Advertisement, (ttl 254), vrid 1, prio 255, intvl 40cs, length 16
07:31:14.015266 IP svc-tcp-service-01-1 > 10.129.2.100: VRRPv3, Advertisement, vrid 1, prio 255, intvl 40cs, length 16

I even checked with strace and it seems that is processes:

strace: Process 89 attached
sendmsg(14, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("10.131.0.83")}, msg_namelen=16, msg_iov=[{iov_base="E\300\0$\17;\0\0\377p\0\0\n\201\2d\n\203\0S1\1\377\2\0(j\341\300\250x\2"..., iov_len=36}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 36
recvmsg(13, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("10.131.0.83")}, msg_namelen=28 => 16, msg_iov=[{iov_base="E\300\0$\17<\0\0\376p\224\263\n\203\0S\n\201\2d1\1\377\2\0(j\341\300\250x\2"..., iov_len=1400}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_CTRUNC|MSG_TRUNC) = 36
recvmsg(13, {msg_namelen=16}, MSG_CTRUNC|MSG_TRUNC) = -1 EAGAIN (Resource temporarily unavailable)
sendmsg(14, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("10.131.0.83")}, msg_namelen=16, msg_iov=[{iov_base="E\300\0$\17<\0\0\377p\0\0\n\201\2d\n\203\0S1\1\377\2\0(j\341\300\250x\2"..., iov_len=36}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 36
recvmsg(13, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("10.131.0.83")}, msg_namelen=28 => 16, msg_iov=[{iov_base="E\300\0$\17=\0\0\376p\224\262\n\203\0S\n\201\2d1\1\377\2\0(j\341\300\250x\2"..., iov_len=1400}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_CTRUNC|MSG_TRUNC) = 36

strace: Process 87 attached
recvmsg(13, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("10.129.2.100")}, msg_namelen=28 => 16, msg_iov=[{iov_base="E\300\0$\16r\0\0\376p\225}\n\201\2d\n\203\0S1\1\377\2\0(j\341\300\250x\2"..., iov_len=1400}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_CTRUNC|MSG_TRUNC) = 36
recvmsg(13, {msg_namelen=16}, MSG_CTRUNC|MSG_TRUNC) = -1 EAGAIN (Resource temporarily unavailable)
sendmsg(14, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("10.129.2.100")}, msg_namelen=16, msg_iov=[{iov_base="E\300\0$\16s\0\0\377p\0\0\n\203\0S\n\201\2d1\1\377\2\0(j\341\300\250x\2"..., iov_len=36}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 36
recvmsg(13, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("10.129.2.100")}, msg_namelen=28 => 16, msg_iov=[{iov_base="E\300\0$\16s\0\0\376p\225|\n\201\2d\n\203\0S1\1\377\2\0(j\341\300\250x\2"..., iov_len=1400}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_CTRUNC|MSG_TRUNC) = 36
recvmsg(13, {msg_namelen=16}, MSG_CTRUNC|MSG_TRUNC) = -1 EAGAIN (Resource temporarily unavailable)
sendmsg(14, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("10.129.2.100")}, msg_namelen=16, msg_iov=[{iov_base="E\300\0$\16t\0\0\377p\0\0\n\203\0S\n\201\2d1\1\377\2\0(j\341\300\250x\2"..., iov_len=36}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 36

The keepalived logs are available at: https://gist.github.com/stanluk/cc828b1f99a2f4734f609501eaa8c4ab

amoy-xuhao commented 6 months ago

Is there any progress on this issue? I am also encountering the same problem in my Kubernetes cluster

pqarmitage commented 5 months ago

I think this is probably caused by reloading keepalived before the _vrrp_startupdelay has expired. Looking in vrrp_dispatcher_read() in vrrp_scheduler.c, there are the following lines of code:

                if (vrrp_delayed_start_time.tv_sec)
                        continue;

which means that any packet received before the start delay timer expires is discarded. However when the restart occurs before the delay timer expires, the timer thread to cancel the timer is removed, and so the timer never expires.

I will continue investigating, and submit a patch later today.

pqarmitage commented 5 months ago

I was able to reproduce this problem, and it was indeed caused by reloading keepalived before the startup_delay timer had expired.

Commit 58483b2 resolves this issue. Many apologies for the long delay in resolving this, but I hadn't previously realised the significance of the startup delay.

stanluk commented 5 months ago

@pqarmitage thanks for investigating this and providing a patch!