stanluk closed this issue 5 months ago
Can you please provide the output of ip addr show eth0 on both systems?
If this doesn't help identify the cause of the problem I'll provide details of how to enable the various debug options within keepalived.
Sorry for the late answer; recently we had some problems reproducing the issue. Below is the same issue with a slightly different config than the one previously attached - e.g. both instances initially MASTER with the same priority, which I know is an anti-pattern, but as far as I tested locally, keepalived was able to recover from such a wrong config. Moreover, the number of reloads was forced to be larger than in the previous run.
The full logs from the system run:
$hostname0: ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
3: eth0@if22956: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default
link/ether 0a:58:0a:81:02:64 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.129.2.100/23 brd 10.129.3.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fd01:0:0:3::598b/64 scope global
valid_lft forever preferred_lft forever
inet6 fe80::858:aff:fe81:264/64 scope link
valid_lft forever preferred_lft forever
4: net1@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 2a:90:6d:9a:a4:1b brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.168.1.36/24 brd 172.168.1.255 scope global net1
valid_lft forever preferred_lft forever
inet 10.10.10.2/24 scope global net1
valid_lft forever preferred_lft forever
inet6 fe80::2890:6dff:fe9a:a41b/64 scope link
valid_lft forever preferred_lft forever
5: net2@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 6e:21:6f:cf:da:91 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.168.0.4/24 scope global net2
valid_lft forever preferred_lft forever
inet 192.168.120.2/24 scope global net2
valid_lft forever preferred_lft forever
inet6 fe80::6c21:6fff:fecf:da91/64 scope link
valid_lft forever preferred_lft forever
$hostname0: ss
Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
??? UNCONN 0 0 0.0.0.0%eth0:vrrp 0.0.0.0:*
??? UNCONN 0 0 10.129.2.100%eth0:vrrp 0.0.0.0:*
$hostname0: cat /tmp/keepalived.conf
global_defs {
    vrrp_startup_delay 10.0
    vrrp_garp_interval 0.001
    vrrp_version 3
    vrrp_garp_master_refresh 30
    vrrp_garp_lower_prio_repeat 5
    vrrp_higher_prio_send_advert true
    script_user root root
    notify_fifo /tmp/notify_fifo
    notify_fifo_script /tmp/notify.sh
}
vrrp_script check_masterability {
    script "/cmds -run check-master"
    interval 1
    timeout 1
    rise 1
    fall 1
}
vrrp_script check_masterability_on_active {
    script "/cmds -run check-master-on-active"
    interval 1
    timeout 1
    rise 2
    fall 3
}
track_file drop_master {
    file "/config/drop_master"
    weight 0
    init_file 0
}
vrrp_instance VI_1 {
    advert_int 0.4
    interface eth0
    state MASTER
    unicast_src_ip 10.129.2.100
    unicast_peer {
        10.131.0.83
    }
    virtual_router_id 1
    priority 255
    virtual_ipaddress {
        192.168.120.2/24 dev net2
        10.10.10.2/24 dev net1
    }
    virtual_routes {
    }
    track_script {
        check_masterability
        check_masterability_on_active
    }
    track_interface {
        net1
        net2
    }
    track_file {
        drop_master
    }
    notify_master "/cmds -run on-master"
}
$hostname0: tcpdump proto 112
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
07:31:50.435733 IP 10.131.0.83 > svc-tcp-service-01-0: VRRPv3, Advertisement, (ttl 254), vrid 1, prio 255, intvl 40cs, length 16
07:31:50.531809 IP svc-tcp-service-01-0 > 10.131.0.83: VRRPv3, Advertisement, vrid 1, prio 255, intvl 40cs, length 16
07:31:50.835950 IP 10.131.0.83 > svc-tcp-service-01-0: VRRPv3, Advertisement, (ttl 254), vrid 1, prio 255, intvl 40cs, length 16
07:31:50.931995 IP svc-tcp-service-01-0 > 10.131.0.83: VRRPv3, Advertisement, vrid 1, prio 255, intvl 40cs, length 16
07:31:51.236236 IP 10.131.0.83 > svc-tcp-service-01-0: VRRPv3, Advertisement, (ttl 254), vrid 1, prio 255, intvl 40cs, length
$hostname1: ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
3: eth0@if18234: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default
link/ether 0a:58:0a:83:00:53 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.131.0.83/23 brd 10.131.1.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fd01:0:0:5::4714/64 scope global
valid_lft forever preferred_lft forever
inet6 fe80::858:aff:fe83:53/64 scope link
valid_lft forever preferred_lft forever
4: net1@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether d6:cc:d7:68:3a:f3 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.168.1.48/24 brd 172.168.1.255 scope global net1
valid_lft forever preferred_lft forever
inet 10.10.10.2/24 scope global net1
valid_lft forever preferred_lft forever
inet6 fe80::d4cc:d7ff:fe68:3af3/64 scope link
valid_lft forever preferred_lft forever
5: net2@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 96:6c:04:e2:d5:28 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.168.0.3/24 scope global net2
valid_lft forever preferred_lft forever
inet 192.168.120.2/24 scope global net2
valid_lft forever preferred_lft forever
inet6 fe80::946c:4ff:fee2:d528/64 scope link
valid_lft forever preferred_lft forever
$hostname1: ss
Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
??? UNCONN 0 0 0.0.0.0%eth0:vrrp 0.0.0.0:*
??? UNCONN 0 0 10.131.0.83%eth0:vrrp 0.0.0.0:*
$hostname1: cat /tmp/keepalived.conf
global_defs {
    vrrp_startup_delay 10.0
    vrrp_garp_interval 0.001
    vrrp_version 3
    vrrp_garp_master_refresh 30
    vrrp_garp_lower_prio_repeat 5
    vrrp_higher_prio_send_advert true
    script_user root root
    notify_fifo /tmp/notify_fifo
    notify_fifo_script /tmp/notify.sh
}
vrrp_script check_masterability {
    script "/cmds -run check-master"
    interval 1
    timeout 1
    rise 1
    fall 1
}
vrrp_script check_masterability_on_active {
    script "/cmds -run check-master-on-active"
    interval 1
    timeout 1
    rise 2
    fall 3
}
track_file drop_master {
    file "/config/drop_master"
    weight 0
    init_file 0
}
vrrp_instance VI_1 {
    advert_int 0.4
    interface eth0
    state MASTER
    unicast_src_ip 10.131.0.83
    unicast_peer {
        10.129.2.100
    }
    virtual_router_id 1
    priority 255
    virtual_ipaddress {
        192.168.120.2/24 dev net2
        10.10.10.2/24 dev net1
    }
    virtual_routes {
    }
    track_script {
        check_masterability
        check_masterability_on_active
    }
    track_interface {
        net1
        net2
    }
    track_file {
        drop_master
    }
    notify_master "/cmds -run on-master"
}
$hostname1: tcpdump proto 112
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
07:31:12.814505 IP svc-tcp-service-01-1 > 10.129.2.100: VRRPv3, Advertisement, vrid 1, prio 255, intvl 40cs, length 16
07:31:12.910542 IP 10.129.2.100 > svc-tcp-service-01-1: VRRPv3, Advertisement, (ttl 254), vrid 1, prio 255, intvl 40cs, length 16
07:31:13.214688 IP svc-tcp-service-01-1 > 10.129.2.100: VRRPv3, Advertisement, vrid 1, prio 255, intvl 40cs, length 16
07:31:13.310866 IP 10.129.2.100 > svc-tcp-service-01-1: VRRPv3, Advertisement, (ttl 254), vrid 1, prio 255, intvl 40cs, length 16
07:31:13.615045 IP svc-tcp-service-01-1 > 10.129.2.100: VRRPv3, Advertisement, vrid 1, prio 255, intvl 40cs, length 16
07:31:13.711201 IP 10.129.2.100 > svc-tcp-service-01-1: VRRPv3, Advertisement, (ttl 254), vrid 1, prio 255, intvl 40cs, length 16
07:31:14.015266 IP svc-tcp-service-01-1 > 10.129.2.100: VRRPv3, Advertisement, vrid 1, prio 255, intvl 40cs, length 16
I even checked with strace and it seems that it processes them:
strace: Process 89 attached
sendmsg(14, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("10.131.0.83")}, msg_namelen=16, msg_iov=[{iov_base="E\300\0$\17;\0\0\377p\0\0\n\201\2d\n\203\0S1\1\377\2\0(j\341\300\250x\2"..., iov_len=36}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 36
recvmsg(13, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("10.131.0.83")}, msg_namelen=28 => 16, msg_iov=[{iov_base="E\300\0$\17<\0\0\376p\224\263\n\203\0S\n\201\2d1\1\377\2\0(j\341\300\250x\2"..., iov_len=1400}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_CTRUNC|MSG_TRUNC) = 36
recvmsg(13, {msg_namelen=16}, MSG_CTRUNC|MSG_TRUNC) = -1 EAGAIN (Resource temporarily unavailable)
sendmsg(14, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("10.131.0.83")}, msg_namelen=16, msg_iov=[{iov_base="E\300\0$\17<\0\0\377p\0\0\n\201\2d\n\203\0S1\1\377\2\0(j\341\300\250x\2"..., iov_len=36}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 36
recvmsg(13, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("10.131.0.83")}, msg_namelen=28 => 16, msg_iov=[{iov_base="E\300\0$\17=\0\0\376p\224\262\n\203\0S\n\201\2d1\1\377\2\0(j\341\300\250x\2"..., iov_len=1400}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_CTRUNC|MSG_TRUNC) = 36
strace: Process 87 attached
recvmsg(13, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("10.129.2.100")}, msg_namelen=28 => 16, msg_iov=[{iov_base="E\300\0$\16r\0\0\376p\225}\n\201\2d\n\203\0S1\1\377\2\0(j\341\300\250x\2"..., iov_len=1400}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_CTRUNC|MSG_TRUNC) = 36
recvmsg(13, {msg_namelen=16}, MSG_CTRUNC|MSG_TRUNC) = -1 EAGAIN (Resource temporarily unavailable)
sendmsg(14, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("10.129.2.100")}, msg_namelen=16, msg_iov=[{iov_base="E\300\0$\16s\0\0\377p\0\0\n\203\0S\n\201\2d1\1\377\2\0(j\341\300\250x\2"..., iov_len=36}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 36
recvmsg(13, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("10.129.2.100")}, msg_namelen=28 => 16, msg_iov=[{iov_base="E\300\0$\16s\0\0\376p\225|\n\201\2d\n\203\0S1\1\377\2\0(j\341\300\250x\2"..., iov_len=1400}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_CTRUNC|MSG_TRUNC) = 36
recvmsg(13, {msg_namelen=16}, MSG_CTRUNC|MSG_TRUNC) = -1 EAGAIN (Resource temporarily unavailable)
sendmsg(14, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("10.129.2.100")}, msg_namelen=16, msg_iov=[{iov_base="E\300\0$\16t\0\0\377p\0\0\n\203\0S\n\201\2d1\1\377\2\0(j\341\300\250x\2"..., iov_len=36}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 36
The keepalived logs are available at: https://gist.github.com/stanluk/cc828b1f99a2f4734f609501eaa8c4ab
Is there any progress on this issue? I am also encountering the same problem in my Kubernetes cluster.
I think this is probably caused by reloading keepalived before the vrrp_startup_delay has expired. Looking in vrrp_dispatcher_read() in vrrp_scheduler.c, there are the following lines of code:
if (vrrp_delayed_start_time.tv_sec)
    continue;
which means that any packet received before the start delay timer expires is discarded. However, when the reload occurs before the delay timer has expired, the timer thread that would cancel the timer is removed, and so the timer never expires.
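To illustrate the interaction described above, here is a minimal, self-contained sketch (not keepalived code; the variable and function names are stand-ins for the real internals) of how a reload that removes the timer thread leaves the delayed-start flag set, so every subsequent advert from the peer is discarded and the split brain persists:

/* Standalone illustration of the failure mode, not keepalived source. */
#include <stdbool.h>
#include <stdio.h>

/* hypothetical stand-ins for keepalived's internal state */
static long delayed_start_sec = 10;          /* startup delay still armed */
static bool delay_timer_thread_alive = true; /* thread that clears the flag */

static void delay_timer_fires(void)
{
    /* normal path: when the startup delay expires, the flag is cleared */
    delayed_start_sec = 0;
}

static void reload(void)
{
    /* buggy reload path: the pending timer thread is removed and
     * nothing re-arms or clears the delayed start */
    delay_timer_thread_alive = false;
}

static void dispatcher_read(int advert_no)
{
    if (delayed_start_sec) {
        /* corresponds to the 'continue' in vrrp_dispatcher_read() */
        printf("advert %d discarded (still in startup delay)\n", advert_no);
        return;
    }
    printf("advert %d processed\n", advert_no);
}

int main(void)
{
    dispatcher_read(1);          /* discarded: delay still armed */
    reload();                    /* reload before the delay expires */
    if (delay_timer_thread_alive)
        delay_timer_fires();     /* never happens after the reload */
    dispatcher_read(2);          /* still discarded: peer adverts never seen */
    return 0;
}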
I will continue investigating, and submit a patch later today.
I was able to reproduce this problem, and it was indeed caused by reloading keepalived before the startup_delay timer had expired.
Commit 58483b2 resolves this issue. Many apologies for the long delay in resolving this, but I hadn't previously realised the significance of the startup delay.
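For anyone working around this before upgrading, here is a minimal illustrative sketch of one possible remedy - purely an assumption on my part, not necessarily what commit 58483b2 actually does - namely clearing (or re-arming) the pending delayed-start state when a reload happens:

/* Illustrative only; not the actual keepalived fix. */
#include <stdio.h>

static long delayed_start_sec = 10;  /* hypothetical pending startup delay */

static void reload(void)
{
    /* hypothetical remedy: do not leave the flag set with no timer left to clear it */
    if (delayed_start_sec) {
        delayed_start_sec = 0;   /* or re-register the expiry timer thread */
        printf("reload: cleared pending startup delay\n");
    }
}

int main(void)
{
    reload();
    printf("delay now: %ld\n", delayed_start_sec);
    return 0;
}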
@pqarmitage thanks for investigating this and providing a patch!
Describe the bug Rarely we hit an issue on our cluster setup where two keepalived instances cannot recover from split brain. The first guess was that the network setup does not work correctly; however, tcpdump shows that the VRRP packets are sent/received on both machines. Even so, one keepalived instance does not transition from MASTER to BACKUP.
To Reproduce The issue is mostly reproducible when we combine two factors that may happen in our setup:
The typical reproduction scenario contains:
Any ideas on how to further debug this issue would be appreciated.
Expected behavior Lower priority instance should transition to backup
Keepalived version
Details of any containerisation or hosted service (e.g. AWS) Self-hosted k8s.
Configuration file:
Notify and track scripts
System Log entries
$hostname0:
$hostname1
$hostname0: tcpdump -i eth0 proto 112
$hostname1: tcpdump -i eth0 proto 112
Did keepalived coredump?
Additional context