Closed — rsparulek closed this issue 4 years ago
@pqarmitage Any pointers on this issue would be great. We are hitting it consistently on our keepalived setup with 3 masters, causing a brief outage after the second failover happens (despite setting the nopreempt option in our VRRP config).
I have tested this with the latest version of keepalived (v2.0.20 + all subsequent commits) and I do not observe the problem you have reported.
There is one problem I do observe: the two "backup" systems have the same VRRP priority. This means that when the high-priority instance goes down, both "backup" systems take over as master, and then one of them (the one with the lower IP address) has to fall back to backup. So the first thing I would suggest is reducing the priority of one of the systems to below 100, so that all three have different VRRP priorities.
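For illustration (the exact values are hypothetical; what matters is only that they differ), the three systems' otherwise-identical vrrp_instance blocks might carry:

```
# system 1 - initial master
vrrp_instance VI_1 {
    priority 101
    # ... rest of the instance configuration
}

# system 2
vrrp_instance VI_1 {
    priority 100
    # ...
}

# system 3
vrrp_instance VI_1 {
    priority 95
    # ...
}
```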
Since the version of keepalived you are using is rather old, I cannot remember if I have fixed any issue with preempt in the last 18 months. It would be worth you upgrading to v2.0.20 to see if that resolves your problem. However, I have also tested this with v2.0.7 and cannot reproduce your problem.
On each of your 3 systems, while keepalived is running, could you please send SIGUSR1 to the parent keepalived process and post the resulting /tmp/keepalived.data from each system here.
@pqarmitage Thanks for your inputs. Could you share the VRRP configuration you are using? Also, is there an RPM available for keepalived 2.0.20? I could only find RPMs for 2.0.7 online; could you point me to a 2.0.20 RPM?
My current rpm:
[root@mynode ~]# rpm -qa | grep keepalived
keepalived-2.0.7-1.el7.x86_64
The VRRP configuration I was using was the one you provided above, with two of the 3 systems having the VRRP priority changed to 100.
I don't have an RPM available; you will need to build keepalived from source. You could download the Fedora source RPM for v2.0.20 and use that to build your own RPM.
@pqarmitage When you said you tried to reproduce this with 2.0.7 on a 3-node setup, did your main master node with the highest priority also have the highest IP address of the 3 nodes? In my setup, the main master with the highest priority also has the highest IP address. I am assuming keepalived breaks the tie using the highest IP address, which causes the VIP to switch back to the highest-priority master when it comes back up.
My understanding is that you have one vrrp instance with priority 101 and the other two instances with priority 100. If that is the situation, then the IP address of the instance with priority 101 is not important.
The only time the relative IP addresses matter is when two (or more) instances have become master and they all have the same priority. The IP address is then used as a tie-break to determine which of the instances should back off and revert to backup; any instance that sees an advert with a higher source IP address reverts to backup.
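The election rule described above can be sketched as follows. This is an illustrative model, not keepalived's actual code: an instance in MASTER state compares an incoming advert's priority (and, on a priority tie, the source IP address) against its own.

```python
import ipaddress

def master_yields(own_prio, own_ip, advert_prio, advert_ip):
    """Return True if a MASTER instance should revert to BACKUP
    on receiving this advert: higher priority wins outright; on
    equal priority, the higher source IP address wins."""
    if advert_prio != own_prio:
        return advert_prio > own_prio
    return ipaddress.ip_address(advert_ip) > ipaddress.ip_address(own_ip)

# Equal priorities: the instance with the lower IP backs off.
print(master_yields(100, "10.9.82.139", 100, "10.9.82.140"))  # True
# Different priorities: the IP addresses are irrelevant.
print(master_yields(101, "10.9.82.139", 100, "10.9.82.163"))  # False
```

In other words, with three distinct priorities configured the IP tie-break never comes into play.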
@pqarmitage Makes sense. Did you get a chance to look at the VRRP logs I posted above? I always see a "forcing new election" log line in the logs of the highest-priority master, which snatches the VIP once it comes back up, even with the nopreempt option. Is that log line expected behaviour with nopreempt enabled?
Mar 3 15:07:53 cscale-82-163 Keepalived_vrrp[2370]: (VI_1) ip address associated with VRID 97 not present in MASTER advert : 10.9.93.101
Mar 3 15:07:54 cscale-82-163 Keepalived_vrrp[2370]: (VI_1) Receive advertisement timeout
Mar 3 15:07:54 cscale-82-163 Keepalived_vrrp[2370]: (VI_1) Entering MASTER STATE
Mar 3 15:07:54 cscale-82-163 Keepalived_vrrp[2370]: (VI_1) setting VIPs.
Mar 3 15:07:54 cscale-82-163 Keepalived_vrrp[2370]: Sending gratuitous ARP on br0 for 10.9.93.101
Mar 3 15:07:54 cscale-82-163 Keepalived_vrrp[2370]: (VI_1) Sending/queueing gratuitous ARPs on br0 for 10.9.93.101
Mar 3 15:07:54 cscale-82-163 Keepalived_vrrp[2370]: Sending gratuitous ARP on br0 for 10.9.93.101
Mar 3 15:07:54 cscale-82-163 Keepalived_vrrp[2370]: Sending gratuitous ARP on br0 for 10.9.93.101
Mar 3 15:07:54 cscale-82-163 Keepalived_vrrp[2370]: Sending gratuitous ARP on br0 for 10.9.93.101
Mar 3 15:07:54 cscale-82-163 Keepalived_vrrp[2370]: Sending gratuitous ARP on br0 for 10.9.93.101
Mar 3 15:07:54 cscale-82-163 Keepalived_vrrp[2370]: (VI_1) Received advert from 10.9.82.139 with lower priority 101, ours 101, forcing new election
Mar 3 15:07:54 cscale-82-163 Keepalived_vrrp[2370]: Sending gratuitous ARP on br0 for 10.9.93.101
Mar 3 15:07:54 cscale-82-163 Keepalived_vrrp[2370]: (VI_1) Sending/queueing gratuitous ARPs on br0 for 10.9.93.101
Mar 3 15:07:54 cscale-82-163 Keepalived_vrrp[2370]: Sending gratuitous ARP on br0 for 10.9.93.101
Mar 3 15:07:54 cscale-82-163 Keepalived_vrrp[2370]: Sending gratuitous ARP on br0 for 10.9.93.101
Mar 3 15:07:54 cscale-82-163 Keepalived_vrrp[2370]: Sending gratuitous ARP on br0 for 10.9.93.101
Mar 3 15:07:54 cscale-82-163 Keepalived_vrrp[2370]: Sending gratuitous ARP on br0 for 10.9.93.101
Mar 3 15:07:54 cscale-82-163 Keepalived_vrrp[2370]: (VI_1) Received advert from 10.9.82.140 with lower priority 101, ours 101, forcing new election
Mar 3 15:07:54 cscale-82-163 Keepalived_vrrp[2370]: Sending gratuitous ARP on br0 for 10.9.93.101
Mar 3 15:07:54 cscale-82-163 Keepalived_vrrp[2370]: (VI_1) Sending/queueing gratuitous ARPs on br0 for 10.9.93.101
Ignore the priority numbers: these logs are older, from when I was using equal priorities for all masters. But I see this log line even after changing the priorities as you suggested. Why are we forcing a new election every time this node comes up? New logs posted below with different priorities:
Mar 4 14:47:41 cscale-82-163 Keepalived_vrrp[2404]: (VI_1) Received advert from 10.9.82.140 with lower priority 100, ours 101, forcing new election
Mar 4 14:47:41 cscale-82-163 Keepalived_vrrp[2404]: Sending gratuitous ARP on br0 for 10.9.93.101
Mar 4 14:47:41 cscale-82-163 Keepalived_vrrp[2404]: (VI_1) Sending/queueing gratuitous ARPs on br0 for 10.9.93.101
Mar 4 14:47:41 cscale-82-163 Keepalived_vrrp[2404]: Sending gratuitous ARP on br0 for 10.9.93.101
Mar 4 14:47:41 cscale-82-163 Keepalived_vrrp[2404]: Sending gratuitous ARP on br0 for 10.9.93.101
Mar 4 14:47:41 cscale-82-163 Keepalived_vrrp[2404]: Sending gratuitous ARP on br0 for 10.9.93.101
Mar 4 14:47:41 cscale-82-163 Keepalived_vrrp[2404]: Sending gratuitous ARP on br0 for 10.9.93.101
Mar 4 14:47:42 cscale-82-163 Keepalived_vrrp[2404]: (VI_1) ip address associated with VRID 97 not present in MASTER advert : 10.9.93.101
Mar 4 14:47:42 cscale-82-163 systemd: Started Session c2612 of user root.
Mar 4 14:47:43 cscale-82-163 Keepalived_vrrp[2404]: Sending gratuitous ARP on br0 for 10.9.93.101
Mar 4 14:47:43 cscale-82-163 Keepalived_vrrp[2404]: (VI_1) Sending/queueing gratuitous ARPs on br0 for 10.9.93.101
Mar 4 14:47:43 cscale-82-163 Keepalived_vrrp[2404]: Sending gratuitous ARP on br0 for 10.9.93.101
Mar 4 14:47:43 cscale-82-163 Keepalived_vrrp[2404]: Sending gratuitous ARP on br0 for 10.9.93.101
Mar 4 14:47:43 cscale-82-163 Keepalived_vrrp[2404]: Sending gratuitous ARP on br0 for 10.9.93.101
Mar 4 14:47:43 cscale-82-163 Keepalived_vrrp[2404]: Sending gratuitous ARP on br0 for 10.9.93.101
Mar 4 14:47:43 cscale-82-163 Keepalived_vrrp[2404]: (VI_1) ip address associated with VRID 97 not present in MASTER advert : 10.9.93.101
Mar 4 14:47:44 cscale-82-163 Keepalived_vrrp[2404]: (VI_1) ip address associated with VRID 97 not present in MASTER advert : 10.9.93.101
Mar 4 14:47:44 cscale-82-163 systemd: Started Session c2613 of user root.
Mar 4 14:47:45 cscale-82-163 Keepalived_vrrp[2404]: (VI_1) ip address associated with VRID 97 not present in MASTER advert : 10.9.93.101
Mar 4 14:47:46 cscale-82-163 Keepalived_vrrp[2404]: (VI_1) ip address associated with VRID 97 not present in MASTER advert : 10.9.93.101
Mar 4 14:47:46 cscale-82-163 systemd: Started Session c2614 of user root.
Mar 4 14:47:47 cscale-82-163 Keepalived_vrrp[2404]: (VI_1) Received advert from 10.9.82.140 with lower priority 100, ours 101, forcing new election
Mar 4 14:47:47 cscale-82-163 Keepalived_vrrp[2404]: Sending gratuitous ARP on br0 for 10.9.93.101
Mar 4 14:47:47 cscale-82-163 Keepalived_vrrp[2404]: (VI_1) Sending/queueing gratuitous ARPs on br0 for 10.9.93.101
Mar 4 14:47:47 cscale-82-163 Keepalived_vrrp[2404]: Sending gratuitous ARP on br0 for 10.9.93.101
The interesting log line is:
Mar 3 15:07:53 cscale-82-163 Keepalived_vrrp[2370]: (VI_1) ip address associated with VRID 97 not present in MASTER advert : 10.9.93.101
This means that your configurations do not match on your three systems; in particular, you do not have 10.9.93.101 listed in the virtual_ipaddress block of the lower-priority system that becomes master. As a consequence, when the system with priority 101 receives the adverts from a lower-priority instance, it discards the advert as invalid. It then times out and becomes master.
If you change the virtual_ipaddress blocks so that they are identical in all the configurations, then it should work as you want.
@pqarmitage I have exact same VIPs on all 3 nodes:
[root@cscale-82-163 ~]# cat /etc/keepalived/keepalived.conf
global_defs {
    script_user root root
    enable_script_security
    vrrp_garp_master_delay 1
    vrrp_garp_master_refresh 60
}
vrrp_script chk_haproxy {
    script "sudo /usr/bin/killall -0 haproxy"
    interval 2
    fall 2
    rise 2
}
vrrp_script chk_etcd {
    script "sudo /usr/local/bin/retcd get-leader"
    interval 2
    init_fail
}
vrrp_instance VI_1 {
    interface br0
    state BACKUP
    advert_int 1
    nopreempt
    virtual_router_id 97
    priority 101
    virtual_ipaddress {
        10.9.93.101/16 dev br0
    }
    track_script {
        chk_haproxy
        #chk_etcd
    }
}
[root@cscale-82-139 ~]# cat /etc/keepalived/keepalived.conf
global_defs {
    script_user root root
    enable_script_security
    vrrp_garp_master_delay 1
    vrrp_garp_master_refresh 60
}
vrrp_script chk_haproxy {
    script "sudo /usr/bin/killall -0 haproxy"
    interval 2
    fall 2
    rise 2
}
vrrp_script chk_etcd {
    script "sudo /usr/local/bin/retcd get-leader"
    interval 2
    init_fail
}
vrrp_instance VI_1 {
    interface br0
    state BACKUP
    advert_int 1
    nopreempt
    virtual_router_id 97
    priority 95
    virtual_ipaddress {
        10.9.93.101/16 dev br0
    }
    track_script {
        chk_haproxy
        #chk_etcd
    }
}
[root@cscale-82-140 ~]# cat /etc/keepalived/keepalived.conf
global_defs {
    script_user root root
    enable_script_security
    vrrp_garp_master_delay 1
    vrrp_garp_master_refresh 60
}
vrrp_script chk_haproxy {
    script "sudo /usr/bin/killall -0 haproxy"
    interval 2
    fall 2
    rise 2
}
vrrp_script chk_etcd {
    script "sudo /usr/local/bin/retcd get-leader"
    interval 2
    init_fail
}
vrrp_instance VI_1 {
    interface br0
    state BACKUP
    advert_int 1
    nopreempt
    virtual_router_id 97
    priority 95
    virtual_ipaddress {
        10.9.93.101/16 dev br0
    }
    track_script {
        chk_haproxy
        #chk_etcd
    }
}
@rsparulek As I wrote above, the problem is indicated by the log entries which say:
Mar 3 15:07:53 cscale-82-163 Keepalived_vrrp[2370]: (VI_1) ip address associated with VRID 97 not present in MASTER advert : 10.9.93.101
This means that the adverts received on cscale-82-163 do not contain the address 10.9.93.101. As a consequence of this the adverts are discarded by keepalived on cscale-82-163 and since it sees no adverts other than the ones it discards, it times out and becomes master.
In order to progress this any further, I need to see the following:
Send SIGUSR1 to the parent keepalived process on each system. This will create the file /tmp/keepalived.data. Please attach each of the /tmp/keepalived.data files to this issue.
On cscale-82-163, when you are getting the "ip address associated with VRID 97 not present in MASTER advert" messages, run tcpdump -n -nn -v -i br0 proto 112 and post the output of that here too.
Once we have these, we should be able to see what is happening.
capture.txt keepalived.data.139.txt keepalived.data.140.txt keepalived.data.163.txt
@pqarmitage I have uploaded the logs and data you requested from my cluster nodes. Let me know if anything else is needed from my end.
Many Thanks! Your help is greatly appreciated.
@rsparulek You STILL have cscale-82-139 and cscale-82-140 having the same VRRP priority, i.e. 95. This will not work properly since when keepalived on cscale-82-163 stops, both cscale-82-139 and cscale-82-140 will become master at the same time, and then one will subsequently have to revert to backup. There is not much point in continuing with this issue until the VRRP instances on all 3 systems have DIFFERENT priorities.
You seem to have a large amount of VRRP traffic, as follows:
VRID Source IP Priority Virtual IP addresses
vrid 106 10.9.20.13 prio 100 10.9.108.106
vrid 107 10.9.82.70 prio 101 10.9.108.107
vrid 11 10.9.82.64 prio 101 10.9.111.124
vrid 113 10.9.140.110 prio 100 10.9.86.90
vrid 12 10.9.82.61 prio 101 10.9.111.125
vrid 150 10.9.60.239 prio 101 10.9.119.150
vrid 151 10.9.50.103 prio 101 10.9.119.151
vrid 152 10.9.60.208 prio 101 10.9.119.152
vrid 159 10.9.40.215 prio 101 10.9.119.159
vrid 164 10.9.82.150 prio 101 10.9.108.175
vrid 166 10.9.109.74 prio 101 10.9.109.75
vrid 169 10.9.60.226 prio 101 10.9.114.34
vrid 171 10.9.82.49 prio 101 10.9.97.100
vrid 200 10.9.60.72 prio 101 10.9.99.210
vrid 21 10.9.82.67 prio 100 10.9.98.52
vrid 250 10.9.100.206 prio 101 10.9.100.250
vrid 253 10.9.140.101 prio 101 10.9.82.253
vrid 40 10.9.109.71 prio 101 10.9.109.72
vrid 41 10.9.109.81 prio 101 10.9.109.84
vrid 43 10.9.100.103 prio 101 10.9.117.2
vrid 54 10.9.60.244 prio 100 10.9.114.34
vrid 59 10.9.60.71 prio 101 10.9.99.200
vrid 64 10.9.121.21 prio 101 10.9.121.24
vrid 77 10.9.40.105 prio 101 10.9.232.182
vrid 88 10.9.82.143 prio 101 10.9.82.250
vrid 93 10.9.120.108 prio 101 10.9.94.3
vrid 94 10.9.60.31 prio 101 10.9.118.254
vrid 97 10.9.82.139 prio 95 10.9.93.101
vrid 97 10.9.82.140 prio 95 10.9.93.101
vrid 97 10.9.82.163 prio 101 10.9.93.101
vrid 97 10.9.82.81 prio 101 10.9.115.1
vrid 97 10.9.82.82 prio 100 10.9.115.1
vrid 97 10.9.82.83 prio 100 10.9.115.1
and this identifies the main cause of your problem, which is that you have another set of systems (with IP addresses 10.9.82.81, 10.9.82.82, and 10.9.82.83) also using VRID 97, but with virtual IP address 10.9.115.1. For some reason all three of those systems are in MASTER state; whether that is caused by 10.9.82.140/139/163 also transmitting adverts with VRID 97 but with a different virtual IP address, I cannot say.
Once you sort out the duplicate VRID problem, and also change it so that for each VRID each system has a different priority, then I expect your problem will be resolved.
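One way to spot such duplicate-VRID situations is to group the adverts seen on the wire by VRID and flag any VRID advertised with more than one virtual IP set. A minimal Python sketch (the tuples below are transcribed from the capture summary above; a real script would parse tcpdump output instead):

```python
from collections import defaultdict

# (vrid, source_ip, virtual_ip) tuples as observed on the wire;
# sample data taken from the capture summary above.
adverts = [
    (97, "10.9.82.139", "10.9.93.101"),
    (97, "10.9.82.140", "10.9.93.101"),
    (97, "10.9.82.163", "10.9.93.101"),
    (97, "10.9.82.81", "10.9.115.1"),
    (43, "10.9.100.103", "10.9.117.2"),
]

vips_by_vrid = defaultdict(set)
for vrid, _src, vip in adverts:
    vips_by_vrid[vrid].add(vip)

# A VRID carrying more than one virtual IP means two different
# clusters are colliding on the same virtual router ID.
collisions = {vrid: vips for vrid, vips in vips_by_vrid.items() if len(vips) > 1}
print(collisions)  # only VRID 97 is used by two different clusters
```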
@pqarmitage Many thanks for your inputs and debugging! I don't see this issue when I have unique priorities across 3 masters. I am still using keepalived 2.0.7 version. I will update this bug if I see any issue again.
@pqarmitage This issue is resolved. Closing for now. Thanks for all your help! I will re-open if I hit any other issues.
@pqarmitage I am re-hitting the VIP switchover issue even when I have 3 different priorities on my 3 masters as seen below:
Main master:
# cat /etc/keepalived/keepalived.conf
global_defs {
    script_user root root
    enable_script_security
    vrrp_garp_master_delay 1
    vrrp_garp_master_refresh 60
}
vrrp_script chk_haproxy {
    script "sudo /usr/bin/killall -0 haproxy"
    interval 2
    fall 2
    rise 2
}
vrrp_script chk_etcd {
    script "sudo /usr/local/bin/retcd get-leader"
    interval 2
    init_fail
}
vrrp_instance VI_1 {
    interface br0
    state BACKUP
    advert_int 1
    nopreempt
    virtual_router_id 43
    priority 101
    virtual_ipaddress {
        10.9.117.2/16 dev br0
    }
    track_script {
        chk_haproxy
        #chk_etcd
    }
}
Master 2:
global_defs {
    script_user root root
    enable_script_security
    vrrp_garp_master_delay 1
    vrrp_garp_master_refresh 60
}
vrrp_script chk_haproxy {
    script "sudo /usr/bin/killall -0 haproxy"
    interval 2
    fall 2
    rise 2
}
vrrp_script chk_etcd {
    script "sudo /usr/local/bin/retcd get-leader"
    interval 2
    init_fail
}
vrrp_instance VI_1 {
    interface br0
    state BACKUP
    advert_int 1
    nopreempt
    virtual_router_id 43
    priority 95
    virtual_ipaddress {
        10.9.117.2/16 dev br0
    }
    track_script {
        chk_haproxy
        #chk_etcd
    }
}
Master 3:
global_defs {
    script_user root root
    enable_script_security
    vrrp_garp_master_delay 1
    vrrp_garp_master_refresh 60
}
vrrp_script chk_haproxy {
    script "sudo /usr/bin/killall -0 haproxy"
    interval 2
    fall 2
    rise 2
}
vrrp_script chk_etcd {
    script "sudo /usr/local/bin/retcd get-leader"
    interval 2
    init_fail
}
vrrp_instance VI_1 {
    interface br0
    state BACKUP
    advert_int 1
    nopreempt
    virtual_router_id 43
    priority 90
    virtual_ipaddress {
        10.9.117.2/16 dev br0
    }
    track_script {
        chk_haproxy
        #chk_etcd
    }
}
From the keepalived logs on the secondary master, I see that when master 1 comes up, the secondary master gives up its VIP, as seen in the logs below:
Mar 27 15:25:04 eqx04-flash06 Keepalived_vrrp[25180]: (VI_1) Sending/queueing gratuitous ARPs on br0 for 10.9.117.2
Mar 27 15:25:05 eqx04-flash06 systemd: Started Session c39586 of user root.
Mar 27 15:25:07 eqx04-flash06 systemd: Started Session c39587 of user root.
Mar 27 15:25:08 eqx04-flash06 Keepalived_vrrp[25180]: (VI_1) Master received advert from 10.9.100.103 with higher priority 101, ours 95
Mar 27 15:25:08 eqx04-flash06 Keepalived_vrrp[25180]: (VI_1) Entering BACKUP STATE
Mar 27 15:25:08 eqx04-flash06 Keepalived_vrrp[25180]: (VI_1) removing VIPs.
On the main master, I see the forceful VIP re-acquisition happen:
Mar 27 15:25:05 eqx01-flash03 systemd: Starting DNS caching server....
Mar 27 15:25:05 eqx01-flash03 Keepalived_vrrp[2158]: (VI_1) Entering BACKUP STATE
Mar 27 15:25:09 eqx01-flash03 Keepalived_vrrp[2158]: (VI_1) Receive advertisement timeout
Mar 27 15:25:09 eqx01-flash03 Keepalived_vrrp[2158]: (VI_1) Entering MASTER STATE
Mar 27 15:25:09 eqx01-flash03 Keepalived_vrrp[2158]: (VI_1) setting VIPs.
Mar 27 15:25:09 eqx01-flash03 systemd: Starting Wait for Plymouth Boot Screen to Quit...
Mar 27 15:25:09 eqx01-flash03 Keepalived_vrrp[2158]: Sending gratuitous ARP on br0 for 10.9.117.2
Mar 27 15:25:09 eqx01-flash03 Keepalived_vrrp[2158]: (VI_1) Sending/queueing gratuitous ARPs on br0 for 10.9.117.2
Mar 27 15:25:09 eqx01-flash03 Keepalived_vrrp[2158]: Sending gratuitous ARP on br0 for 10.9.117.2
Mar 27 15:25:09 eqx01-flash03 Keepalived_vrrp[2158]: Sending gratuitous ARP on br0 for 10.9.117.2
Mar 27 15:25:09 eqx01-flash03 Keepalived_vrrp[2158]: Sending gratuitous ARP on br0 for 10.9.117.2
Mar 27 15:25:09 eqx01-flash03 Keepalived_vrrp[2158]: Sending gratuitous ARP on br0 for 10.9.117.2
I see from the above logs that the main master did enter BACKUP state to begin with, but later re-entered MASTER state; could you help me figure out this issue?
@pqarmitage Any pointers on this issue?
We have seen this issue before: when networking starts up, packets begin to pass, then no packets are passed for a while, and then they start being passed again. That appears to be what is happening here.
You need to delay keepalived's startup until the network is fully up and settled.
There is a keepalived option, vrrp_startup_delay, for precisely this problem. It delays the startup of the VRRP process to allow time for the networking to settle down.
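Assuming a recent keepalived release that supports it, vrrp_startup_delay goes in the global_defs block; the 10-second value here is purely illustrative and should be tuned to how long your network takes to settle:

```
global_defs {
    # delay (in seconds) before the VRRP process starts handling
    # instances; the value is illustrative - tune it to your network
    vrrp_startup_delay 10
}
```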
I am closing this problem now since there has been no response for over a week.
Describe the bug: The keepalived nopreempt option does not function as expected: the VIP fails over back to the main master after it comes up. I have a 3-master setup with keepalived running; all 3 masters have state BACKUP, the main master has priority 101, and the other 2 masters have priority 100. I also use the nopreempt option. We see that when the VIP is present on the main master and the main master goes down, the VIP moves to master 2; but when main master 1 comes up again, the VIP moves back to it, even though we have the nopreempt option configured in the VRRP config as pasted below.
To Reproduce: Any steps necessary to reproduce the behaviour:
Expected behavior: We should not switch over to the main master 1 when it comes back up.
Keepalived version
Copyright(C) 2001-2018 Alexandre Cassen, acassen@gmail.com
Distro (please complete the following information):
Details of any containerisation or hosted service (e.g. AWS): Keepalived runs as a systemd service on the host, which has Kubernetes 1.16.3 installed.
Configuration file:
Notify and track scripts: If any notify or track scripts are in use, please provide copies of them.
System Log entries
/var/log/messages when the main master comes up and preempts the existing master and the VIP switches to the main master
Did keepalived coredump? If so, can you please provide a stacktrace from the coredump, using gdb.
Additional context: Add any other context about the problem here.