Why so many elections to become MASTER?

mikaku commented 4 years ago

Describe the issue In fact there are two issues here:

Apparently a server that switched to BACKUP was sending adverts as if it was MASTER to another server. Why?
Why there are so many elections forced (up to 8) to become MASTER when there are only two servers involved with a completely different priority values.

In short the servers are as follows (full configuration below):

mdb1 (the preferred MASTER, with nopreempt) and with a priority of 150. mdb2 (BACKUP) with priority 50. mdb3 (BACKUP) with priority 49.

This is what happened:

At 09:16:47 the mdb3 server switched to MASTER as it didn't receive any advert from mdb1 (the default MASTER server).

Then almost immediately, server mdb2 also switched to MASTER by the same reason and, still in the same second, the server mdb3 received an advert from the new MASTER server mdb2 who forced elections and won due to its higher priority. So, server mdb3 switched back to BACKUP.

The following are the logs in servers mdb3 and mdb2 that explains all this:

Full log from server mdb3 (IPs substituted by <names>):

Dec 16 09:16:47 mdb3 Keepalived_vrrp[1171]: (VI_1) Receive advertisement timeout
Dec 16 09:16:47 mdb3 Keepalived_vrrp[1171]: (VI_1) Entering MASTER STATE
Dec 16 09:16:47 mdb3 Keepalived_vrrp[1171]: (VI_1) setting VIPs.
Dec 16 09:16:47 mdb3 Keepalived_vrrp[1171]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:47 mdb3 Keepalived_vrrp[1171]: (VI_1) Sending/queueing gratuitous ARPs on eth0 for <VIP>
Dec 16 09:16:47 mdb3 Keepalived_vrrp[1171]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:47 mdb3 Keepalived_vrrp[1171]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:47 mdb3 Keepalived_vrrp[1171]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:47 mdb3 Keepalived_vrrp[1171]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:47 mdb3 Keepalived_vrrp[1171]: (VI_1) Master received advert from <mdb2> with higher priority 50, ours 49
Dec 16 09:16:47 mdb3 Keepalived_vrrp[1171]: (VI_1) Entering BACKUP STATE
Dec 16 09:16:47 mdb3 Keepalived_vrrp[1171]: (VI_1) removing VIPs.

Full log from server mdb2 (IPs substituted by <names>):

Dec 16 09:16:47 mdb2 Keepalived_vrrp[1168]: (VI_1) Receive advertisement timeout
Dec 16 09:16:47 mdb2 Keepalived_vrrp[1168]: (VI_1) Entering MASTER STATE
Dec 16 09:16:47 mdb2 Keepalived_vrrp[1168]: (VI_1) setting VIPs.
Dec 16 09:16:47 mdb2 Keepalived_vrrp[1168]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:47 mdb2 Keepalived_vrrp[1168]: (VI_1) Sending/queueing gratuitous ARPs on eth0 for <VIP>
Dec 16 09:16:47 mdb2 Keepalived_vrrp[1168]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:47 mdb2 Keepalived_vrrp[1168]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:47 mdb2 Keepalived_vrrp[1168]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:47 mdb2 Keepalived_vrrp[1168]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:47 mdb2 Keepalived_vrrp[1168]: (VI_1) Received advert from <mdb3> with lower priority 49, ours 50, forcing new election
Dec 16 09:16:47 mdb2 Keepalived_vrrp[1168]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:47 mdb2 Keepalived_vrrp[1168]: (VI_1) Sending/queueing gratuitous ARPs on eth0 for <VIP>
Dec 16 09:16:47 mdb2 Keepalived_vrrp[1168]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:47 mdb2 Keepalived_vrrp[1168]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:47 mdb2 Keepalived_vrrp[1168]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:47 mdb2 Keepalived_vrrp[1168]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:52 mdb2 Keepalived_vrrp[1168]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:52 mdb2 Keepalived_vrrp[1168]: (VI_1) Sending/queueing gratuitous ARPs on eth0 for <VIP>
Dec 16 09:16:52 mdb2 Keepalived_vrrp[1168]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:52 mdb2 Keepalived_vrrp[1168]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:52 mdb2 Keepalived_vrrp[1168]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:52 mdb2 Keepalived_vrrp[1168]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:52 mdb2 Keepalived_vrrp[1168]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:52 mdb2 Keepalived_vrrp[1168]: (VI_1) Sending/queueing gratuitous ARPs on eth0 for <VIP>
Dec 16 09:16:52 mdb2 Keepalived_vrrp[1168]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:52 mdb2 Keepalived_vrrp[1168]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:52 mdb2 Keepalived_vrrp[1168]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:52 mdb2 Keepalived_vrrp[1168]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb2 Keepalived_vrrp[1168]: (VI_1) Master received advert from <mdb1> with higher priority 150, ours 50
Dec 16 09:16:54 mdb2 Keepalived_vrrp[1168]: (VI_1) Entering BACKUP STATE
Dec 16 09:16:54 mdb2 Keepalived_vrrp[1168]: (VI_1) removing VIPs.

So far, so good. Nothing special.

In that moment we have:

mdb1 with no connectivity and thinks is MASTER. mdb2 as the official MASTER. mdb3 as the BACKUP.

Some seconds later, at 09:16:54, server mdb1 recovered network connectivity and received an advert from a MASTER server, and as you can see the priority is 49 which means that it was mdb3 sending adverts?

First part of the log from mdb1 (IPs substituted by <names>):

Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: (VI_1) Sending/queueing gratuitous ARPs on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: (VI_1) Received advert from <mdb3> with lower priority 49, ours 150, forcing new election
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: (VI_1) Sending/queueing gratuitous ARPs on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: (VI_1) Received advert from <mdb2> with lower priority 50, ours 150, forcing new election
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: (VI_1) Sending/queueing gratuitous ARPs on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: (VI_1) Received advert from <mdb2> with lower priority 50, ours 150, forcing new election
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: (VI_1) Sending/queueing gratuitous ARPs on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: (VI_1) Received advert from <mdb2> with lower priority 50, ours 150, forcing new election
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: (VI_1) Sending/queueing gratuitous ARPs on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: (VI_1) Received advert from <mdb2> with lower priority 50, ours 150, forcing new election
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: (VI_1) Sending/queueing gratuitous ARPs on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: (VI_1) Received advert from <mdb2> with lower priority 50, ours 150, forcing new election
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: (VI_1) Sending/queueing gratuitous ARPs on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: (VI_1) Received advert from <mdb2> with lower priority 50, ours 150, forcing new election
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: (VI_1) Sending/queueing gratuitous ARPs on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: (VI_1) Received advert from <mdb2> with lower priority 50, ours 150, forcing new election

Questions

How can be possible that server mdb3 was sending adverts if it switched to BACKUP at 09:16:47?
Why server mdb1 with a priority 150 needs so many elections to switch from BACKUP to MASTER?

This is specially annoying since on every new lower priority received, the server runs the notify_master_rx_lower_pri script which in turn resets the virtual (floating) IP excessive times.

Last part of the log from mdb1, mostly irrelevant (IPs substituted by <names>):

Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: (VI_1) Sending/queueing gratuitous ARPs on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:54 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: (VI_1) Sending/queueing gratuitous ARPs on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: (VI_1) Sending/queueing gratuitous ARPs on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: (VI_1) Sending/queueing gratuitous ARPs on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: (VI_1) Sending/queueing gratuitous ARPs on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: (VI_1) Sending/queueing gratuitous ARPs on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: (VI_1) Sending/queueing gratuitous ARPs on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: (VI_1) Sending/queueing gratuitous ARPs on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: (VI_1) Sending/queueing gratuitous ARPs on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: (VI_1) Sending/queueing gratuitous ARPs on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>
Dec 16 09:16:59 mdb1 Keepalived_vrrp[1175]: Sending gratuitous ARP on eth0 for <VIP>

To Reproduce Any steps necessary to reproduce the behaviour:

Expected behavior I expected to have only one election between two servers and the one with greater priority so win clearly.

Keepalived version

Keepalived v2.0.19 (10/19,2019)

Copyright(C) 2001-2019 Alexandre Cassen, <acassen@gmail.com>

Built with kernel headers for Linux 3.10.0
Running on Linux 3.10.0-1062.4.3.el7.x86_64 #1 SMP Wed Nov 13 23:58:53 UTC 2019

configure options: --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --enable-snmp build_alias=x86_64-redhat-linux-gnu host_alias=x86_64-redhat-linux-gnu PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig CFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic LDFLAGS=-Wl,-z,relro 

Config options:  LVS VRRP VRRP_AUTH OLD_CHKSUM_COMPAT FIB_ROUTING SNMP_VRRP SNMP_CHECKER

Distro (please complete the following information):

Name CentOS
Version 7
Architecture x86_64

Details of any containerisation or hosted service (e.g. AWS) Hetzner cloud servers.

Configuration file: Configuration file for server mdb1:

global_defs {
   router_id HAP1_L2
   script_user root root
   enable_script_security off
}

vrrp_script chk_haproxy {
   script "/usr/sbin/pidof haproxy"
   interval 2
}

vrrp_instance VI_1 {
   interface eth0
   virtual_router_id 1
   priority 150
   advert_int 1
   nopreempt

   unicast_src_ip <mdb1>
   unicast_peer {
      <mdb2>
      <mdb3>
   }

   virtual_ipaddress {
      <VIP>
   }
   track_script {
     chk_haproxy
   }

   notify_master "/etc/keepalived/change_to.sh master"
   notify_master_rx_lower_pri "/etc/keepalived/change_to.sh master rx_lower_pri"

   notify_backup "/etc/keepalived/change_to.sh backup"
   notify_fault "/etc/keepalived/change_to.sh fault"
}

Configuration file for server mdb2:

global_defs {
   router_id HAP2_L2
   script_user root root
   enable_script_security off
}

vrrp_script chk_haproxy {
   script "/usr/sbin/pidof haproxy"
   interval 2
}

vrrp_instance VI_1 {
   state BACKUP
   interface eth0
   virtual_router_id 1
   priority 50
   advert_int 1

   unicast_src_ip <mdb2>
   unicast_peer {
      <mdb1>
      <mdb3>
   }

   virtual_ipaddress {
      <VIP>
   }
   track_script {
     chk_haproxy
   }

   notify_master "/etc/keepalived/change_to.sh master"
   notify_master_rx_lower_pri "/etc/keepalived/change_to.sh master rx_lower_pri"

   notify_backup "/etc/keepalived/change_to.sh backup"
   notify_fault "/etc/keepalived/change_to.sh fault"
}

Configuration file for server mdb3: The same as mdb2 changing only router_id HAP3_L2 and priority 49, and of course, the IPs in the unicast_src and unicast_peer options.

pqarmitage commented 4 years ago

As you identify, there was a network problem with mdb1. It looks to me also as though VRRP adverts send from mdb2 and mdb3 to mdb1 have been queued while mdb2 and mdb3 are unable to communicate with mdb1.

Answers to Questions

How can be possible that server mdb3 was sending adverts if it switched to BACKUP at 09:16:47? At 09:16:47 mdb3 became master and sent and advert to both mdb1 and mdb2. The advert to mdb1 was unable to be delivered at that time and it was queued somewhere. When network connectivity to mdb1 was restored, the queued packet was delivered, so at 09:16:54 keepalived on mdb1 received the advert sent by mdb3 at 09:15:47.
Why server mdb1 with a priority 150 needs so many elections to switch from BACKUP to MASTER? As above, the VRRP adverts from mdb2 to mdb1 were queued while mdb2 could not communicate with mdb1. When communications were re-established, the adverts send by mdb2 at 09:16:47, 09:16:48, 09:16:40, 09:16:50, 09:16:51, 09:16:52 and 09:16:53 were received by keepalived on mdb1 at 09:16:54.

By default, whenever keepalived receives a lower priority advert it send gratuitous ARP messages to ensure that the ARP caches on other systems/switches have the correct MAC address for the VIPs (if not using VMACs), and when using VMACs that switches update their caches to have the correct port for the virtual MAC address(es). Also by default when gratuitous ARP messages are send, a second block is sent 5 seconds later, which is what the log entries at 09:16:59 relate to.

What you are seeing in the logs is not 8 elections to be master, in fact mdb1 remains a master throughout, since until 09:16:54 it doesn't see any adverts, and then at 09:16:54 it sees multiple adverts, and since they are all lower priority it simply send ARP messages to ensure the ARP caches are correct.

One question is why does mdb3 take over as master before mdb2. Since the difference in priority between mdb2 and mdb3 is only 1, the time difference between when mdb2 and mdb3 would take over as master is 1/256 second, i.e. about 4ms. With delays in the network which you appear to have, it is possible that the last advert mdb2 received from mdb1 (at 09:16:44) was delayed fractionally, and so at 09:16:47 it timed out fractionally after mdb3 timed out. If you adjust the priorities so that mdb1 is 250, mdb2 is 125 and mdb3 is 1, then you would have more certainty that mdb2 would take over before mdb3 (you could even have priorities 254, 253 and 1 with your particular configuration).

mikaku commented 4 years ago

It looks to me also as though VRRP adverts send from mdb2 and mdb3 to mdb1 have been queued while mdb2 and mdb3 are unable to communicate with mdb1.

Ahh, that explains it all. :smiley:

And regarding why mdb3 takes over as master before mdb2, was always a constant question in me, now it is also clear. I'll follow your recommendations on the priorities values.

Thank you very much!

acassen / keepalived

Why so many elections to become MASTER? #1467

Questions

Answers to Questions