acassen / keepalived

Keepalived
https://www.keepalived.org
GNU General Public License v2.0

nopreemptive non functional when vrrp_script succeeds after initial failure. #1652

Closed. noci2012 closed this issue 4 years ago

noci2012 commented 4 years ago

Describe the bug

nopreempt doesn't work as advertised. Even if there is already a MASTER, a BACKUP immediately becomes master when its check script succeeds.

To Reproduce

All three systems have the same config. They are equal and should not preempt. The application reads the FIFO to activate other parts of its processing: it responds to BACKUP/FAULT by restarting, to STOP by stopping successfully, and to MASTER by becoming active. Without Applicatie there is no production work, hence the check script.

The keepalived.service file has two extra lines added: Wants=network-online.target and StartExecPre=/bin/sleep 30. Both are there to make sure the network is stable and available before communication starts. (Without the sleep, nopreempt sometimes failed, according to other issues here.)

Expected behavior

If there is already an elected master, a node should not become master.

Keepalived version

Keepalived v1.4.0 (unknown)

Copyright(C) 2001-2018 Alexandre Cassen, <acassen@gmail.com>

Built with kernel headers for Linux 3.10.0
Running on Linux 3.10.0-1062.12.1.el7.x86_64 #1 SMP Thu Dec 12 06:44:49 EST 2019

Build options:  PIPE2 LIBNL3 RTA_ENCAP RTA_EXPIRES FRA_OIFNAME FRA_TUN_ID RTAX_CC_ALGO RTAX_QUICKACK LIBIPTC LIBIPSET_DYNAMIC NET_LINUX_IF_H_COLLISION LVS LIBIPVS_NETLINK VRRP VRRP_AUTH VRRP_VMAC SOCK_NONBLOCK SOCK_CLOEXEC O_PATH GLOB_BRACE OLD_CHKSUM_COMPAT FIB_ROUTING INET6_ADDR_GEN_MODE SNMP_V3_FOR_V2 SNMP SNMP_VRRP SNMP_CHECKER SNMP_RFC SNMP_RFCV2 SNMP_RFCV3 SO_MARK

Distro (please complete the following information):

Configuration file:

global_defs {
    router_id APP_DEVEL
    vrrp_skip_check_adv_addr
    vrrp_garp_interval 0
    vrrp_gna_interval 0
    vrrp_mcast_group4 224.0.0.28 # default
    vrrp_mcast_group6 ff02::22 # default
    enable_traps
    script_user user
    enable_script_security
    vrrp_notify_fifo /tmp/kal-vrrp
}

vrrp_script chk_Applicatie {
    script "/usr/sbin/pidof Applicatie"
    interval 1
    fall 2
    rise 2
}

vrrp_instance AST3_192 {
    state BACKUP
    interface ens192
    virtual_router_id 30
    priority 100
    advert_int 1
    nopreempt
    authentication {
        auth_type PASS
        auth_pass 3333_33
    }
    virtual_ipaddress {
        10.20.139.3
    }
    track_script {
       chk_Applicatie
    }
    unicast_peer {
       10.20.139.13
       10.20.139.16
       10.20.139.19
    }
}

Notify and track scripts: N/A

System Log entries

POC_BSP6, the MASTER that becomes BACKUP:

Jun 25 14:46:44 BSP-POC6 Keepalived_vrrp[4575]: VRRP_Script(chk_Applicatie) succeeded
Jun 25 14:46:45 BSP-POC6 Keepalived_vrrp[4575]: Kernel is reporting: interface ens192 UP
Jun 25 14:46:45 BSP-POC6 Keepalived_vrrp[4575]: VRRP_Instance(AST3_192): Entering BACKUP STATE
Jun 25 14:47:19 BSP-POC6 Keepalived_vrrp[4575]: VRRP_Instance(AST3_192) Transition to MASTER STATE
Jun 25 14:47:19 BSP-POC6 Keepalived_vrrp[4575]: VRRP_Instance(AST3_192) Received advert with lower priority 100, ours 100, forcing new election
Jun 25 14:47:20 BSP-POC6 Keepalived_vrrp[4575]: VRRP_Instance(AST3_192) Entering MASTER STATE
Jun 25 14:47:20 BSP-POC6 Keepalived_vrrp[4575]: VRRP_Instance(AST3_192) setting protocol VIPs.
Jun 25 14:47:20 BSP-POC6 Keepalived_vrrp[4575]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 14:47:20 BSP-POC6 Keepalived_vrrp[4575]: VRRP_Instance(AST3_192) Sending/queueing gratuitous ARPs on ens192 for 10.20.139.3
Jun 25 14:47:20 BSP-POC6 Keepalived_vrrp[4575]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 14:47:20 BSP-POC6 Keepalived_vrrp[4575]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 14:47:20 BSP-POC6 Keepalived_vrrp[4575]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 14:47:20 BSP-POC6 Keepalived_vrrp[4575]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 14:47:24 BSP-POC6 Keepalived_vrrp[4575]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 14:47:24 BSP-POC6 Keepalived_vrrp[4575]: VRRP_Instance(AST3_192) Sending/queueing gratuitous ARPs on ens192 for 10.20.139.3
Jun 25 14:47:24 BSP-POC6 Keepalived_vrrp[4575]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 14:47:24 BSP-POC6 Keepalived_vrrp[4575]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 14:47:24 BSP-POC6 Keepalived_vrrp[4575]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 14:47:24 BSP-POC6 Keepalived_vrrp[4575]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 14:47:25 BSP-POC6 Keepalived_vrrp[4575]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 14:47:25 BSP-POC6 Keepalived_vrrp[4575]: VRRP_Instance(AST3_192) Sending/queueing gratuitous ARPs on ens192 for 10.20.139.3
Jun 25 14:47:25 BSP-POC6 Keepalived_vrrp[4575]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 14:47:25 BSP-POC6 Keepalived_vrrp[4575]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 14:47:25 BSP-POC6 Keepalived_vrrp[4575]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 14:47:25 BSP-POC6 Keepalived_vrrp[4575]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 14:48:12 BSP-POC6 Keepalived_vrrp[4575]: VRRP_Instance(AST3_192) Received advert with higher priority 100, ours 100
Jun 25 14:48:12 BSP-POC6 Keepalived_vrrp[4575]: VRRP_Instance(AST3_192) Entering BACKUP STATE
Jun 25 14:48:12 BSP-POC6 Keepalived_vrrp[4575]: VRRP_Instance(AST3_192) removing protocol VIPs.

POC_BSP7, the machine that intrudes and preempts after the check script succeeds:

Jun 25 14:48:08 BSP-POC7 Keepalived[1891]: Starting Keepalived v1.4.0 (unknown)
Jun 25 14:48:08 BSP-POC7 Keepalived[1891]: Running on Linux 3.10.0-1062.12.1.el7.x86_64 #1 SMP Thu Dec 12 06:44:49 EST 2019 (built for Linux 3.10.0)
Jun 25 14:48:08 BSP-POC7 Keepalived[1891]: Opening file '/etc/keepalived/keepalived.conf'.
Jun 25 14:48:08 BSP-POC7 Keepalived[1893]: Starting Healthcheck child process, pid=1894
Jun 25 14:48:08 BSP-POC7 Keepalived[1893]: Starting VRRP child process, pid=1895
Jun 25 14:48:08 BSP-POC7 Keepalived_vrrp[1895]: Registering Kernel netlink reflector
Jun 25 14:48:08 BSP-POC7 Keepalived_vrrp[1895]: Registering Kernel netlink command channel
Jun 25 14:48:08 BSP-POC7 Keepalived_vrrp[1895]: Registering gratuitous ARP shared channel
Jun 25 14:48:08 BSP-POC7 Keepalived_vrrp[1895]: Opening file '/etc/keepalived/keepalived.conf'.
Jun 25 14:48:08 BSP-POC7 Keepalived_vrrp[1895]: VRRP_Instance(AST3_192) removing protocol VIPs.
Jun 25 14:48:08 BSP-POC7 Keepalived_vrrp[1895]: Using LinkWatch kernel netlink reflector...
Jun 25 14:48:08 BSP-POC7 Keepalived_vrrp[1895]: VRRP_Instance(AST3_192) Entering BACKUP STATE
Jun 25 14:48:08 BSP-POC7 Keepalived_vrrp[1895]: VRRP sockpool: [ifindex(2), proto(112), unicast(1), fd(11,12)]
Jun 25 14:48:08 BSP-POC7 Keepalived_healthcheckers[1894]: Opening file '/etc/keepalived/keepalived.conf'.
Jun 25 14:48:08 BSP-POC7 Keepalived_healthcheckers[1894]: Unknown keyword 'debug'
Jun 25 14:48:08 BSP-POC7 Keepalived_vrrp[1895]: VRRP_Script(chk_Applicatie) succeeded
Jun 25 14:48:12 BSP-POC7 Keepalived_vrrp[1895]: VRRP_Instance(AST3_192) Transition to MASTER STATE
Jun 25 14:48:13 BSP-POC7 Keepalived_vrrp[1895]: VRRP_Instance(AST3_192) Entering MASTER STATE
Jun 25 14:48:13 BSP-POC7 Keepalived_vrrp[1895]: VRRP_Instance(AST3_192) setting protocol VIPs.
Jun 25 14:48:13 BSP-POC7 Keepalived_vrrp[1895]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 14:48:13 BSP-POC7 Keepalived_vrrp[1895]: VRRP_Instance(AST3_192) Sending/queueing gratuitous ARPs on ens192 for 10.20.139.3
Jun 25 14:48:13 BSP-POC7 Keepalived_vrrp[1895]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 14:48:13 BSP-POC7 Keepalived_vrrp[1895]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 14:48:13 BSP-POC7 Keepalived_vrrp[1895]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 14:48:13 BSP-POC7 Keepalived_vrrp[1895]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 14:48:18 BSP-POC7 Keepalived_vrrp[1895]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 14:48:18 BSP-POC7 Keepalived_vrrp[1895]: VRRP_Instance(AST3_192) Sending/queueing gratuitous ARPs on ens192 for 10.20.139.3
Jun 25 14:48:18 BSP-POC7 Keepalived_vrrp[1895]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 14:48:18 BSP-POC7 Keepalived_vrrp[1895]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 14:48:18 BSP-POC7 Keepalived_vrrp[1895]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 14:48:18 BSP-POC7 Keepalived_vrrp[1895]: Sending gratuitous ARP on ens192 for 10.20.139.3

Did keepalived coredump? No

Additional context

The moment the application check succeeds, the takeover happens, even when another machine is master at that time.

If Applicatie is already running when keepalived starts, nopreempt works as advertised. (POC_BSP3 has the highest address.)

noci2012 commented 4 years ago

Actually, I doubt the problem is with the check script. With the check script removed from the vrrp_instance, the machine with the highest address preempts anyway. Otherwise, how can the following be explained? Preemption should not occur if there is a master elsewhere. (A takeover is quite heavy for Applicatie, so frivolous cutovers should be avoided.)

Jun 25 15:44:21 BSP-POC7 Keepalived[1881]: Starting Healthcheck child process, pid=1882
Jun 25 15:44:21 BSP-POC7 Keepalived[1881]: Starting VRRP child process, pid=1883
Jun 25 15:44:21 BSP-POC7 Keepalived_healthcheckers[1882]: Opening file '/etc/keepalived/keepalived.conf'.
Jun 25 15:44:21 BSP-POC7 Keepalived_vrrp[1883]: Registering Kernel netlink reflector
Jun 25 15:44:21 BSP-POC7 Keepalived_vrrp[1883]: Registering Kernel netlink command channel
Jun 25 15:44:21 BSP-POC7 Keepalived_vrrp[1883]: Registering gratuitous ARP shared channel
Jun 25 15:44:21 BSP-POC7 Keepalived_vrrp[1883]: Opening file '/etc/keepalived/keepalived.conf'.
Jun 25 15:44:21 BSP-POC7 Keepalived_vrrp[1883]: VRRP_Instance(AST3_192) removing protocol VIPs.
Jun 25 15:44:21 BSP-POC7 Keepalived_vrrp[1883]: Using LinkWatch kernel netlink reflector...
Jun 25 15:44:21 BSP-POC7 Keepalived_vrrp[1883]: VRRP_Instance(AST3_192) Entering BACKUP STATE
Jun 25 15:44:21 BSP-POC7 Keepalived_vrrp[1883]: VRRP sockpool: [ifindex(2), proto(112), unicast(1), fd(11,12)]
Jun 25 15:44:24 BSP-POC7 Keepalived_vrrp[1883]: VRRP_Instance(AST3_192) Transition to MASTER STATE
Jun 25 15:44:25 BSP-POC7 Keepalived_vrrp[1883]: VRRP_Instance(AST3_192) Entering MASTER STATE
Jun 25 15:44:25 BSP-POC7 Keepalived_vrrp[1883]: VRRP_Instance(AST3_192) setting protocol VIPs.
Jun 25 15:44:25 BSP-POC7 Keepalived_vrrp[1883]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 15:44:25 BSP-POC7 Keepalived_vrrp[1883]: VRRP_Instance(AST3_192) Sending/queueing gratuitous ARPs on ens192 for 10.20.139.3
Jun 25 15:44:25 BSP-POC7 Keepalived_vrrp[1883]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 15:44:25 BSP-POC7 Keepalived_vrrp[1883]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 15:44:25 BSP-POC7 Keepalived_vrrp[1883]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 15:44:25 BSP-POC7 Keepalived_vrrp[1883]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 15:44:30 BSP-POC7 Keepalived_vrrp[1883]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 15:44:30 BSP-POC7 Keepalived_vrrp[1883]: VRRP_Instance(AST3_192) Sending/queueing gratuitous ARPs on ens192 for 10.20.139.3
Jun 25 15:44:30 BSP-POC7 Keepalived_vrrp[1883]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 15:44:30 BSP-POC7 Keepalived_vrrp[1883]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 15:44:30 BSP-POC7 Keepalived_vrrp[1883]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 15:44:30 BSP-POC7 Keepalived_vrrp[1883]: Sending gratuitous ARP on ens192 for 10.20.139.3

noci2012 commented 4 years ago

An attempt with 2.0.10 does seem to work as expected (plain restart of keepalived).

Jun 25 23:08:59 BSP-POC7 Keepalived_vrrp[2811]: Registering Kernel netlink command channel
Jun 25 23:08:59 BSP-POC7 Keepalived_vrrp[2811]: Opening file '/etc/keepalived/keepalived.conf'.
Jun 25 23:08:59 BSP-POC7 Keepalived_vrrp[2811]: Assigned address 10.20.139.19 for interface ens192
Jun 25 23:08:59 BSP-POC7 Keepalived_vrrp[2811]: Assigned address fe80::250:56ff:fe9d:75b8 for interface ens192
Jun 25 23:08:59 BSP-POC7 Keepalived_vrrp[2811]: Registering gratuitous ARP shared channel
Jun 25 23:08:59 BSP-POC7 Keepalived_vrrp[2811]: (AST3_192) removing VIPs.
Jun 25 23:08:59 BSP-POC7 Keepalived_vrrp[2811]: VRRP sockpool: [ifindex(2), family(IPv4), proto(112), unicast(1), fd(11,12)]
Jun 25 23:08:59 BSP-POC7 Keepalived_vrrp[2811]: Script `chk_Applicatie` now returning 1
Jun 25 23:08:59 BSP-POC7 Keepalived_vrrp[2811]: VRRP_Script(chk_Applicatie) failed (exited with status 1)
Jun 25 23:08:59 BSP-POC7 Keepalived_vrrp[2811]: (AST3_192) Entering FAULT STATE
Jun 25 23:09:18 BSP-POC7 Keepalived_vrrp[2811]: Script `chk_Applicatie` now returning 0
Jun 25 23:09:19 BSP-POC7 Keepalived_vrrp[2811]: VRRP_Script(chk_Applicatie) succeeded
Jun 25 23:09:19 BSP-POC7 Keepalived_vrrp[2811]: (AST3_192) Entering BACKUP STATE
Jun 25 23:22:37 BSP-POC7 Keepalived[1303]: (Line 15) vrrp_garp_interval '0' is invalid
Jun 25 23:22:37 BSP-POC7 Keepalived[1303]: (Line 16) number '0' outside range [1e-06, 4294]
Jun 25 23:22:37 BSP-POC7 Keepalived[1303]: (Line 16) vrrp_gna_interval '0' is invalid
Jun 25 23:22:37 BSP-POC7 Keepalived[1322]: Starting VRRP child process, pid=1324
Jun 25 23:22:37 BSP-POC7 Keepalived_vrrp[1324]: Registering Kernel netlink reflector
Jun 25 23:22:37 BSP-POC7 Keepalived_vrrp[1324]: Registering Kernel netlink command channel
Jun 25 23:22:37 BSP-POC7 Keepalived_vrrp[1324]: Opening file '/etc/keepalived/keepalived.conf'.
Jun 25 23:22:37 BSP-POC7 Keepalived_vrrp[1324]: Assigned address 10.20.139.19 for interface ens192
Jun 25 23:22:37 BSP-POC7 Keepalived_vrrp[1324]: Assigned address fe80::250:56ff:fe9d:75b8 for interface ens192
Jun 25 23:22:37 BSP-POC7 Keepalived_vrrp[1324]: Registering gratuitous ARP shared channel
Jun 25 23:22:37 BSP-POC7 Keepalived_vrrp[1324]: (AST3_192) removing VIPs.
Jun 25 23:22:37 BSP-POC7 Keepalived_vrrp[1324]: VRRP sockpool: [ifindex(2), family(IPv4), proto(112), unicast(1), fd(11,12)]
Jun 25 23:22:37 BSP-POC7 Keepalived_vrrp[1324]: Script `chk_Applicatie` now returning 1
Jun 25 23:22:37 BSP-POC7 Keepalived_vrrp[1324]: VRRP_Script(chk_Applicatie) failed (exited with status 1)
Jun 25 23:22:37 BSP-POC7 Keepalived_vrrp[1324]: (AST3_192) Entering FAULT STATE
Jun 25 23:22:57 BSP-POC7 Keepalived_vrrp[1324]: Script `chk_Applicatie` now returning 0
Jun 25 23:22:58 BSP-POC7 Keepalived_vrrp[1324]: VRRP_Script(chk_Applicatie) succeeded
Jun 25 23:22:58 BSP-POC7 Keepalived_vrrp[1324]: (AST3_192) Entering BACKUP STATE
Jun 25 23:23:02 BSP-POC7 Keepalived_vrrp[1324]: (AST3_192) Receive advertisement timeout
Jun 25 23:23:02 BSP-POC7 Keepalived_vrrp[1324]: (AST3_192) Entering MASTER STATE
Jun 25 23:23:02 BSP-POC7 Keepalived_vrrp[1324]: (AST3_192) setting VIPs.
Jun 25 23:23:02 BSP-POC7 Keepalived_vrrp[1324]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:23:02 BSP-POC7 Keepalived_vrrp[1324]: (AST3_192) Sending/queueing gratuitous ARPs on ens192 for 10.20.139.3
Jun 25 23:23:02 BSP-POC7 Keepalived_vrrp[1324]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:23:02 BSP-POC7 Keepalived_vrrp[1324]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:23:02 BSP-POC7 Keepalived_vrrp[1324]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:23:02 BSP-POC7 Keepalived_vrrp[1324]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:23:07 BSP-POC7 Keepalived_vrrp[1324]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:23:07 BSP-POC7 Keepalived_vrrp[1324]: (AST3_192) Sending/queueing gratuitous ARPs on ens192 for 10.20.139.3
Jun 25 23:23:07 BSP-POC7 Keepalived_vrrp[1324]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:23:07 BSP-POC7 Keepalived_vrrp[1324]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:23:07 BSP-POC7 Keepalived_vrrp[1324]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:23:07 BSP-POC7 Keepalived_vrrp[1324]: Sending gratuitous ARP on ens192 for 10.20.139.3

Running tshark during boot is a bit difficult.

Neighbour system:

Jun 25 23:22:24 BSP-POC6 Keepalived_vrrp[23071]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:22:24 BSP-POC6 Keepalived_vrrp[23071]: (AST3_192) Sending/queueing gratuitous ARPs on ens192 for 10.20.139.3
Jun 25 23:22:24 BSP-POC6 Keepalived_vrrp[23071]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:22:24 BSP-POC6 Keepalived_vrrp[23071]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:22:24 BSP-POC6 Keepalived_vrrp[23071]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:22:24 BSP-POC6 Keepalived_vrrp[23071]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:22:24 BSP-POC6 Keepalived_vrrp[23071]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:22:24 BSP-POC6 Keepalived_vrrp[23071]: (AST3_192) Sending/queueing gratuitous ARPs on ens192 for 10.20.139.3
Jun 25 23:22:24 BSP-POC6 Keepalived_vrrp[23071]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:22:24 BSP-POC6 Keepalived_vrrp[23071]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:22:24 BSP-POC6 Keepalived_vrrp[23071]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:22:24 BSP-POC6 Keepalived_vrrp[23071]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:23:02 BSP-POC6 Keepalived_vrrp[23071]: (AST3_192) Master received advert from 10.20.139.19 with same priority 100 but higher IP address than ours
Jun 25 23:23:02 BSP-POC6 Keepalived_vrrp[23071]: (AST3_192) Entering BACKUP STATE
Jun 25 23:23:02 BSP-POC6 Keepalived_vrrp[23071]: (AST3_192) removing VIPs.

noci2012 commented 4 years ago

Again, this time with a network dump. Current master (10.20.139.16):

Jun 25 23:29:04 BSP-POC6 Keepalived_vrrp[23071]: (AST3_192) Backup received priority 0 advertisement
Jun 25 23:29:05 BSP-POC6 Keepalived_vrrp[23071]: (AST3_192) Receive advertisement timeout
Jun 25 23:29:05 BSP-POC6 Keepalived_vrrp[23071]: (AST3_192) Entering MASTER STATE
Jun 25 23:29:05 BSP-POC6 Keepalived_vrrp[23071]: (AST3_192) setting VIPs.
Jun 25 23:29:05 BSP-POC6 Keepalived_vrrp[23071]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:29:05 BSP-POC6 Keepalived_vrrp[23071]: (AST3_192) Sending/queueing gratuitous ARPs on ens192 for 10.20.139.3
Jun 25 23:29:05 BSP-POC6 Keepalived_vrrp[23071]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:29:05 BSP-POC6 Keepalived_vrrp[23071]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:29:05 BSP-POC6 Keepalived_vrrp[23071]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:29:05 BSP-POC6 Keepalived_vrrp[23071]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:29:05 BSP-POC6 Keepalived_vrrp[23071]: (AST3_192) Received advert from 10.20.139.13 with lower priority 100, ours 100, forcing new election
Jun 25 23:29:05 BSP-POC6 Keepalived_vrrp[23071]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:29:05 BSP-POC6 Keepalived_vrrp[23071]: (AST3_192) Sending/queueing gratuitous ARPs on ens192 for 10.20.139.3
Jun 25 23:29:05 BSP-POC6 Keepalived_vrrp[23071]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:29:05 BSP-POC6 Keepalived_vrrp[23071]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:29:05 BSP-POC6 Keepalived_vrrp[23071]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:29:05 BSP-POC6 Keepalived_vrrp[23071]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:29:10 BSP-POC6 Keepalived_vrrp[23071]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:29:10 BSP-POC6 Keepalived_vrrp[23071]: (AST3_192) Sending/queueing gratuitous ARPs on ens192 for 10.20.139.3
Jun 25 23:29:10 BSP-POC6 Keepalived_vrrp[23071]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:29:10 BSP-POC6 Keepalived_vrrp[23071]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:29:10 BSP-POC6 Keepalived_vrrp[23071]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:29:10 BSP-POC6 Keepalived_vrrp[23071]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:29:10 BSP-POC6 Keepalived_vrrp[23071]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:29:10 BSP-POC6 Keepalived_vrrp[23071]: (AST3_192) Sending/queueing gratuitous ARPs on ens192 for 10.20.139.3
Jun 25 23:29:10 BSP-POC6 Keepalived_vrrp[23071]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:29:10 BSP-POC6 Keepalived_vrrp[23071]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:29:10 BSP-POC6 Keepalived_vrrp[23071]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:29:10 BSP-POC6 Keepalived_vrrp[23071]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:29:48 BSP-POC6 Keepalived_vrrp[23071]: (AST3_192) Master received advert from 10.20.139.19 with same priority 100 but higher IP address than ours
Jun 25 23:29:48 BSP-POC6 Keepalived_vrrp[23071]: (AST3_192) Entering BACKUP STATE
Jun 25 23:29:48 BSP-POC6 Keepalived_vrrp[23071]: (AST3_192) removing VIPs.

Rebooted system: (10.20.139.19)

Jun 25 23:29:23 BSP-POC7 Keepalived_vrrp[1326]: Registering Kernel netlink reflector
Jun 25 23:29:23 BSP-POC7 Keepalived_vrrp[1326]: Registering Kernel netlink command channel
Jun 25 23:29:23 BSP-POC7 Keepalived_vrrp[1326]: Opening file '/etc/keepalived/keepalived.conf'.
Jun 25 23:29:23 BSP-POC7 Keepalived_vrrp[1326]: Assigned address 10.20.139.19 for interface ens192
Jun 25 23:29:23 BSP-POC7 Keepalived_vrrp[1326]: Assigned address fe80::250:56ff:fe9d:75b8 for interface ens192
Jun 25 23:29:23 BSP-POC7 Keepalived_vrrp[1326]: Registering gratuitous ARP shared channel
Jun 25 23:29:23 BSP-POC7 Keepalived_vrrp[1326]: (AST3_192) removing VIPs.
Jun 25 23:29:23 BSP-POC7 Keepalived_vrrp[1326]: VRRP sockpool: [ifindex(2), family(IPv4), proto(112), unicast(1), fd(11,12)]
Jun 25 23:29:23 BSP-POC7 Keepalived_vrrp[1326]: Script `chk_Applicatie` now returning 1
Jun 25 23:29:23 BSP-POC7 Keepalived_vrrp[1326]: VRRP_Script(chk_Applicatie) failed (exited with status 1)
Jun 25 23:29:23 BSP-POC7 Keepalived_vrrp[1326]: (AST3_192) Entering FAULT STATE
Jun 25 23:29:43 BSP-POC7 Keepalived_vrrp[1326]: Script `chk_Applicatie` now returning 0
Jun 25 23:29:44 BSP-POC7 Keepalived_vrrp[1326]: VRRP_Script(chk_Applicatie) succeeded
Jun 25 23:29:44 BSP-POC7 Keepalived_vrrp[1326]: (AST3_192) Entering BACKUP STATE
Jun 25 23:29:48 BSP-POC7 Keepalived_vrrp[1326]: (AST3_192) Receive advertisement timeout
Jun 25 23:29:48 BSP-POC7 Keepalived_vrrp[1326]: (AST3_192) Entering MASTER STATE
Jun 25 23:29:48 BSP-POC7 Keepalived_vrrp[1326]: (AST3_192) setting VIPs.
Jun 25 23:29:48 BSP-POC7 Keepalived_vrrp[1326]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:29:48 BSP-POC7 Keepalived_vrrp[1326]: (AST3_192) Sending/queueing gratuitous ARPs on ens192 for 10.20.139.3
Jun 25 23:29:48 BSP-POC7 Keepalived_vrrp[1326]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:29:48 BSP-POC7 Keepalived_vrrp[1326]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:29:48 BSP-POC7 Keepalived_vrrp[1326]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:29:48 BSP-POC7 Keepalived_vrrp[1326]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:29:53 BSP-POC7 Keepalived_vrrp[1326]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:29:53 BSP-POC7 Keepalived_vrrp[1326]: (AST3_192) Sending/queueing gratuitous ARPs on ens192 for 10.20.139.3
Jun 25 23:29:53 BSP-POC7 Keepalived_vrrp[1326]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:29:53 BSP-POC7 Keepalived_vrrp[1326]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:29:53 BSP-POC7 Keepalived_vrrp[1326]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 25 23:29:53 BSP-POC7 Keepalived_vrrp[1326]: Sending gratuitous ARP on ens192 for 10.20.139.3

tshark dump:

  1 23:28:57.249070761 10.20.139.19 -> 10.20.139.16 VRRP 60 Announcement (v2)
  2 23:28:58.249434801 10.20.139.19 -> 10.20.139.16 VRRP 60 Announcement (v2)
  3 23:28:59.249721643 10.20.139.19 -> 10.20.139.16 VRRP 60 Announcement (v2)
  4 23:29:00.249846944 10.20.139.19 -> 10.20.139.16 VRRP 60 Announcement (v2)
  5 23:29:01.250014396 10.20.139.19 -> 10.20.139.16 VRRP 60 Announcement (v2)
  6 23:29:02.250352761 10.20.139.19 -> 10.20.139.16 VRRP 60 Announcement (v2)
  7 23:29:03.250528299 10.20.139.19 -> 10.20.139.16 VRRP 60 Announcement (v2)
  8 23:29:04.250691932 10.20.139.19 -> 10.20.139.16 VRRP 60 Announcement (v2)
  9 23:29:04.987579747 10.20.139.19 -> 10.20.139.16 VRRP 60 Announcement (v2)
 10 23:29:05.597801165 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 11 23:29:05.597853674 10.20.139.16 -> 10.20.139.19 VRRP 54 Announcement (v2)
 12 23:29:05.602669622 10.20.139.13 -> 10.20.139.16 VRRP 60 Announcement (v2)
 13 23:29:05.602758026 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 14 23:29:05.602794826 10.20.139.16 -> 10.20.139.19 VRRP 54 Announcement (v2)
 15 23:29:06.603006753 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 16 23:29:06.603065284 10.20.139.16 -> 10.20.139.19 VRRP 54 Announcement (v2)
 17 23:29:07.603240904 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 18 23:29:07.603297343 10.20.139.16 -> 10.20.139.19 VRRP 54 Announcement (v2)
 19 23:29:08.603417147 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 20 23:29:08.603517129 10.20.139.16 -> 10.20.139.19 VRRP 54 Announcement (v2)
 21 23:29:09.603653117 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 22 23:29:09.603710566 10.20.139.16 -> 10.20.139.19 VRRP 54 Announcement (v2)
 23 23:29:10.603862076 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 24 23:29:10.603915320 10.20.139.16 -> 10.20.139.19 VRRP 54 Announcement (v2)
 25 23:29:11.604045915 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 26 23:29:11.604105419 10.20.139.16 -> 10.20.139.19 VRRP 54 Announcement (v2)
 27 23:29:12.604253931 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 28 23:29:12.604315433 10.20.139.16 -> 10.20.139.19 VRRP 54 Announcement (v2)
 29 23:29:13.604471963 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 30 23:29:13.604561540 10.20.139.16 -> 10.20.139.19 VRRP 54 Announcement (v2)
 31 23:29:14.604689464 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 32 23:29:15.604866849 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 33 23:29:16.605038549 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 34 23:29:17.605345584 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 35 23:29:18.605578585 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 36 23:29:19.605760060 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 37 23:29:20.606010788 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 38 23:29:21.606244801 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 39 23:29:22.606428698 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 40 23:29:23.353035018 10.20.139.16 -> 10.20.139.19 VRRP 54 Announcement (v2)
 41 23:29:23.606672076 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 42 23:29:23.606725646 10.20.139.16 -> 10.20.139.19 VRRP 54 Announcement (v2)
 43 23:29:24.606859904 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 44 23:29:24.606910403 10.20.139.16 -> 10.20.139.19 VRRP 54 Announcement (v2)
 45 23:29:25.607013596 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 46 23:29:25.607064298 10.20.139.16 -> 10.20.139.19 VRRP 54 Announcement (v2)
 47 23:29:26.607154884 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 48 23:29:26.607216983 10.20.139.16 -> 10.20.139.19 VRRP 54 Announcement (v2)
 49 23:29:27.607310664 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 50 23:29:27.607368299 10.20.139.16 -> 10.20.139.19 VRRP 54 Announcement (v2)
 51 23:29:28.607484874 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 52 23:29:28.607543924 10.20.139.16 -> 10.20.139.19 VRRP 54 Announcement (v2)
 53 23:29:29.607656682 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 54 23:29:29.607715434 10.20.139.16 -> 10.20.139.19 VRRP 54 Announcement (v2)
 55 23:29:30.607839292 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 56 23:29:30.607894489 10.20.139.16 -> 10.20.139.19 VRRP 54 Announcement (v2)
 57 23:29:31.608003163 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 58 23:29:31.608061671 10.20.139.16 -> 10.20.139.19 VRRP 54 Announcement (v2)
 59 23:29:32.608175830 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 60 23:29:32.608235265 10.20.139.16 -> 10.20.139.19 VRRP 54 Announcement (v2)
 61 23:29:33.608341848 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 62 23:29:33.608399534 10.20.139.16 -> 10.20.139.19 VRRP 54 Announcement (v2)
 63 23:29:34.608596552 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 64 23:29:34.608714305 10.20.139.16 -> 10.20.139.19 VRRP 54 Announcement (v2)
 65 23:29:35.608827865 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 66 23:29:35.608933466 10.20.139.16 -> 10.20.139.19 VRRP 54 Announcement (v2)
 67 23:29:36.609073866 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 68 23:29:36.609144057 10.20.139.16 -> 10.20.139.19 VRRP 54 Announcement (v2)
 69 23:29:37.609279765 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 70 23:29:37.609340519 10.20.139.16 -> 10.20.139.19 VRRP 54 Announcement (v2)
 71 23:29:38.609493211 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 72 23:29:38.609554982 10.20.139.16 -> 10.20.139.19 VRRP 54 Announcement (v2)
 73 23:29:39.609712767 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 74 23:29:39.609860799 10.20.139.16 -> 10.20.139.19 VRRP 54 Announcement (v2)
 75 23:29:40.610018630 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 76 23:29:40.610079303 10.20.139.16 -> 10.20.139.19 VRRP 54 Announcement (v2)
 77 23:29:41.610214194 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 78 23:29:41.610275020 10.20.139.16 -> 10.20.139.19 VRRP 54 Announcement (v2)
 79 23:29:42.610463520 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 80 23:29:42.610591083 10.20.139.16 -> 10.20.139.19 VRRP 54 Announcement (v2)
 81 23:29:43.610737236 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 82 23:29:43.610800031 10.20.139.16 -> 10.20.139.19 VRRP 54 Announcement (v2)
 83 23:29:44.610967774 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 84 23:29:44.611062486 10.20.139.16 -> 10.20.139.19 VRRP 54 Announcement (v2)
 85 23:29:45.611236161 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 86 23:29:45.611297874 10.20.139.16 -> 10.20.139.19 VRRP 54 Announcement (v2)
 87 23:29:46.611488094 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 88 23:29:46.611590755 10.20.139.16 -> 10.20.139.19 VRRP 54 Announcement (v2)
 89 23:29:47.611785984 10.20.139.16 -> 10.20.139.13 VRRP 54 Announcement (v2)
 90 23:29:47.611890014 10.20.139.16 -> 10.20.139.19 VRRP 54 Announcement (v2)
 91 23:29:48.088812344 10.20.139.19 -> 10.20.139.16 VRRP 60 Announcement (v2)
 92 23:29:49.087063867 10.20.139.19 -> 10.20.139.16 VRRP 60 Announcement (v2)
 93 23:29:50.087269955 10.20.139.19 -> 10.20.139.16 VRRP 60 Announcement (v2)
 94 23:29:51.087528398 10.20.139.19 -> 10.20.139.16 VRRP 60 Announcement (v2)
 95 23:29:52.087855514 10.20.139.19 -> 10.20.139.16 VRRP 60 Announcement (v2)
 96 23:29:53.091471860 10.20.139.19 -> 10.20.139.16 VRRP 60 Announcement (v2)
 97 23:29:54.090354964 10.20.139.19 -> 10.20.139.16 VRRP 60 Announcement (v2)
 98 23:29:55.090653988 10.20.139.19 -> 10.20.139.16 VRRP 60 Announcement (v2)
 99 23:29:56.090764757 10.20.139.19 -> 10.20.139.16 VRRP 60 Announcement (v2)
100 23:29:57.091110734 10.20.139.19 -> 10.20.139.16 VRRP 60 Announcement (v2)
101 23:29:58.091442621 10.20.139.19 -> 10.20.139.16 VRRP 60 Announcement (v2)
102 23:29:59.091641657 10.20.139.19 -> 10.20.139.16 VRRP 60 Announcement (v2)
103 23:30:00.091763755 10.20.139.19 -> 10.20.139.16 VRRP 60 Announcement (v2)
104 23:30:01.092117383 10.20.139.19 -> 10.20.139.16 VRRP 60 Announcement (v2)
105 23:30:02.092266339 10.20.139.19 -> 10.20.139.16 VRRP 60 Announcement (v2)
106 23:30:03.092644051 10.20.139.19 -> 10.20.139.16 VRRP 60 Announcement (v2)
107 23:30:04.092773516 10.20.139.19 -> 10.20.139.16 VRRP 60 Announcement (v2)

(a third system is also here, just not relevant 10.20.139.13).

pqarmitage commented 4 years ago

keepalived v1.4.0 is very old and there have been approximately 2130 non-merge commits since v1.4.0. It would appear, from your previous comment, that the problem was resolved by the time of v2.0.10.

v2.0.10 is quite old, and there have been over 700 non-merge commits between v2.0.10 and v2.0.20, so you should probably upgrade to v2.0.20 or v2.1.3.

The simplest way to get a more up-to-date version of keepalived is to build it yourself from source. The file INSTALL in the root of the source tree lists which packages need to be installed to build keepalived.

I am now not absolutely clear what the issue is. I think nopreempt is working for you except when you reboot a system. My comments below are made on that basis.

We have seen several reports of problems like this when keepalived is started at boot time, and the problem is caused by the system not receiving data from the network until some time after keepalived has started (sometimes data starts being received, then stops for several seconds and then starts again). If the system is not receiving data from the network, keepalived will obviously not receive adverts and hence will transition to master. This is what appears to be happening here.
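
As a rough illustration of the timing involved: in VRRPv2 (RFC 3768) a backup declares the master down after 3 * advert_int plus a skew time of (256 - priority) / 256 seconds without receiving an advert. The sketch below only evaluates that formula for the advert_int 1 and priority 100 posted in this issue. The function name is made up for illustration, and keepalived's internal timer handling may differ in detail.

```python
def master_down_interval(advert_int: float, priority: int) -> float:
    """Seconds a VRRPv2 backup waits without adverts before taking over.

    Formula from RFC 3768; not taken from the keepalived source.
    """
    skew_time = (256 - priority) / 256
    return 3 * advert_int + skew_time


# Values from the configuration in this issue: advert_int 1, priority 100.
print(master_down_interval(1, 100))  # 3.609375
```

So a backup that receives nothing for roughly 3.6 seconds after starting VRRP would be expected to take over, whether or not a master actually exists on the network.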

To provide a workaround for this problem, I added a configuration option vrrp_startup_delay, which makes keepalived delay starting the VRRP protocol to allow time for the networking to start working properly. This was added in v2.0.14.

pqarmitage commented 4 years ago

A few other things I have noticed:

You mention at the beginning that you have 3 systems, but in the unicast_peer block you list 3 IP addresses which means that there is that system itself + 3 remote systems. The alternative is that you have listed the local host address in the unicast_peer block, which is wrong; the local host address must not be listed in the unicast_peer block.

You have stated (a third system is also here, just not relevant 10.20.139.13) but 10.20.139.13 is listed in the unicast peer block.

Adverts from 10.20.139.19 have length 60 but adverts from 10.20.139.16 have length 54. There must be a mismatched configuration between 10.20.139.16 and 10.20.139.19 since the adverts should be the same length.

Your logs show

Jun 25 23:22:37 BSP-POC7 Keepalived[1303]: (Line 15) vrrp_garp_interval '0' is invalid
Jun 25 23:22:37 BSP-POC7 Keepalived[1303]: (Line 16) number '0' outside range [1e-06, 4294]
Jun 25 23:22:37 BSP-POC7 Keepalived[1303]: (Line 16) vrrp_gna_interval '0' is invalid

You should correct these errors.

noci2012 commented 4 years ago

wrt. versions: the available patchsets for RHEL/CENTOS 7 and RHEL/CENTOS 8 have been used to create those (they are from the official & supported releases).

wrt. difference in packets: POCBSP7 has v2.10; POCBSP5 & POCBSP6 have 1.4.0. All systems have exactly the same config, and a network trace shows that the system does not send to itself...? Is it a problem if keepalived v1.4 runs alongside keepalived v2.10? That would be a showstopper for upgrades (preparation and cutover) if keepalived is one of the elements to be upgraded (for 24 * 365.25 use, with near-zero downtime).

wrt. booting: The keepalived.service script has 1 addition: ExecStartPre=/bin/sleep 30 and network-online.target is specified as well. The system is reachable by ssh and a command can be issued around the time keepalived starts.

wrt. config: the values come from the 1.4.0 config; I will try a more recent 2.x later.

pqarmitage commented 4 years ago

@noci2012 You refer to keepalived v2.10 but there is no such version. I had been assuming that you meant v2.0.10, but it could be v2.1.0. Could you please clarify and provide the output of keepalived -v of the version 2.?.? system.

I have tested your configuration with both v1.4.0 and v2.0.10 and tshark reports the packet lengths as 54 bytes, so there is something strange happening with the adverts from 10.20.139.19. Can you please capture both the 54 byte and 60 byte packets with tcpdump and post the output here.

There is no problem with v1.4.0 and v2.0.10 interoperating.

You MUST remove the localhost from the unicast_peer list; having it there means that keepalived receives its own adverts, and the current version of keepalived will log every such packet as an error. If you want to keep the same configuration file on all your systems, you could have configuration like:

    unicast_peer {
@^POC_BSP6       10.20.139.13
@^POC_BSP7       10.20.139.16
@^POC_BSP8       10.20.139.19
    }

assuming the POC_BSP6 is 10.20.139.13 etc.
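
To illustrate how the conditional lines above are meant to resolve on each host, here is a small Python model of the `@` / `@^` prefixes as described in the keepalived.conf man page: an `@ID` line is used only when the config id equals ID, and an `@^ID` line is used only when it does not. The function select_lines is hypothetical and is not keepalived's parser; it only mimics that rule.

```python
def select_lines(lines, config_id):
    """Return the config lines that apply for the given config id.

    Simplified model: '@ID rest' is kept if config_id == ID,
    '@^ID rest' is kept if config_id != ID, other lines are always kept.
    """
    selected = []
    for line in lines:
        stripped = line.strip()
        if not stripped.startswith("@"):
            selected.append(stripped)
            continue
        parts = stripped.split(None, 1)
        tag = parts[0][1:]
        rest = parts[1].strip() if len(parts) > 1 else ""
        if tag.startswith("^"):
            keep = tag[1:] != config_id
        else:
            keep = tag == config_id
        if keep:
            selected.append(rest)
    return selected


PEER_BLOCK = [
    "    unicast_peer {",
    "@^POC_BSP6       10.20.139.13",
    "@^POC_BSP7       10.20.139.16",
    "@^POC_BSP8       10.20.139.19",
    "    }",
]

# On the host whose config id is POC_BSP7, its own address is left out.
print(select_lines(PEER_BLOCK, "POC_BSP7"))
```

The config id here is the value passed with -i on the keepalived command line, so the same file yields a different peer list on each system.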

noci2012 commented 4 years ago

Dump from the earlier tshark capture (tshark -x): the difference looks like padding of 6 * 00 bytes. All transmitted packets are 54 bytes and all received packets are 60 bytes, so this size difference is a red herring.


0000  00 50 56 9d 75 b8 00 50 56 9d 5d 33 08 00 45 c0   .PV.u..PV.]3..E.
0010  00 28 03 b1 00 00 ff 70 8c a9 0a 14 8b 10 0a 14   .(.....p........
0020  8b 13 21 1e 64 01 01 01 ec 2d 0a 14 8b 03 33 33   ..!.d....-....33
0030  33 33 5f 33 33 00                                 33_33.

0000  00 50 56 9d 4a cb 00 50 56 9d 5d 33 08 00 45 c0   .PV.J..PV.]3..E.
0010  00 28 03 b2 00 00 ff 70 8c ae 0a 14 8b 10 0a 14   .(.....p........
0020  8b 0d 21 1e 64 01 01 01 ec 2d 0a 14 8b 03 33 33   ..!.d....-....33
0030  33 33 5f 33 33 00                                 33_33.

0000  00 50 56 9d 75 b8 00 50 56 9d 5d 33 08 00 45 c0   .PV.u..PV.]3..E.
0010  00 28 03 b2 00 00 ff 70 8c a8 0a 14 8b 10 0a 14   .(.....p........
0020  8b 13 21 1e 64 01 01 01 ec 2d 0a 14 8b 03 33 33   ..!.d....-....33
0030  33 33 5f 33 33 00                                 33_33.

0000  00 50 56 9d 4a cb 00 50 56 9d 5d 33 08 00 45 c0   .PV.J..PV.]3..E.
0010  00 28 03 b3 00 00 ff 70 8c ad 0a 14 8b 10 0a 14   .(.....p........
0020  8b 0d 21 1e 64 01 01 01 ec 2d 0a 14 8b 03 33 33   ..!.d....-....33
0030  33 33 5f 33 33 00                                 33_33.

0000  00 50 56 9d 75 b8 00 50 56 9d 5d 33 08 00 45 c0   .PV.u..PV.]3..E.
0010  00 28 03 b3 00 00 ff 70 8c a7 0a 14 8b 10 0a 14   .(.....p........
0020  8b 13 21 1e 64 01 01 01 ec 2d 0a 14 8b 03 33 33   ..!.d....-....33
0030  33 33 5f 33 33 00                                 33_33.

0000  00 50 56 9d 5d 33 00 50 56 9d 75 b8 08 00 45 c0   .PV.]3.PV.u...E.
0010  00 28 00 01 00 00 ff 70 90 59 0a 14 8b 13 0a 14   .(.....p.Y......
0020  8b 10 21 1e 64 01 01 01 ec 2d 0a 14 8b 03 33 33   ..!.d....-....33
0030  33 33 5f 33 33 00 00 00 00 00 00 00               33_33.......

0000  00 50 56 9d 5d 33 00 50 56 9d 75 b8 08 00 45 c0   .PV.]3.PV.u...E.
0010  00 28 00 02 00 00 ff 70 90 58 0a 14 8b 13 0a 14   .(.....p.X......
0020  8b 10 21 1e 64 01 01 01 ec 2d 0a 14 8b 03 33 33   ..!.d....-....33
0030  33 33 5f 33 33 00 00 00 00 00 00 00               33_33.......

0000  00 50 56 9d 5d 33 00 50 56 9d 75 b8 08 00 45 c0   .PV.]3.PV.u...E.
0010  00 28 00 03 00 00 ff 70 90 57 0a 14 8b 13 0a 14   .(.....p.W......
0020  8b 10 21 1e 64 01 01 01 ec 2d 0a 14 8b 03 33 33   ..!.d....-....33
0030  33 33 5f 33 33 00 00 00 00 00 00 00               33_33.......

0000  00 50 56 9d 5d 33 00 50 56 9d 75 b8 08 00 45 c0   .PV.]3.PV.u...E.
0010  00 28 00 04 00 00 ff 70 90 56 0a 14 8b 13 0a 14   .(.....p.V......
0020  8b 10 21 1e 64 01 01 01 ec 2d 0a 14 8b 03 33 33   ..!.d....-....33
0030  33 33 5f 33 33 00 00 00 00 00 00 00               33_33.......

0000  00 50 56 9d 5d 33 00 50 56 9d 75 b8 08 00 45 c0   .PV.]3.PV.u...E.
0010  00 28 00 05 00 00 ff 70 90 55 0a 14 8b 13 0a 14   .(.....p.U......
0020  8b 10 21 1e 64 01 01 01 ec 2d 0a 14 8b 03 33 33   ..!.d....-....33
0030  33 33 5f 33 33 00 00 00 00 00 00 00               33_33.......

0000  00 50 56 9d 5d 33 00 50 56 9d 75 b8 08 00 45 c0   .PV.]3.PV.u...E.
0010  00 28 00 06 00 00 ff 70 90 54 0a 14 8b 13 0a 14   .(.....p.T......
0020  8b 10 21 1e 64 01 01 01 ec 2d 0a 14 8b 03 33 33   ..!.d....-....33
0030  33 33 5f 33 33 00 00 00 00 00 00 00               33_33.......

0000  00 50 56 9d 5d 33 00 50 56 9d 75 b8 08 00 45 c0   .PV.]3.PV.u...E.
0010  00 28 00 07 00 00 ff 70 90 53 0a 14 8b 13 0a 14   .(.....p.S......
0020  8b 10 21 1e 64 01 01 01 ec 2d 0a 14 8b 03 33 33   ..!.d....-....33
0030  33 33 5f 33 33 00 00 00 00 00 00 00               33_33.......
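
The padding explanation can be checked by decoding two of the frames above. The sketch below takes the first 54-byte frame and the first 60-byte frame from the dump, reads the IPv4 total length, and treats anything beyond the 14-byte Ethernet header plus that length as padding. The VRRPv2 field offsets follow RFC 3768; the names SENT, RECEIVED and describe are only for illustration.

```python
import struct

# First 54-byte frame (10.20.139.16 -> 10.20.139.19) from the dump above.
SENT = bytes.fromhex(
    "00 50 56 9d 75 b8 00 50 56 9d 5d 33 08 00 45 c0 "
    "00 28 03 b1 00 00 ff 70 8c a9 0a 14 8b 10 0a 14 "
    "8b 13 21 1e 64 01 01 01 ec 2d 0a 14 8b 03 33 33 "
    "33 33 5f 33 33 00"
)
# First 60-byte frame (10.20.139.19 -> 10.20.139.16) from the dump above.
RECEIVED = bytes.fromhex(
    "00 50 56 9d 5d 33 00 50 56 9d 75 b8 08 00 45 c0 "
    "00 28 00 01 00 00 ff 70 90 59 0a 14 8b 13 0a 14 "
    "8b 10 21 1e 64 01 01 01 ec 2d 0a 14 8b 03 33 33 "
    "33 33 5f 33 33 00 00 00 00 00 00 00"
)

ETH_HEADER_LEN = 14


def describe(frame: bytes) -> dict:
    """Decode the IPv4 length and the main VRRPv2 fields of one frame."""
    ip_header_len = (frame[ETH_HEADER_LEN] & 0x0F) * 4
    (ip_total_len,) = struct.unpack("!H", frame[16:18])
    vrrp = frame[ETH_HEADER_LEN + ip_header_len:ETH_HEADER_LEN + ip_total_len]
    return {
        "frame_len": len(frame),
        "ip_total_len": ip_total_len,
        "padding": len(frame) - ETH_HEADER_LEN - ip_total_len,
        "version": vrrp[0] >> 4,
        "vrid": vrrp[1],
        "priority": vrrp[2],
        "advert_int": vrrp[5],
        "vip": ".".join(str(b) for b in vrrp[8:12]),
    }


print(describe(SENT))
print(describe(RECEIVED))
```

Both frames carry the same 40-byte IP packet (VRID 30, priority 100, VIP 10.20.139.3, matching the configuration). The extra 6 bytes on the received frames are consistent with padding up to the 60-byte Ethernet minimum, which a capture on the sending host would typically not show.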

noci2012 commented 4 years ago

during boot 2.0.20:

Jun 26 12:18:31 BLUE3 Keepalived[1306]: Starting Keepalived v2.0.20 (01/22,2020)
Jun 26 12:18:31 BLUE3 Keepalived[1306]: Running on Linux 3.10.0-1062.12.1.el7.x86_64 #1 SMP Thu Dec 12 06:44:49 EST 2019 (built for Linux 3.10.0)
Jun 26 12:18:31 BLUE3 Keepalived[1306]: Command line: '/usr/sbin/keepalived' '-D' '-i' '$(hostname)'
Jun 26 12:18:31 BLUE3 Keepalived[1306]: Opening file '/etc/keepalived/keepalived.conf'.
Jun 26 12:18:31 BLUE3 Keepalived[1327]: Starting VRRP child process, pid=1328
Jun 26 12:18:31 BLUE3 Keepalived_vrrp[1328]: Registering Kernel netlink reflector
Jun 26 12:18:31 BLUE3 Keepalived_vrrp[1328]: Registering Kernel netlink command channel
Jun 26 12:18:31 BLUE3 Keepalived_vrrp[1328]: Opening file '/etc/keepalived/keepalived.conf'.
Jun 26 12:18:31 BLUE3 Keepalived_vrrp[1328]: Assigned address 10.20.139.19 for interface ens192
Jun 26 12:18:31 BLUE3 Keepalived_vrrp[1328]: Assigned address fe80::250:56ff:fe9d:75b8 for interface ens192
Jun 26 12:18:31 BLUE3 Keepalived_vrrp[1328]: Registering gratuitous ARP shared channel
Jun 26 12:18:31 BLUE3 Keepalived_vrrp[1328]: (AST3_192) removing VIPs.
Jun 26 12:18:31 BLUE3 Keepalived_vrrp[1328]: VRRP sockpool: [ifindex(2), family(IPv4), proto(112), unicast(1), fd(12,13)]
Jun 26 12:18:31 BLUE3 Keepalived_vrrp[1328]: VRRP_Script(chk_Applicatie) succeeded
Jun 26 12:18:31 BLUE3 Keepalived_vrrp[1328]: (AST3_192) Entering BACKUP STATE
Jun 26 12:18:31 BLUE3 Keepalived_vrrp[1328]: AST3_192: sending gratuitous ARP for 10.20.139.19
Jun 26 12:18:31 BLUE3 Keepalived_vrrp[1328]: Sending gratuitous ARP on ens192 for 10.20.139.19
Jun 26 12:18:35 BLUE3 Keepalived_vrrp[1328]: (AST3_192) Receive advertisement timeout
Jun 26 12:18:35 BLUE3 Keepalived_vrrp[1328]: (AST3_192) Entering MASTER STATE
Jun 26 12:18:35 BLUE3 Keepalived_vrrp[1328]: (AST3_192) setting VIPs.
Jun 26 12:18:35 BLUE3 Keepalived_vrrp[1328]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 12:18:35 BLUE3 Keepalived_vrrp[1328]: (AST3_192) Sending/queueing gratuitous ARPs on ens192 for 10.20.139.3
Jun 26 12:18:35 BLUE3 Keepalived_vrrp[1328]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 12:18:35 BLUE3 Keepalived_vrrp[1328]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 12:18:35 BLUE3 Keepalived_vrrp[1328]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 12:18:35 BLUE3 Keepalived_vrrp[1328]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 12:18:39 BLUE3 Keepalived_vrrp[1328]: (AST3_192) Received advert from 10.20.139.16 with lower priority 100, ours 100, forcing new election
Jun 26 12:18:39 BLUE3 Keepalived_vrrp[1328]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 12:18:39 BLUE3 Keepalived_vrrp[1328]: (AST3_192) Sending/queueing gratuitous ARPs on ens192 for 10.20.139.3
Jun 26 12:18:39 BLUE3 Keepalived_vrrp[1328]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 12:18:39 BLUE3 Keepalived_vrrp[1328]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 12:18:39 BLUE3 Keepalived_vrrp[1328]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 12:18:39 BLUE3 Keepalived_vrrp[1328]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 12:18:40 BLUE3 Keepalived_vrrp[1328]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 12:18:40 BLUE3 Keepalived_vrrp[1328]: (AST3_192) Sending/queueing gratuitous ARPs on ens192 for 10.20.139.3
Jun 26 12:18:40 BLUE3 Keepalived_vrrp[1328]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 12:18:40 BLUE3 Keepalived_vrrp[1328]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 12:18:40 BLUE3 Keepalived_vrrp[1328]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 12:18:40 BLUE3 Keepalived_vrrp[1328]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 12:18:44 BLUE3 Keepalived_vrrp[1328]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 12:18:44 BLUE3 Keepalived_vrrp[1328]: (AST3_192) Sending/queueing gratuitous ARPs on ens192 for 10.20.139.3
Jun 26 12:18:44 BLUE3 Keepalived_vrrp[1328]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 12:18:44 BLUE3 Keepalived_vrrp[1328]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 12:18:44 BLUE3 Keepalived_vrrp[1328]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 12:18:44 BLUE3 Keepalived_vrrp[1328]: Sending gratuitous ARP on ens192 for 10.20.139.3

(Same systems, just renamed for logging in the next test round.) BLUE2 was master when BLUE3 was rebooted. Config:

! Configuration File for keepalived

global_defs {
    router_id LVS_DEVEL
    vrrp_skip_check_adv_addr
    #vrrp_garp_interval 0.1
    #vrrp_gna_interval 0.1
    vrrp_mcast_group4 224.0.0.28 # default
    vrrp_mcast_group6 ff02::22 # default
    enable_traps
    script_user user
    enable_script_security
    vrrp_notify_fifo /tmp/kal-vrrp
}

vrrp_script chk_Applicatie {
    script "/usr/sbin/pidof Applicatie"
    interval 1
    fall 2
    rise 2
}

vrrp_instance AST3_192 {
    state BACKUP
    interface ens192
    virtual_router_id 30
    priority 100
    advert_int 1
    nopreempt
    authentication {
        auth_type PASS
        auth_pass 3333_33
    }
    virtual_ipaddress {
        10.20.139.3
    }
    track_script {
       chk_Applicatie
    }
    @BLUE1      unicast_src_ip 10.20.139.13
    @BLUE2      unicast_src_ip 10.20.139.16
    @BLUE3      unicast_src_ip 10.20.139.19
    unicast_peer {
        @^BLUE1       10.20.139.13
        @^BLUE2       10.20.139.16
        @^BLUE3       10.20.139.19
    }
}

rebooting BLUE2 gives:

Jun 26 12:23:26 BLUE2 Keepalived[1298]: Starting Keepalived v2.0.20 (01/22,2020)
Jun 26 12:23:26 BLUE2 Keepalived[1298]: Running on Linux 3.10.0-1062.12.1.el7.x86_64 #1 SMP Thu Dec 12 06:44:49 EST 2019 (built for Linux 3.10.0)
Jun 26 12:23:26 BLUE2 Keepalived[1298]: Command line: '/usr/sbin/keepalived' '-D' '-i' '$(hostname)'
Jun 26 12:23:26 BLUE2 Keepalived[1298]: Opening file '/etc/keepalived/keepalived.conf'.
Jun 26 12:23:26 BLUE2 Keepalived[1324]: Starting VRRP child process, pid=1326
Jun 26 12:23:26 BLUE2 Keepalived_vrrp[1326]: Registering Kernel netlink reflector
Jun 26 12:23:26 BLUE2 Keepalived_vrrp[1326]: Registering Kernel netlink command channel
Jun 26 12:23:26 BLUE2 Keepalived_vrrp[1326]: Opening file '/etc/keepalived/keepalived.conf'.
Jun 26 12:23:26 BLUE2 Keepalived_vrrp[1326]: Assigned address 10.20.139.16 for interface ens192
Jun 26 12:23:26 BLUE2 Keepalived_vrrp[1326]: Assigned address fe80::250:56ff:fe9d:5d33 for interface ens192
Jun 26 12:23:26 BLUE2 Keepalived_vrrp[1326]: Registering gratuitous ARP shared channel
Jun 26 12:23:26 BLUE2 Keepalived_vrrp[1326]: (AST3_192) removing VIPs.
Jun 26 12:23:26 BLUE2 Keepalived_vrrp[1326]: VRRP sockpool: [ifindex(2), family(IPv4), proto(112), unicast(1), fd(12,13)]
Jun 26 12:23:26 BLUE2 Keepalived_vrrp[1326]: VRRP_Script(chk_Applicatie) succeeded
Jun 26 12:23:26 BLUE2 Keepalived_vrrp[1326]: (AST3_192) Entering BACKUP STATE
Jun 26 12:23:26 BLUE2 Keepalived_vrrp[1326]: AST3_192: sending gratuitous ARP for 10.20.139.16
Jun 26 12:23:26 BLUE2 Keepalived_vrrp[1326]: Sending gratuitous ARP on ens192 for 10.20.139.16
Jun 26 12:23:29 BLUE2 Keepalived_vrrp[1326]: (AST3_192) Receive advertisement timeout
Jun 26 12:23:29 BLUE2 Keepalived_vrrp[1326]: (AST3_192) Entering MASTER STATE
Jun 26 12:23:29 BLUE2 Keepalived_vrrp[1326]: (AST3_192) setting VIPs.
Jun 26 12:23:29 BLUE2 Keepalived_vrrp[1326]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 12:23:29 BLUE2 Keepalived_vrrp[1326]: (AST3_192) Sending/queueing gratuitous ARPs on ens192 for 10.20.139.3
Jun 26 12:23:29 BLUE2 Keepalived_vrrp[1326]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 12:23:29 BLUE2 Keepalived_vrrp[1326]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 12:23:29 BLUE2 Keepalived_vrrp[1326]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 12:23:29 BLUE2 Keepalived_vrrp[1326]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 12:23:29 BLUE2 Keepalived_vrrp[1326]: (AST3_192) Master received advert from 10.20.139.19 with same priority 100 but higher IP address than ours
Jun 26 12:23:29 BLUE2 Keepalived_vrrp[1326]: (AST3_192) Entering BACKUP STATE
Jun 26 12:23:29 BLUE2 Keepalived_vrrp[1326]: (AST3_192) removing VIPs.

Still barging in, just losing the battle.

noci2012 commented 4 years ago

And even after a vrrp_startup_delay of 30 extra seconds: ...

Jun 26 12:33:34 BLUE3 Keepalived_vrrp[1338]: Registering Kernel netlink reflector
Jun 26 12:33:34 BLUE3 Keepalived_vrrp[1338]: Registering Kernel netlink command channel
Jun 26 12:33:34 BLUE3 Keepalived_vrrp[1338]: Opening file '/etc/keepalived/keepalived.conf'.
Jun 26 12:33:34 BLUE3 Keepalived_vrrp[1338]: Assigned address 10.20.139.19 for interface ens192
Jun 26 12:33:34 BLUE3 Keepalived_vrrp[1338]: Assigned address fe80::250:56ff:fe9d:75b8 for interface ens192
Jun 26 12:33:34 BLUE3 Keepalived_vrrp[1338]: Registering gratuitous ARP shared channel
Jun 26 12:33:34 BLUE3 Keepalived_vrrp[1338]: (AST3_192) removing VIPs.
Jun 26 12:33:34 BLUE3 Keepalived_vrrp[1338]: Delaying startup for 30 seconds

Logged on using ssh and ran tail -fn 100 /var/log/messages | grep Keepa, which showed the above; after a short while the following appeared:

Jun 26 12:34:04 BLUE3 Keepalived_vrrp[1338]: VRRP sockpool: [ifindex(2), family(IPv4), proto(112), unicast(1), fd(12,13)]
Jun 26 12:34:04 BLUE3 Keepalived_vrrp[1338]: VRRP_Script(chk_Applicatie) succeeded
Jun 26 12:34:04 BLUE3 Keepalived_vrrp[1338]: (AST3_192) Entering BACKUP STATE
Jun 26 12:34:04 BLUE3 Keepalived_vrrp[1338]: AST3_192: sending gratuitous ARP for 10.20.139.19
Jun 26 12:34:04 BLUE3 Keepalived_vrrp[1338]: Sending gratuitous ARP on ens192 for 10.20.139.19
Jun 26 12:34:07 BLUE3 Keepalived_vrrp[1338]: (AST3_192) Receive advertisement timeout
Jun 26 12:34:07 BLUE3 Keepalived_vrrp[1338]: (AST3_192) Entering MASTER STATE
Jun 26 12:34:07 BLUE3 Keepalived_vrrp[1338]: (AST3_192) setting VIPs.
Jun 26 12:34:07 BLUE3 Keepalived_vrrp[1338]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 12:34:07 BLUE3 Keepalived_vrrp[1338]: (AST3_192) Sending/queueing gratuitous ARPs on ens192 for 10.20.139.3
Jun 26 12:34:07 BLUE3 Keepalived_vrrp[1338]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 12:34:07 BLUE3 Keepalived_vrrp[1338]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 12:34:07 BLUE3 Keepalived_vrrp[1338]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 12:34:07 BLUE3 Keepalived_vrrp[1338]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 12:34:12 BLUE3 Keepalived_vrrp[1338]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 12:34:12 BLUE3 Keepalived_vrrp[1338]: (AST3_192) Sending/queueing gratuitous ARPs on ens192 for 10.20.139.3
Jun 26 12:34:12 BLUE3 Keepalived_vrrp[1338]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 12:34:12 BLUE3 Keepalived_vrrp[1338]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 12:34:12 BLUE3 Keepalived_vrrp[1338]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 12:34:12 BLUE3 Keepalived_vrrp[1338]: Sending gratuitous ARP on ens192 for 10.20.139.3

noci2012 commented 4 years ago

Here is a complete session, with BLUE2 as master (vrrp_startup_delay set to 60 to allow time to ssh in and start tshark for packet capture):

Jun 26 12:34:12 BLUE3 Keepalived_vrrp[1338]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 13:18:15 BLUE3 Keepalived[1307]: Starting Keepalived v2.0.20 (01/22,2020)
Jun 26 13:18:15 BLUE3 Keepalived[1307]: Running on Linux 3.10.0-1062.12.1.el7.x86_64 #1 SMP Thu Dec 12 06:44:49 EST 2019 (built for Linux 3.10.0)
Jun 26 13:18:15 BLUE3 Keepalived[1307]: Command line: '/usr/sbin/keepalived' '-D' '-i' 'BLUE3'
Jun 26 13:18:15 BLUE3 Keepalived[1307]: Opening file '/etc/keepalived/keepalived.conf'.
Jun 26 13:18:15 BLUE3 Keepalived[1307]: The vrrp_startup_delay is very large - 60 seconds
Jun 26 13:18:15 BLUE3 Keepalived[1327]: Starting VRRP child process, pid=1335
Jun 26 13:18:15 BLUE3 Keepalived_vrrp[1335]: Registering Kernel netlink reflector
Jun 26 13:18:15 BLUE3 Keepalived_vrrp[1335]: Registering Kernel netlink command channel
Jun 26 13:18:15 BLUE3 Keepalived_vrrp[1335]: Opening file '/etc/keepalived/keepalived.conf'.
Jun 26 13:18:15 BLUE3 Keepalived_vrrp[1335]: Assigned address 10.20.139.19 for interface ens192
Jun 26 13:18:15 BLUE3 Keepalived_vrrp[1335]: Assigned address fe80::250:56ff:fe9d:75b8 for interface ens192
Jun 26 13:18:15 BLUE3 Keepalived_vrrp[1335]: Registering gratuitous ARP shared channel
Jun 26 13:18:15 BLUE3 Keepalived_vrrp[1335]: (AST3_192) removing VIPs.
Jun 26 13:18:15 BLUE3 Keepalived_vrrp[1335]: Delaying startup for 60 seconds

[root@BLUE3 ~]# tshark -ni ens192 proto 112
Running as user "root" and group "root". This could be dangerous.
Capturing on 'ens192'
  1 0.000000000 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
  2 1.000157542 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
  3 2.000430305 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
  4 3.000568670 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
  5 4.000826382 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
  6 5.001038559 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
  7 6.001248666 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
  8 7.001563155 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
  9 8.001750976 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
 10 9.001994120 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
 11 10.002101786 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
 12 11.002276183 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
 13 12.002559076 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
 14 13.002678003 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
 15 14.002873583 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
 16 15.003200272 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
 17 16.003407614 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
 18 17.003399502 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
 19 18.003613443 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
 20 19.003831031 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
 21 20.003899291 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
 22 21.004060639 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
 23 22.004375672 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
Jun 26 13:19:15 BLUE3 Keepalived_vrrp[1335]: VRRP sockpool: [ifindex(2), family(IPv4), proto(112), unicast(1), fd(12,13)]
Jun 26 13:19:15 BLUE3 Keepalived_vrrp[1335]: VRRP_Script(chk_Applicatie) succeeded
Jun 26 13:19:15 BLUE3 Keepalived_vrrp[1335]: (AST3_192) Entering BACKUP STATE
Jun 26 13:19:15 BLUE3 Keepalived_vrrp[1335]: AST3_192: sending gratuitous ARP for 10.20.139.19
Jun 26 13:19:15 BLUE3 Keepalived_vrrp[1335]: Sending gratuitous ARP on ens192 for 10.20.139.19
 24 23.004492900 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
 25 24.004675748 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
 26 25.004882899 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
 27 26.005094966 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
Jun 26 13:19:19 BLUE3 Keepalived_vrrp[1335]: (AST3_192) Receive advertisement timeout
Jun 26 13:19:19 BLUE3 Keepalived_vrrp[1335]: (AST3_192) Entering MASTER STATE
Jun 26 13:19:19 BLUE3 Keepalived_vrrp[1335]: (AST3_192) setting VIPs.
Jun 26 13:19:19 BLUE3 Keepalived_vrrp[1335]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 13:19:19 BLUE3 Keepalived_vrrp[1335]: (AST3_192) Sending/queueing gratuitous ARPs on ens192 for 10.20.139.3
Jun 26 13:19:19 BLUE3 Keepalived_vrrp[1335]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 13:19:19 BLUE3 Keepalived_vrrp[1335]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 13:19:19 BLUE3 Keepalived_vrrp[1335]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 13:19:19 BLUE3 Keepalived_vrrp[1335]: Sending gratuitous ARP on ens192 for 10.20.139.3
 28 27.005411634 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
 29 27.047712762 10.20.139.19 -> 10.20.139.16 VRRP 54 Announcement (v2)
 30 27.050796171 10.20.139.19 -> 10.20.139.13 VRRP 54 Announcement (v2)
 31 28.047866305 10.20.139.19 -> 10.20.139.13 VRRP 54 Announcement (v2)
 32 28.047890602 10.20.139.19 -> 10.20.139.16 VRRP 54 Announcement (v2)
 33 29.048085296 10.20.139.19 -> 10.20.139.13 VRRP 54 Announcement (v2)
 34 29.048118560 10.20.139.19 -> 10.20.139.16 VRRP 54 Announcement (v2)
 35 30.048216810 10.20.139.19 -> 10.20.139.13 VRRP 54 Announcement (v2)
 36 30.048233241 10.20.139.19 -> 10.20.139.16 VRRP 54 Announcement (v2)
 37 31.048360722 10.20.139.19 -> 10.20.139.13 VRRP 54 Announcement (v2)
 38 31.048378579 10.20.139.19 -> 10.20.139.16 VRRP 54 Announcement (v2)
Jun 26 13:19:24 BLUE3 Keepalived_vrrp[1335]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 13:19:24 BLUE3 Keepalived_vrrp[1335]: (AST3_192) Sending/queueing gratuitous ARPs on ens192 for 10.20.139.3
Jun 26 13:19:24 BLUE3 Keepalived_vrrp[1335]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 13:19:24 BLUE3 Keepalived_vrrp[1335]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 13:19:24 BLUE3 Keepalived_vrrp[1335]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 13:19:24 BLUE3 Keepalived_vrrp[1335]: Sending gratuitous ARP on ens192 for 10.20.139.3
 39 32.048447894 10.20.139.19 -> 10.20.139.13 VRRP 54 Announcement (v2)
 40 32.048460858 10.20.139.19 -> 10.20.139.16 VRRP 54 Announcement (v2)
 41 33.048584155 10.20.139.19 -> 10.20.139.13 VRRP 54 Announcement (v2)
 42 33.048610678 10.20.139.19 -> 10.20.139.16 VRRP 54 Announcement (v2)

noci2012 commented 4 years ago

Here keepalived and poc_bsp were both disabled during boot and started manually after boot. tshark shows traffic is incoming from BLUE2 (.16).

systemctl start keepalived.service 
Jun 26 14:47:27 BLUE3 Keepalived[1929]: Starting Keepalived v2.0.20 (01/22,2020)
Jun 26 14:47:27 BLUE3 Keepalived[1929]: Running on Linux 3.10.0-1062.12.1.el7.x86_64 #1 SMP Thu Dec 12 06:44:49 EST 2019 (built for Linux 3.10.0)
Jun 26 14:47:27 BLUE3 Keepalived[1929]: Command line: '/usr/sbin/keepalived' '-D' '-i' 'BLUE3'
Jun 26 14:47:27 BLUE3 Keepalived[1929]: Opening file '/etc/keepalived/keepalived.conf'.
Jun 26 14:47:27 BLUE3 Keepalived[1929]: The vrrp_startup_delay is very large - 60 seconds
Jun 26 14:47:27 BLUE3 Keepalived[1930]: Starting VRRP child process, pid=1931
Jun 26 14:47:27 BLUE3 Keepalived_vrrp[1931]: Registering Kernel netlink reflector
Jun 26 14:47:27 BLUE3 Keepalived_vrrp[1931]: Registering Kernel netlink command channel
Jun 26 14:47:27 BLUE3 Keepalived_vrrp[1931]: Opening file '/etc/keepalived/keepalived.conf'.
Jun 26 14:47:27 BLUE3 Keepalived_vrrp[1931]: Assigned address 10.20.139.19 for interface ens192
[root@BLUE3 ~]# Jun 26 14:47:27 BLUE3 Keepalived_vrrp[1931]: Assigned address fe80::250:56ff:fe9d:75b8 for interface ens192
Jun 26 14:47:27 BLUE3 Keepalived_vrrp[1931]: Registering gratuitous ARP shared channel
Jun 26 14:47:27 BLUE3 Keepalived_vrrp[1931]: (AST3_192) removing VIPs.
Jun 26 14:47:27 BLUE3 Keepalived_vrrp[1931]: Delaying startup for 60 seconds
tshark -ni ens192 proto 112
Running as user "root" and group "root". This could be dangerous.
Capturing on 'ens192'
  1 0.000000000 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
  2 1.000183526 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
  3 2.000259960 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
  4 3.000436365 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
  5 4.000550424 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
^C  6 5.000745909 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
6 packets captured
[root@BLUE3 ~]# Jun 26 14:48:27 BLUE3 Keepalived_vrrp[1931]: VRRP sockpool: [ifindex(2), family(IPv4), proto(112), unicast(1), fd(12,13)]
Jun 26 14:48:27 BLUE3 Keepalived_vrrp[1931]: Script `chk_Applicatie` now returning 1
Jun 26 14:48:27 BLUE3 Keepalived_vrrp[1931]: VRRP_Script(chk_Applicatie) failed (exited with status 1)
Jun 26 14:48:27 BLUE3 Keepalived_vrrp[1931]: (AST3_192) Entering FAULT STATE
systemctl start poc_bsp
[root@BLUE3 ~]# Jun 26 14:48:37 BLUE3 Keepalived_vrrp[1931]: Script `chk_Applicatie` now returning 0
Jun 26 14:48:38 BLUE3 Keepalived_vrrp[1931]: VRRP_Script(chk_Applicatie) succeeded
Jun 26 14:48:38 BLUE3 Keepalived_vrrp[1931]: (AST3_192) Entering BACKUP STATE
Jun 26 14:48:38 BLUE3 Keepalived_vrrp[1931]: AST3_192: sending gratuitous ARP for 10.20.139.19
Jun 26 14:48:38 BLUE3 Keepalived_vrrp[1931]: Sending gratuitous ARP on ens192 for 10.20.139.19
Jun 26 14:48:42 BLUE3 Keepalived_vrrp[1931]: (AST3_192) Receive advertisement timeout
Jun 26 14:48:42 BLUE3 Keepalived_vrrp[1931]: (AST3_192) Entering MASTER STATE
Jun 26 14:48:42 BLUE3 Keepalived_vrrp[1931]: (AST3_192) setting VIPs.
Jun 26 14:48:42 BLUE3 Keepalived_vrrp[1931]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 14:48:42 BLUE3 Keepalived_vrrp[1931]: (AST3_192) Sending/queueing gratuitous ARPs on ens192 for 10.20.139.3
Jun 26 14:48:42 BLUE3 Keepalived_vrrp[1931]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 14:48:42 BLUE3 Keepalived_vrrp[1931]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 14:48:42 BLUE3 Keepalived_vrrp[1931]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 14:48:42 BLUE3 Keepalived_vrrp[1931]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 14:48:45 BLUE3 Keepalived_vrrp[1931]: (AST3_192) Received advert from 10.20.139.13 with lower priority 100, ours 100, forcing new election
Jun 26 14:48:45 BLUE3 Keepalived_vrrp[1931]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 14:48:45 BLUE3 Keepalived_vrrp[1931]: (AST3_192) Sending/queueing gratuitous ARPs on ens192 for 10.20.139.3
Jun 26 14:48:45 BLUE3 Keepalived_vrrp[1931]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 14:48:45 BLUE3 Keepalived_vrrp[1931]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 14:48:45 BLUE3 Keepalived_vrrp[1931]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 14:48:45 BLUE3 Keepalived_vrrp[1931]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 14:48:47 BLUE3 Keepalived_vrrp[1931]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 14:48:47 BLUE3 Keepalived_vrrp[1931]: (AST3_192) Sending/queueing gratuitous ARPs on ens192 for 10.20.139.3
Jun 26 14:48:47 BLUE3 Keepalived_vrrp[1931]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 14:48:47 BLUE3 Keepalived_vrrp[1931]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 14:48:47 BLUE3 Keepalived_vrrp[1931]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 14:48:47 BLUE3 Keepalived_vrrp[1931]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 14:48:50 BLUE3 Keepalived_vrrp[1931]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 14:48:50 BLUE3 Keepalived_vrrp[1931]: (AST3_192) Sending/queueing gratuitous ARPs on ens192 for 10.20.139.3
Jun 26 14:48:50 BLUE3 Keepalived_vrrp[1931]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 14:48:50 BLUE3 Keepalived_vrrp[1931]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 14:48:50 BLUE3 Keepalived_vrrp[1931]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 14:48:50 BLUE3 Keepalived_vrrp[1931]: Sending gratuitous ARP on ens192 for 10.20.139.3
noci2012 commented 4 years ago

Reboot, then a manual start of poc_bsp and keepalived: no intermediate FAULT state.

[root@BLUE3 ~]# systemctl start poc_bsp                       
[root@BLUE3 ~]# systemctl start keepalived.service 
Jun 26 14:52:31 BLUE3 Keepalived[1922]: Starting Keepalived v2.0.20 (01/22,2020)
Jun 26 14:52:31 BLUE3 Keepalived[1922]: Running on Linux 3.10.0-1062.12.1.el7.x86_64 #1 SMP Thu Dec 12 06:44:49 EST 2019 (built for Linux 3.10.0)
Jun 26 14:52:31 BLUE3 Keepalived[1922]: Command line: '/usr/sbin/keepalived' '-D' '-i' 'BLUE3'
Jun 26 14:52:31 BLUE3 Keepalived[1922]: Opening file '/etc/keepalived/keepalived.conf'.
Jun 26 14:52:31 BLUE3 Keepalived[1922]: The vrrp_startup_delay is very large - 60 seconds
Jun 26 14:52:31 BLUE3 Keepalived[1923]: Starting VRRP child process, pid=1924
[root@BLUE3 ~]# Jun 26 14:52:31 BLUE3 Keepalived_vrrp[1924]: Registering Kernel netlink reflector
Jun 26 14:52:31 BLUE3 Keepalived_vrrp[1924]: Registering Kernel netlink command channel
Jun 26 14:52:31 BLUE3 Keepalived_vrrp[1924]: Opening file '/etc/keepalived/keepalived.conf'.
Jun 26 14:52:31 BLUE3 Keepalived_vrrp[1924]: Assigned address 10.20.139.19 for interface ens192
Jun 26 14:52:31 BLUE3 Keepalived_vrrp[1924]: Assigned address fe80::250:56ff:fe9d:75b8 for interface ens192
Jun 26 14:52:31 BLUE3 Keepalived_vrrp[1924]: Registering gratuitous ARP shared channel
Jun 26 14:52:31 BLUE3 Keepalived_vrrp[1924]: (AST3_192) removing VIPs.
Jun 26 14:52:31 BLUE3 Keepalived_vrrp[1924]: Delaying startup for 60 seconds
systemctl start tshark -ni ens192 proto 112
Running as user "root" and group "root". This could be dangerous.
Capturing on 'ens192'
  1 0.000000000 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
  2 1.000161114 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
  3 2.000275412 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
  4 3.000580022 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
  5 4.000734252 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
^C  6 5.000877170 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
6 packets captured
[root@BLUE3 ~]# Jun 26 14:53:31 BLUE3 Keepalived_vrrp[1924]: VRRP sockpool: [ifindex(2), family(IPv4), proto(112), unicast(1), fd(12,13)]
Jun 26 14:53:31 BLUE3 Keepalived_vrrp[1924]: VRRP_Script(chk_Applicatie) succeeded
Jun 26 14:53:31 BLUE3 Keepalived_vrrp[1924]: (AST3_192) Entering BACKUP STATE
Jun 26 14:53:31 BLUE3 Keepalived_vrrp[1924]: AST3_192: sending gratuitous ARP for 10.20.139.19
Jun 26 14:53:31 BLUE3 Keepalived_vrrp[1924]: Sending gratuitous ARP on ens192 for 10.20.139.19
Jun 26 14:53:35 BLUE3 Keepalived_vrrp[1924]: (AST3_192) Receive advertisement timeout
Jun 26 14:53:35 BLUE3 Keepalived_vrrp[1924]: (AST3_192) Entering MASTER STATE
Jun 26 14:53:35 BLUE3 Keepalived_vrrp[1924]: (AST3_192) setting VIPs.
Jun 26 14:53:35 BLUE3 Keepalived_vrrp[1924]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 14:53:35 BLUE3 Keepalived_vrrp[1924]: (AST3_192) Sending/queueing gratuitous ARPs on ens192 for 10.20.139.3
Jun 26 14:53:35 BLUE3 Keepalived_vrrp[1924]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 14:53:35 BLUE3 Keepalived_vrrp[1924]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 14:53:35 BLUE3 Keepalived_vrrp[1924]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 14:53:35 BLUE3 Keepalived_vrrp[1924]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 14:53:40 BLUE3 Keepalived_vrrp[1924]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 14:53:40 BLUE3 Keepalived_vrrp[1924]: (AST3_192) Sending/queueing gratuitous ARPs on ens192 for 10.20.139.3
Jun 26 14:53:40 BLUE3 Keepalived_vrrp[1924]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 14:53:40 BLUE3 Keepalived_vrrp[1924]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 14:53:40 BLUE3 Keepalived_vrrp[1924]: Sending gratuitous ARP on ens192 for 10.20.139.3
Jun 26 14:53:40 BLUE3 Keepalived_vrrp[1924]: Sending gratuitous ARP on ens192 for 10.20.139.3
noci2012 commented 4 years ago

So the reboot is not the immediate issue. It does seem that a second startup, after keepalived has already been run once, DOES work as advertised. Immediately after the test above I stopped keepalived and poc_bsp and restarted them:


[root@BLUE3 ~]# systemctl stop keepalived
Jun 26 14:55:36 BLUE3 Keepalived[1923]: Stopping
Jun 26 14:55:36 BLUE3 Keepalived_vrrp[1924]: (AST3_192) sent 0 priority
Jun 26 14:55:36 BLUE3 Keepalived_vrrp[1924]: (AST3_192) removing VIPs.
Jun 26 14:55:37 BLUE3 Keepalived_vrrp[1924]: Stopped - used 0.008802 user time, 0.080766 system time
Jun 26 14:55:37 BLUE3 Keepalived[1923]: Stopped Keepalived v2.0.20 (01/22,2020)
[root@BLUE3 ~]# systemctl stop poc_bsp
[root@BLUE3 ~]# 
[root@BLUE3 ~]# 
[root@BLUE3 ~]# systemctl start keepalived
Jun 26 14:55:58 BLUE3 Keepalived[2082]: Starting Keepalived v2.0.20 (01/22,2020)
Jun 26 14:55:58 BLUE3 Keepalived[2082]: Running on Linux 3.10.0-1062.12.1.el7.x86_64 #1 SMP Thu Dec 12 06:44:49 EST 2019 (built for Linux 3.10.0)
Jun 26 14:55:58 BLUE3 Keepalived[2082]: Command line: '/usr/sbin/keepalived' '-D' '-i' 'BLUE3'
Jun 26 14:55:58 BLUE3 Keepalived[2082]: Opening file '/etc/keepalived/keepalived.conf'.
Jun 26 14:55:58 BLUE3 Keepalived[2082]: The vrrp_startup_delay is very large - 60 seconds
Jun 26 14:55:58 BLUE3 Keepalived[2083]: Starting VRRP child process, pid=2084
Jun 26 14:55:58 BLUE3 Keepalived_vrrp[2084]: Registering Kernel netlink reflector
Jun 26 14:55:58 BLUE3 Keepalived_vrrp[2084]: Registering Kernel netlink command channel
Jun 26 14:55:58 BLUE3 Keepalived_vrrp[2084]: Opening file '/etc/keepalived/keepalived.conf'.
Jun 26 14:55:58 BLUE3 Keepalived_vrrp[2084]: Assigned address 10.20.139.19 for interface ens192
Jun 26 14:55:58 BLUE3 Keepalived_vrrp[2084]: Assigned address fe80::250:56ff:fe9d:75b8 for interface ens192
Jun 26 14:55:58 BLUE3 Keepalived_vrrp[2084]: Registering gratuitous ARP shared channel
Jun 26 14:55:58 BLUE3 Keepalived_vrrp[2084]: (AST3_192) removing VIPs.
[root@BLUE3 ~]# Jun 26 14:55:58 BLUE3 Keepalived_vrrp[2084]: Delaying startup for 60 seconds
systemctl stop ksystemctl start poc_bsp
[root@BLUE3 ~]# tshark -ni ens192 proto 112
Running as user "root" and group "root". This could be dangerous.
Capturing on 'ens192'
  1 0.000000000 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
  2 1.000099017 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
  3 2.000273840 10.20.139.16 -> 10.20.139.19 VRRP 60 Announcement (v2)
^C3 packets captured
[root@BLUE3 ~]# Jun 26 14:56:58 BLUE3 Keepalived_vrrp[2084]: VRRP sockpool: [ifindex(2), family(IPv4), proto(112), unicast(1), fd(12,13)]
Jun 26 14:56:58 BLUE3 Keepalived_vrrp[2084]: VRRP_Script(chk_Applicatie) succeeded
Jun 26 14:56:58 BLUE3 Keepalived_vrrp[2084]: (AST3_192) Entering BACKUP STATE
Jun 26 14:56:58 BLUE3 Keepalived_vrrp[2084]: AST3_192: sending gratuitous ARP for 10.20.139.19
Jun 26 14:56:58 BLUE3 Keepalived_vrrp[2084]: Sending gratuitous ARP on ens192 for 10.20.139.19

This time it doesn't fall through to MASTER.

pqarmitage commented 4 years ago

What is poc_bsp? The one time keepalived has worked correctly is when keepalived was started when poc_bsp was stopped. Can you try a reboot with the poc_bsp service disabled and see what happens then.

It is clear from the logs that when AST3_192 transitions to master after keepalived starts, keepalived has not received anything, or at least not received any advert that it considers valid. I will do an inspection of the code later to see if keepalived can silently discard received adverts, but I think, especially with the '-D' option, that all "bad" adverts are logged.
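For reference, the gap between "Entering BACKUP STATE" (14:53:31) and "Receive advertisement timeout" (14:53:35) in the log above is roughly what the VRRPv2 master-down timer from RFC 3768 predicts. The sketch below is only that RFC formula, not keepalived's code; the 1 second advert interval is an assumption taken from the spacing of the adverts in the tshark capture, and priority 100 is taken from the "ours 100" log line.

```python
def master_down_interval(adver_int: float, priority: int) -> float:
    """Seconds a VRRPv2 backup waits without a valid advert before
    declaring the master down (RFC 3768, section 6.1)."""
    # Skew_Time = (256 - Priority) / 256, in seconds
    skew_time = (256 - priority) / 256
    # Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time
    return 3 * adver_int + skew_time


# Assumed values: adver_int = 1 s (from the capture), priority = 100 (from the log)
print(master_down_interval(1, 100))  # 3.609375
```

A timer of about 3.6 seconds expiring is consistent with no advert having been accepted between leaving the startup delay and the transition to MASTER.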

I note that since you have added the unicast_src_ip, keepalived is sending gratuitous ARP messages for the src_ip address (I think you are right to have added the unicast_src_ip entries, but I will have a look at why the gratuitous ARPs are sent).

It appears to be Blue3 that is sending the 60 byte adverts, and Blue2 is sending 56 byte adverts. What CentOS versions are they each, and what version of keepalived are they each running? It would also be helpful to know the situation for Blue1.

noci2012 commented 4 years ago

The software package is called poc_bsp; it provides (among other things) the program Applicatie. The systemd service file is also called poc_bsp.service. poc_bsp is a lightweight stub that "behaves" like the later application w.r.t. failover connections to backends, databases and ems, without having the actual application. PoC = Proof of Concept; BSP is a platform design based on VMware, Linux, etc. to provide a fault-tolerant environment.

If poc_bsp is stopped, keepalived gets stuck in FAULT state. "Applicatie" is the name of the application that the vrrp_script check (chk_Applicatie) looks for; if keepalived stops, Applicatie sees the STOP message and also stops. So only once poc_bsp / Applicatie has been started CAN keepalived negotiate to become master. If Applicatie doesn't run, the system is not a viable candidate for processing. One of the tests above ( https://github.com/acassen/keepalived/issues/1652#issuecomment-650161626 ) does exactly that: it starts only keepalived, and poc_bsp is started after the vrrp_startup_delay expires.

Blue1, Blue2 AND Blue3 are SENDING 54 byte adverts; on RECEPTION the packets are 60 bytes. There are 6 bytes of 00 padding (IMNSHO 64 bytes is the minimal Ethernet frame size, and VLAN IDs take 4 bytes, so this might actually be the minimal size, padded by the sending hardware). (There are other systems, green1, 2 and 3, for testing upgrade scenarios.)
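The 54 and 60 byte figures can be checked with a little arithmetic. This is a sketch that assumes the standard VRRPv2 packet layout from RFC 3768 with a single virtual IP and an untagged Ethernet II frame; the constant names are made up for illustration. Note that the 64 byte Ethernet minimum includes the 4 byte frame check sequence, which packet captures normally do not show.

```python
ETH_HEADER = 14       # destination MAC + source MAC + EtherType
IPV4_HEADER = 20      # IPv4 header without options
VRRP2_FIXED = 8       # version/type, VRID, priority, count, auth type, adver int, checksum
VRRP2_AUTH_DATA = 8   # authentication data field, always present in VRRPv2
ETH_MIN_NO_FCS = 60   # minimum Ethernet frame length excluding the FCS
ETH_FCS = 4           # frame check sequence, usually absent from captures


def vrrp2_frame_len(num_vips: int) -> int:
    """Unpadded length of an Ethernet frame carrying one VRRPv2 advert."""
    return ETH_HEADER + IPV4_HEADER + VRRP2_FIXED + 4 * num_vips + VRRP2_AUTH_DATA


def padded_len(frame_len: int) -> int:
    """Length after the sender pads the frame up to the Ethernet minimum."""
    return max(frame_len, ETH_MIN_NO_FCS)


sent = vrrp2_frame_len(1)
print(sent, padded_len(sent), padded_len(sent) + ETH_FCS)  # 54 60 64
```

So a sender-side capture showing 54 bytes and a receiver-side capture showing 60 bytes of the same advert is what padding to the Ethernet minimum would produce.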

You can see from the tshark traces that adverts ARE received by the system, yet keepalived doesn't observe them, as its log shows. I specifically set vrrp_startup_delay to 60 seconds to be able to log on to the system with ssh AND show that packets are received. W.r.t. keepalived not observing packets: is it possible that the packet filter in keepalived misses a few packets just after the filter is set up?

Indeed, after I had to insert the @^ line on your advice, I could just as well add @ entries for the unicast_src_ip's. The problem with this is that I now also have to generate a few hundred unique /etc/sysconfig/keepalived files instead of one generic one, because systemd doesn't run it through a shell, so " ... -i $( hostname ) " doesn't work. Maybe an addition to allow the command-line option -i HOSTNAME, where the special name HOSTNAME (as a literal) means to use the actual hostname of the system? Or a new command-line option, -H, for this.

Another observation I had was that a GROUP MASTER notification was sent to the FIFO BEFORE the INSTANCE records, which all showed BACKUP, and was then followed by a BACKUP group record. This doesn't provide a clean interface to the application trying to follow those messages: it is a PREMATURE announcement of BEING MASTER. (We have now dropped groups, as the testing environment is not valid for them: there is only one LAN.)

Next week we are going to test other aspects of the software; I may be able to sneak in some tests later on.

pqarmitage commented 4 years ago

I will have a further look at what is happening, but in the meantime you state that you can't use -i $(hostname) and suggest -i HOSTNAME or -H.

From keepalived.conf(5) man page:

Conditional configuration and configuration id
The config-id defaults to the first part of the node name as returned by
uname, and can be overridden with the -i or --config-id command line option.

Any configuration line starting with '@' is a conditional configuration line.
The word immediately following (i.e. without any space) the '@' character
is compared against the config-id, and if they don't match, the configuration
line is ignored.

So I think keepalived already does what you want, and is the default if -i is not specified.
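To illustrate the man page text, here is a rough sketch of how '@' lines could be filtered against a config-id. It is a simplified model, not keepalived's actual parser: the function names are hypothetical, the sample configuration and its 192.0.2.x addresses are invented for the example, and the `@^` handling (keep the line when the config-id does NOT match) is based on the negated form mentioned earlier in this thread.

```python
import os


def default_config_id() -> str:
    """First part of the node name as returned by uname (the documented default)."""
    return os.uname().nodename.split(".")[0]


def apply_config_id(lines, config_id):
    """Return the lines that remain active for config_id, with '@' tags removed."""
    kept = []
    for line in lines:
        stripped = line.lstrip()
        if not stripped.startswith("@"):
            kept.append(line)  # unconditional line, always kept
            continue
        indent = line[: len(line) - len(stripped)]
        parts = stripped.split(None, 1)
        tag = parts[0][1:]  # word immediately following '@'
        rest = parts[1] if len(parts) > 1 else ""
        negate = tag.startswith("^")  # '@^id' form: keep when the id does not match
        if negate:
            tag = tag[1:]
        if (tag == config_id) != negate:
            kept.append(indent + rest)
    return kept


# Hypothetical shared config; the addresses are documentation placeholders.
sample = [
    "vrrp_instance AST3_192 {",
    "    @BLUE1 unicast_src_ip 192.0.2.1",
    "    @BLUE3 unicast_src_ip 192.0.2.3",
    "    @^BLUE3 priority 90",
    "}",
]
for line in apply_config_id(sample, "BLUE3"):
    print(line)
```

With one shared file like this, each host picks up only its own lines, and calling `apply_config_id(sample, default_config_id())` would mirror running keepalived without -i.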

noci2012 commented 4 years ago

OK, I missed that part, sorry. Then consider that part done.. ;-) (I used other sources on this; I should also have read the man page.)

noci2012 commented 4 years ago

For your information: the test environment has been erased and is not available anymore. Keepalived failed this stability test, which required it not to become primary if another node (with a lower IP address) is primary. As such, another method has been selected to provide Primary/Secondary election functionality.

That said, there are multiple mentions of this issue with keepalived, mostly dismissed with "cannot happen". Still, it does happen and was reproducible.

pqarmitage commented 4 years ago

I can reproduce this problem using v1.4.0, but the problem does not exist in v2.0.10, as confirmed at the start of comment https://github.com/acassen/keepalived/issues/1652#issuecomment-649819130.

Whereas the problem did exist in v1.4.0, it has now been resolved, so this is no longer an issue.

The solution is to upgrade to a recent version, e.g. v2.1.5.