acassen / keepalived

Keepalived
https://www.keepalived.org
GNU General Public License v2.0

track_file - recovery from fault with the old priority, not the current #1460

Closed dpajin closed 5 years ago

dpajin commented 5 years ago

Describe the bug With the track_file feature, when the priority is decreased and then decreased further into the FAULT condition (a weighted value of -254), the previous priority value is saved. When the track_file later returns a different value, the instance leaves FAULT state, but with the old priority value rather than the priority implied by the current value in the track_file.

To Reproduce Any steps necessary to reproduce the behaviour:

vrrp_track_file testing_track_file {
    file "/etc/keepalived/vrrp/testing_track_file"
    weight -1
    init_file 0
}

vrrp_track_process dockerd {
    process dockerd
    delay 5
}

vrrp_track_process snmpd {
    process snmpd
    delay 5
    weight -10
}

vrrp_instance testing {
    interface bond3
    virtual_router_id 3
    priority 254
    nopreempt
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass testing
    }
    virtual_ipaddress {
        10.197.20.10
    }
    track_interface {
      eth0 weight -10
      bond0
      eth1 weight -10
      eth2 weight -10
      bond3
      eth3 weight -10
      eth4 weight -10
      bond4
    }
    track_process {
        dockerd
        snmpd
    }
    track_file {
        testing_track_file weight -1
    }
    track_script {
        testing_check
    }
}

# echo "0" > /etc/keepalived/vrrp/testing_track_file

Dec  2 11:22:31 node1 Keepalived[3722]: Starting Keepalived v2.0.19 (10/19,2019)
Dec  2 11:22:31 node1 Keepalived[3722]: WARNING - keepalived was build for newer Linux 4.4.197, running on Linux 4.4.0-169-generic #198-Ubuntu SMP Tue Nov 12 10:38:00 UTC 2019
Dec  2 11:22:31 node1 Keepalived[3722]: Command line: '/usr/local/sbin/keepalived' '--log-detail' '--vrrp' '--snmp'
Dec  2 11:22:31 node1 Keepalived[3722]: Opening file '/etc/keepalived/keepalived.conf'.
Dec  2 11:22:31 node1 Keepalived[3724]: Starting VRRP child process, pid=3725
Dec  2 11:22:31 node1 Keepalived_vrrp[3725]: Registering Kernel netlink reflector
Dec  2 11:22:31 node1 Keepalived_vrrp[3725]: Registering Kernel netlink command channel
Dec  2 11:22:31 node1 Keepalived_vrrp[3725]: Opening file '/etc/keepalived/keepalived.conf'.
Dec  2 11:22:31 node1 Keepalived_vrrp[3725]: WARNING - default user 'keepalived_script' for script execution does not exist - please create.
Dec  2 11:22:31 node1 Keepalived_vrrp[3725]: Starting SNMP subagent
Dec  2 11:22:31 node1 Keepalived_vrrp[3725]: NET-SNMP version 5.7.3 AgentX subagent connected
Dec  2 11:22:31 node1 Keepalived_vrrp[3725]: SECURITY VIOLATION - scripts are being executed but script_security not enabled.
Dec  2 11:22:31 node1 Keepalived_vrrp[3725]: (testing) Ignoring track_interface bond3 since own interface
Dec  2 11:22:31 node1 Keepalived_vrrp[3725]: Assigned address 10.197.20.1 for interface bond3
Dec  2 11:22:31 node1 Keepalived_vrrp[3725]: Assigned address fe80::5468:a3ff:fe54:3a98 for interface bond3
Dec  2 11:22:31 node1 Keepalived_vrrp[3725]: Registering gratuitous ARP shared channel
Dec  2 11:22:31 node1 Keepalived_vrrp[3725]: (testing) removing VIPs.
Dec  2 11:22:31 node1 Keepalived_vrrp[3725]: VRRP sockpool: [ifindex(865), family(IPv4), proto(112), unicast(0), fd(17,18)]
Dec  2 11:22:31 node1 Keepalived_vrrp[3725]: VRRP_Script(testing_check) succeeded
Dec  2 11:22:31 node1 Keepalived_vrrp[3725]: (testing) Entering BACKUP STATE

# echo "10" > /etc/keepalived/vrrp/testing_track_file
Dec  2 11:23:25 node1 Keepalived_vrrp[3725]: (testing) Changing effective priority from 254 to 244
# snmpwalk -v2c -c public localhost KEEPALIVED-MIB::vrrpInstanceEffectivePriority
KEEPALIVED-MIB::vrrpInstanceEffectivePriority.1 = INTEGER: 244

# echo "50" > /etc/keepalived/vrrp/testing_track_file
Dec  2 11:23:45 node1 Keepalived_vrrp[3725]: (testing) Changing effective priority from 244 to 204
KEEPALIVED-MIB::vrrpInstanceEffectivePriority.1 = INTEGER: 204

# echo "100" > /etc/keepalived/vrrp/testing_track_file
Dec  2 11:23:58 node1 Keepalived_vrrp[3725]: (testing) Changing effective priority from 204 to 154
KEEPALIVED-MIB::vrrpInstanceEffectivePriority.1 = INTEGER: 154

# echo "254" > /etc/keepalived/vrrp/testing_track_file
Dec  2 11:24:11 node1 Keepalived_vrrp[3725]: (testing): tracked file testing_track_file now FAULT state
Dec  2 11:24:11 node1 Keepalived_vrrp[3725]: (testing) Entering FAULT STATE
KEEPALIVED-MIB::vrrpInstanceEffectivePriority.1 = INTEGER: 154

# echo "50" > /etc/keepalived/vrrp/testing_track_file
Dec  2 11:24:30 node1 Keepalived_vrrp[3725]: (testing): tracked file testing_track_file leaving FAULT state
Dec  2 11:24:30 node1 Keepalived_vrrp[3725]: (testing) Entering BACKUP STATE
KEEPALIVED-MIB::vrrpInstanceEffectivePriority.1 = INTEGER: 154

# echo "10" > /etc/keepalived/vrrp/testing_track_file
Dec  2 11:24:43 node1 Keepalived_vrrp[3725]: (testing) Changing effective priority from 154 to 194
KEEPALIVED-MIB::vrrpInstanceEffectivePriority.1 = INTEGER: 194
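
The priorities in the log above can be reproduced with a small model. This is only a sketch of one reading that fits the numbers, not keepalived's actual code: it assumes the priority is adjusted by the difference between the new and the previously recorded weighted value, and that on leaving FAULT the new value is recorded but never applied. The function name `replay_observed` and the constant `FAULT_THRESHOLD` are hypothetical, and other trackers (interfaces, processes, scripts) are assumed to contribute nothing.

```python
# Hypothetical model of the observed behaviour; not taken from keepalived's source.
FAULT_THRESHOLD = -254  # assumed: a weighted value at or below this means FAULT


def replay_observed(base, weight, writes):
    """Replay values written to the track file and return (state, priority) pairs."""
    priority = base
    last_weighted = 0  # weighted value most recently recorded (init_file 0)
    in_fault = False
    results = []
    for value in writes:
        weighted = value * weight
        if weighted <= FAULT_THRESHOLD:
            # Entering FAULT: the priority and the recorded value stay untouched.
            in_fault = True
            results.append(("FAULT", priority))
        elif in_fault:
            # Leaving FAULT: the new value is recorded but NOT applied (the suspected bug).
            in_fault = False
            last_weighted = weighted
            results.append(("BACKUP", priority))
        else:
            # Normal update: apply only the change since the last recorded value.
            priority += weighted - last_weighted
            last_weighted = weighted
            results.append(("BACKUP", priority))
    return results


if __name__ == "__main__":
    # Same writes as in the log: 10, 50, 100, 254, 50, 10 with weight -1.
    for state, prio in replay_observed(254, -1, [10, 50, 100, 254, 50, 10]):
        print(state, prio)
```

Under these assumptions the model yields 244, 204, 154, FAULT (154), 154 and 194, matching the SNMP values above, including the jump from 154 to 194 on the final write.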

Expected behavior A clear and concise description of what you expected to happen.

The expectation is that when the weighted value from the track_file rises above -254 again, the state transitions back to BACKUP or MASTER and the priority is set to "original_prio" + the current weighted "value", not to the last value it had before falling to FAULT state.
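
As a rough illustration of the expected rule, the sketch below computes the priority directly from the configured priority and the current file value, so no stale value can survive a FAULT. The names `expected_priority` and `FAULT_THRESHOLD` are hypothetical, the FAULT threshold is inferred from the log in this report, and clamping to the range 1 to 254 is an assumption; other trackers are ignored.

```python
# Illustrative sketch of the expected arithmetic; names and limits are assumptions.
FAULT_THRESHOLD = -254  # assumed: a weighted value at or below this means FAULT


def expected_priority(base, weight, value):
    """Return the expected effective priority, or None if the instance should be in FAULT."""
    weighted = value * weight
    if weighted <= FAULT_THRESHOLD:
        return None  # FAULT state: no effective priority to report
    # Always derive the priority from the configured base, never from the previous result.
    return max(1, min(254, base + weighted))


if __name__ == "__main__":
    # priority 254, weight -1, same writes as in the reproduction steps
    for value in [10, 50, 100, 254, 50, 10]:
        print(value, expected_priority(254, -1, value))
```

With the configuration from this report, writing 50 after the FAULT would then give 204 and writing 10 would give 244, instead of the 154 and 194 that were observed.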

Keepalived version Output of keepalived -v

compiled on Ubuntu 16.04.6

root@node1:~# keepalived -v
Keepalived v2.0.19 (10/19,2019)

Copyright(C) 2001-2019 Alexandre Cassen, <acassen@gmail.com>

Built with kernel headers for Linux 4.4.197
Running on Linux 4.4.0-169-generic #198-Ubuntu SMP Tue Nov 12 10:38:00 UTC 2019

configure options: --enable-snmp

Config options:  LVS VRRP VRRP_AUTH OLD_CHKSUM_COMPAT FIB_ROUTING SNMP_VRRP SNMP_CHECKER

System options:  PIPE2 SIGNALFD INOTIFY_INIT1 VSYSLOG EPOLL_CREATE1 IPV4_DEVCONF IPV6_ADVANCED_API RTA_ENCAP RTA_NEWDST RTA_PREF FRA_SUPPRESS_PREFIXLEN FRA_SUPPRESS_IFGROUP FRA_TUN_ID RTAX_CC_ALGO RTAX_QUICKACK RTEXT_FILTER_SKIP_STATS RTA_VIA FRA_OIFNAME IFA_FLAGS IP_MULTICAST_ALL LWTUNNEL_ENCAP_MPLS LWTUNNEL_ENCAP_ILA NET_LINUX_IF_H_COLLISION LIBIPTC_LINUX_NET_IF_H_COLLISION IPVS_DEST_ATTR_ADDR_FAMILY IPVS_SYNCD_ATTRIBUTES IPVS_64BIT_STATS VRRP_VMAC VRRP_IPVLAN IFLA_LINK_NETNSID CN_PROC SOCK_NONBLOCK SOCK_CLOEXEC O_PATH GLOB_BRACE INET6_ADDR_GEN_MODE VRF SO_MARK SCHED_RT SCHED_RESET_ON_FORK

Distro (please complete the following information):

Ubuntu 16.04.6

root@node1:~# uname -a
Linux node1 4.4.0-169-generic #198-Ubuntu SMP Tue Nov 12 10:38:00 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Details of any containerisation or hosted service (e.g. AWS) If keepalived is being run in a container or on a hosted service, provide full details

running directly on Linux

Configuration file: A full copy of the configuration file, obfuscated if necessary to protect passwords and IP addresses

# Keepalived configuration for WIND testing

# Global definitions configuration block
global_defs {
    enable_snmp_vrrp
    enable_snmp_checker
}

vrrp_script testing_check {
    script       "/bin/bash /etc/keepalived/vrrp/check_testing.sh"
    interval 2   # check every 2 seconds
    timeout 20
    fall 2       # require 2 failures for KO
    rise 2       # require 2 successes for OK
}

vrrp_track_file testing_track_file {
    file "/etc/keepalived/vrrp/testing_track_file"
    weight -1
    init_file 0
}

vrrp_track_process dockerd {
    process dockerd
    delay 5
}

vrrp_track_process snmpd {
    process snmpd
    delay 5
    weight -10
}

vrrp_instance testing {
    interface bond3
    virtual_router_id 3
    priority 254
    nopreempt
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass testing
    }
    virtual_ipaddress {
        10.197.20.10
    }
    track_interface {
      eth0 weight -10
      bond0
      eth1 weight -10
      eth2 weight -10
      bond3
      eth3 weight -10
      eth4 weight -10
      bond4
    }
    track_process {
        dockerd
        snmpd
    }
    track_file {
        testing_track_file weight -1
    }
    track_script {
        testing_check
    }
    notify_master "/etc/keepalived/vrrp/notify_script.sh MASTER"
    notify_backup "/etc/keepalived/vrrp/notify_script.sh BACKUP"
    notify_fault "/etc/keepalived/vrrp/notify_script.sh FAULT"
    notify_stop "/etc/keepalived/vrrp/notify_script.sh STOPPED"
}

Notify and track scripts If any notify or track scripts are in use, please provide copies of them

System Log entries Full keepalived system log entries from when keepalived started

Dec  2 11:22:31 node1 Keepalived[3724]: Starting VRRP child process, pid=3725
Dec  2 11:22:31 node1 Keepalived_vrrp[3725]: Registering Kernel netlink reflector
Dec  2 11:22:31 node1 Keepalived_vrrp[3725]: Registering Kernel netlink command channel
Dec  2 11:22:31 node1 Keepalived_vrrp[3725]: Opening file '/etc/keepalived/keepalived.conf'.
Dec  2 11:22:31 node1 Keepalived_vrrp[3725]: WARNING - default user 'keepalived_script' for script execution does not exist - please create.
Dec  2 11:22:31 node1 Keepalived_vrrp[3725]: Starting SNMP subagent
Dec  2 11:22:31 node1 Keepalived_vrrp[3725]: NET-SNMP version 5.7.3 AgentX subagent connected
Dec  2 11:22:31 node1 Keepalived_vrrp[3725]: SECURITY VIOLATION - scripts are being executed but script_security not enabled.
Dec  2 11:22:31 node1 Keepalived_vrrp[3725]: (opennti) Ignoring track_interface bond3 since own interface
Dec  2 11:22:31 node1 Keepalived_vrrp[3725]: Assigned address 10.197.20.1 for interface bond3
Dec  2 11:22:31 node1 Keepalived_vrrp[3725]: Assigned address fe80::5468:a3ff:fe54:3a98 for interface bond3
Dec  2 11:22:31 node1 Keepalived_vrrp[3725]: Registering gratuitous ARP shared channel
Dec  2 11:22:31 node1 Keepalived_vrrp[3725]: (opennti) removing VIPs.
Dec  2 11:22:31 node1 Keepalived_vrrp[3725]: VRRP sockpool: [ifindex(865), family(IPv4), proto(112), unicast(0), fd(17,18)]
Dec  2 11:22:31 node1 Keepalived_vrrp[3725]: VRRP_Script(opennti_check) succeeded
Dec  2 11:22:31 node1 Keepalived_vrrp[3725]: (opennti) Entering BACKUP STATE
Dec  2 11:23:25 node1 Keepalived_vrrp[3725]: (opennti) Changing effective priority from 254 to 244
Dec  2 11:23:45 node1 Keepalived_vrrp[3725]: (opennti) Changing effective priority from 244 to 204
Dec  2 11:23:58 node1 Keepalived_vrrp[3725]: (opennti) Changing effective priority from 204 to 154
Dec  2 11:24:11 node1 Keepalived_vrrp[3725]: (opennti): tracked file opennti_track_file now FAULT state
Dec  2 11:24:11 node1 Keepalived_vrrp[3725]: (opennti) Entering FAULT STATE
Dec  2 11:24:30 node1 Keepalived_vrrp[3725]: (opennti): tracked file opennti_track_file leaving FAULT state
Dec  2 11:24:30 node1 Keepalived_vrrp[3725]: (opennti) Entering BACKUP STATE
Dec  2 11:24:43 node1 Keepalived_vrrp[3725]: (opennti) Changing effective priority from 154 to 194

Did keepalived coredump? If so, can you please provide a stacktrace from the coredump, using gdb.

Additional context Add any other context about the problem here.

pqarmitage commented 5 years ago

Many thanks for reporting this, along with the detailed information.

Commit 8731e4a resolves the issue.

dpajin commented 5 years ago

Thank you for maintaining this great project and for the quick reaction; it is really appreciated.