acassen / keepalived

Keepalived
https://www.keepalived.org
GNU General Public License v2.0

Segfault when reloading from a config without enable_snmp_vrrp to one with it #2086

Open araujorm opened 2 years ago

araujorm commented 2 years ago

Describe the bug

Keepalived 2.2.7 (and at least some previous versions) segfaults when reloading from a config without enable_snmp_vrrp to one with it.

To Reproduce

  1. Ensure snmpd is running with the AgentX master option enabled (master agentx)
  2. Create any configuration with at least one interface and one VIP in a VRRP instance, without the option enable_snmp_vrrp
  3. Start keepalived
  4. Change the configuration by adding enable_snmp_vrrp
  5. Reload keepalived (kill -HUP)
  6. Check the keepalived output for "pid X exited due to segmentation fault (SIGSEGV)"

Note that the process that dies is the VRRP child; keepalived's main process reports the crash and starts the child again. The consequences are still unpleasant when this happens on a master node: the backup node goes to master, and although the cluster eventually recovers, there is some downtime in the meantime.

Expected behavior

No segfaults while reloading keepalived.

Keepalived version

Keepalived v2.2.7 (01/16,2022)

Copyright(C) 2001-2022 Alexandre Cassen, <acassen@gmail.com>

Built with kernel headers for Linux 4.18.0
Running on Linux 4.18.0-348.12.2.el8_5.x86_64 #1 SMP Wed Jan 19 17:53:40 UTC 2022
Distro: Rocky Linux 8.5 (Green Obsidian)

configure options: --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --enable-snmp --enable-snmp-rfc --enable-sha1 --with-init=systemd build_alias=x86_64-redhat-linux-gnu host_alias=x86_64-redhat-linux-gnu PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig CFLAGS=-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection LDFLAGS=-Wl,-z,relro -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld

Config options:  LIBIPSET_DYNAMIC LVS VRRP VRRP_AUTH VRRP_VMAC OLD_CHKSUM_COMPAT SNMP_V3_FOR_V2 SNMP_VRRP SNMP_CHECKER SNMP_RFCV2 SNMP_RFCV3 INIT=systemd

System options:  VSYSLOG MEMFD_CREATE IPV4_DEVCONF LIBNL3 RTA_ENCAP RTA_EXPIRES RTA_NEWDST RTA_PREF FRA_SUPPRESS_PREFIXLEN FRA_SUPPRESS_IFGROUP FRA_TUN_ID RTAX_CC_ALGO RTAX_QUICKACK RTEXT_FILTER_SKIP_STATS FRA_L3MDEV FRA_UID_RANGE RTAX_FASTOPEN_NO_COOKIE RTA_VIA FRA_PROTOCOL FRA_IP_PROTO FRA_SPORT_RANGE FRA_DPORT_RANGE RTA_TTL_PROPAGATE IFA_FLAGS LWTUNNEL_ENCAP_MPLS LWTUNNEL_ENCAP_ILA IPTABLES NET_LINUX_IF_H_COLLISION LIBIPVS_NETLINK IPVS_DEST_ATTR_ADDR_FAMILY IPVS_SYNCD_ATTRIBUTES IPVS_64BIT_STATS VRRP_IPVLAN IFLA_LINK_NETNSID GLOB_BRACE GLOB_ALTDIRFUNC INET6_ADDR_GEN_MODE VRF SO_MARK

Details of any containerisation or hosted service (e.g. AWS)

N/A

Configuration file:

global_defs {
    router_id testing

    # start with this commented, then uncomment and reload to see the segfault
    #enable_snmp_vrrp
}

# bond0
vrrp_instance hb_enp1s0 {
    # change to your interface
    interface enp1s0

    virtual_router_id 20
    advert_int 1
    authentication {
        auth_type AH
        auth_pass deadbeef
    }

    virtual_ipaddress {
        # change the interface as needed
        10.234.32.254/24 dev enp1s0 label enp1s0:254
    }
}

Notify and track scripts

N/A

System Log entries

Tue Jan 25 14:46:14 2022: Starting Keepalived v2.2.7 (01/16,2022)
Tue Jan 25 14:46:14 2022: Running on Linux 4.18.0-348.12.2.el8_5.x86_64 #1 SMP Wed Jan 19 17:53:40 UTC 2022 (built for Linux 4.18.0)
Tue Jan 25 14:46:14 2022: Command line: 'keepalived' '-D' '-n' '-l'
Tue Jan 25 14:46:14 2022: Opening file '/etc/keepalived/keepalived.conf'.
(...)

When SIGHUP is sent after uncommenting enable_snmp_vrrp:

Tue Jan 25 14:46:44 2022: Reloading ...
Tue Jan 25 14:46:44 2022: Opening file '/etc/keepalived/keepalived.conf'.
Tue Jan 25 14:46:44 2022: Configuration file /etc/keepalived/keepalived.conf
Tue Jan 25 14:46:44 2022: Reloading
Tue Jan 25 14:46:44 2022: pid 6076 exited due to segmentation fault (SIGSEGV).
Tue Jan 25 14:46:44 2022:   Please report a bug at https://github.com/acassen/keepalived/issues
Tue Jan 25 14:46:44 2022:   and include this log from when keepalived started, a description
Tue Jan 25 14:46:44 2022:   of what happened before the crash, your configuration file and the details below.
Tue Jan 25 14:46:44 2022:   Also provide the output of keepalived -v, and whether keepalived is being
Tue Jan 25 14:46:44 2022:   run in a container or VM.
Tue Jan 25 14:46:44 2022:   A failure to provide all this information may mean the crash cannot be investigated.
Tue Jan 25 14:46:44 2022:   If you are able to provide a stack backtrace with gdb that would really help.
Tue Jan 25 14:46:44 2022:   Source version 2.2.7
Tue Jan 25 14:46:44 2022:   Built with kernel headers for Linux 4.18.0
Tue Jan 25 14:46:44 2022:   Running on Linux 4.18.0-348.12.2.el8_5.x86_64 #1 SMP Wed Jan 19 17:53:40 UTC 2022
Tue Jan 25 14:46:44 2022:   Command line: 'keepalived' '-D' '-n' '-l'
Tue Jan 25 14:46:44 2022:   configure options: --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix=
Tue Jan 25 14:46:44 2022:                      --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin
Tue Jan 25 14:46:44 2022:                      --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share
Tue Jan 25 14:46:44 2022:                      --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec
Tue Jan 25 14:46:44 2022:                      --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man
Tue Jan 25 14:46:44 2022:                      --infodir=/usr/share/info --enable-snmp --enable-snmp-rfc --enable-sha1
Tue Jan 25 14:46:44 2022:                      --with-init=systemd build_alias=x86_64-redhat-linux-gnu
Tue Jan 25 14:46:44 2022:                      host_alias=x86_64-redhat-linux-gnu
Tue Jan 25 14:46:44 2022:                      PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig CFLAGS=-O2 -g -pipe
Tue Jan 25 14:46:44 2022:                      -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS
Tue Jan 25 14:46:44 2022:                      -fexceptions -fstack-protector-strong -grecord-gcc-switches
Tue Jan 25 14:46:44 2022:                      -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1
Tue Jan 25 14:46:44 2022:                      -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic
Tue Jan 25 14:46:44 2022:                      -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection
Tue Jan 25 14:46:44 2022:                      LDFLAGS=-Wl,-z,relro -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld
Tue Jan 25 14:46:44 2022:   Config options: LIBIPSET_DYNAMIC LVS VRRP VRRP_AUTH VRRP_VMAC OLD_CHKSUM_COMPAT SNMP_V3_FOR_V2
Tue Jan 25 14:46:44 2022:                   SNMP_VRRP SNMP_CHECKER SNMP_RFCV2 SNMP_RFCV3 INIT=systemd
Tue Jan 25 14:46:44 2022:   System options: VSYSLOG MEMFD_CREATE IPV4_DEVCONF LIBNL3 RTA_ENCAP RTA_EXPIRES RTA_NEWDST RTA_PREF
Tue Jan 25 14:46:44 2022:                   FRA_SUPPRESS_PREFIXLEN FRA_SUPPRESS_IFGROUP FRA_TUN_ID RTAX_CC_ALGO RTAX_QUICKACK
Tue Jan 25 14:46:44 2022:                   RTEXT_FILTER_SKIP_STATS FRA_L3MDEV FRA_UID_RANGE RTAX_FASTOPEN_NO_COOKIE RTA_VIA
Tue Jan 25 14:46:44 2022:                   FRA_PROTOCOL FRA_IP_PROTO FRA_SPORT_RANGE FRA_DPORT_RANGE RTA_TTL_PROPAGATE IFA_FLAGS
Tue Jan 25 14:46:44 2022:                   LWTUNNEL_ENCAP_MPLS LWTUNNEL_ENCAP_ILA IPTABLES NET_LINUX_IF_H_COLLISION
Tue Jan 25 14:46:44 2022:                   LIBIPVS_NETLINK IPVS_DEST_ATTR_ADDR_FAMILY IPVS_SYNCD_ATTRIBUTES IPVS_64BIT_STATS
Tue Jan 25 14:46:44 2022:                   VRRP_IPVLAN IFLA_LINK_NETNSID GLOB_BRACE GLOB_ALTDIRFUNC INET6_ADDR_GEN_MODE VRF
Tue Jan 25 14:46:44 2022:                   SO_MARK
Tue Jan 25 14:46:44 2022: VRRP child process(6076) died: Respawning
Tue Jan 25 14:46:44 2022:   Please log an issue at https://github.com/acassen/keepalived/issues/
Tue Jan 25 14:46:44 2022:   and include a full copy of your keepalived configuration files, and
Tue Jan 25 14:46:44 2022:   copies of the keepalived system log entries around the time this happened
Tue Jan 25 14:46:44 2022: Restart of VRRP process delayed 0 seconds to limit respawn rate
Tue Jan 25 14:46:44 2022: Starting VRRP child process, pid=6081
Tue Jan 25 14:46:44 2022: Registering Kernel netlink reflector
(...)

Did keepalived coredump?

Yes. The backtrace (bt) from coredumpctl debug follows:

#0  timer_thread_update_timeout (thread_cp=0x0, timer=0) at scheduler.c:1359
#1  0x000055e12f476153 in snmp_epoll_info (m=0x55e13102fb40) at timer.h:130
#2  0x000055e12f43b0cc in start_vrrp (prev_global_data=0x55e1310257e0) at vrrp_daemon.c:551
#3  start_vrrp (prev_global_data=0x55e1310257e0) at vrrp_daemon.c:497
#4  0x000055e12f43b735 in reload_vrrp_thread (thread=<optimized out>) at vrrp_daemon.c:849
#5  0x000055e12f47669d in thread_call (thread=0x55e1310300c0) at scheduler.c:2081
#6  process_threads (m=0x55e13102fb40) at scheduler.c:2081
#7  0x000055e12f476f15 in launch_thread_scheduler (m=<optimized out>) at scheduler.c:2202
#8  0x000055e12f43bad4 in start_vrrp_child () at vrrp_daemon.c:1133
#9  start_vrrp_child () at vrrp_daemon.c:983
#10 0x000055e12f40eb25 in start_keepalived (thread=<optimized out>) at main.c:560
#11 0x000055e12f47669d in thread_call (thread=0x55e13102fac0) at scheduler.c:2081
#12 process_threads (m=0x55e13102dee0) at scheduler.c:2081
#13 0x000055e12f476f15 in launch_thread_scheduler (m=<optimized out>) at scheduler.c:2202
#14 0x000055e12f41182c in keepalived_main (argc=4, argv=<optimized out>) at main.c:2763
#15 0x00007f35a352a493 in __libc_start_main () from /lib64/libc.so.6
#16 0x000055e12f40e68e in _start ()
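
Frame #0 is entered with thread_cp=0x0 from snmp_epoll_info() during the reload, which suggests the crash is a NULL pointer dereference of an SNMP timer that was never created because SNMP was disabled at startup. The following is only a minimal sketch of that failure pattern and of a defensive check. The types and function names (timer_thread_t, master_t, update_timeout, snmp_update_timeout_checked) are hypothetical stand-ins, not keepalived's real definitions, and this is not the patch that was applied.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the scheduler's types.
 * These are NOT keepalived's real structures. */
typedef struct {
    unsigned long timeout_usec;
} timer_thread_t;

typedef struct {
    /* Assumed to stay NULL when the daemon starts without SNMP enabled */
    timer_thread_t *snmp_timer;
} master_t;

/* Same shape as frame #0: dereferences its argument unconditionally,
 * so passing NULL here would crash. */
void update_timeout(timer_thread_t *t, unsigned long usec)
{
    t->timeout_usec = usec;
}

/* Guarded caller: returns 0 on success, or -1 if the timer was never
 * created, instead of passing NULL down to update_timeout(). */
int snmp_update_timeout_checked(master_t *m, unsigned long usec)
{
    if (m == NULL || m->snmp_timer == NULL)
        return -1;

    update_timeout(m->snmp_timer, usec);
    return 0;
}
```

With snmp_timer left as NULL (the assumed state after starting without enable_snmp_vrrp), snmp_update_timeout_checked() returns -1 rather than crashing; once the pointer refers to a valid timer, it updates the timeout and returns 0. A real fix would more likely create the timer when SNMP becomes enabled on reload than silently skip the update.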

Additional context

N/A

pqarmitage commented 2 years ago

I have been able to reproduce this segfault, and can merge a patch that will stop this particular problem. However, what I haven't been able to get working yet is the sequence SNMP enabled -> SNMP disabled -> SNMP enabled. The problem is that once SNMP has been enabled and then disabled, re-enabling it does not work: although net-snmp forwards the SNMP requests to keepalived, nothing seems to happen.

With the current code (plus the patch to stop the segfault), net-snmp complains, when SNMP is re-enabled, that the MIBs are already registered. I have added unregister_mib() calls in function snmp_unregister_mib(), but those calls fail with MIB_NO_SUCH_REGISTRATION (-1). I have also tried calling shutdown_agent() when SNMP is disabled, but then when SNMP is re-enabled, net-snmp doesn't seem to reinitialise properly.

If anyone has any ideas about what needs to be done, then any help would be gratefully received.