acassen / keepalived

Keepalived
https://www.keepalived.org
GNU General Public License v2.0

Segfault when interfaces disappear and reappear #1594

Closed: araujorm closed this issue 4 years ago

araujorm commented 4 years ago

**Describe the bug**
Keepalived segfaults when interfaces disappear and reappear under some circumstances.

This may seem unlikely, but with interfaces created by, for instance, openvswitch, it can happen occasionally (mainly when you need to restart openvswitch). It can also happen with bonding, bridge, vlan, tun, wireguard and other interface types that may be recreated dynamically in various scenarios.

**To Reproduce**
The easiest way to reproduce this is with dummy interfaces. Although they aren't used in real-world scenarios, the behaviour is the same as when the aforementioned interfaces disappear and reappear.

Steps:

  1. Prepare your keepalived with the following minimal keepalived.conf sample file, but don't start it yet:
    
    global_defs {
        router_id CRASHINTS
    }

    vrrp_instance vrrp1 {
        state BACKUP
        interface enp1s0    # <-- CHANGE HERE TO MEET YOUR FIRST INTERFACE
        virtual_router_id 20
        advert_int 1
        authentication {
            auth_type AH
            auth_pass somthing
        }

        virtual_ipaddress {
            172.28.29.254/24 dev enp1s0 label enp1s0:254
            10.123.123.254/24 dev dummy0 label dummy0:254
        }
    }


2. Create dummy0 interface and activate its link:

    ip link add dummy0 type dummy; ip link set dummy0 up


3. Start keepalived and wait for it to become MASTER.

4. Simulate the interface disappearing and reappearing:

    ip link delete dummy0; ip link add dummy0 type dummy; ip link set dummy0 up

At this point, keepalived goes from MASTER->FAULT->BACKUP->MASTER as it should.

5. Repeat the previous step. You'll see that keepalived segfaults.

6. Alternatively, skip step 5 and try stopping keepalived. You'll see that it also segfaults (at a different point in the code, although I'm not sure whether fixing one will fix the other).
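
Steps 4 and 5 can be scripted. The following is a minimal Python sketch, not part of the original report: the helper names `recreate_commands` and `flap` and the 10-second pause between cycles are assumptions chosen for illustration; only the `ip link` commands come from the steps above. It needs root privileges and assumes keepalived is already running as MASTER with the configuration from step 1.

```python
import shlex
import subprocess
import time

IFACE = "dummy0"  # interface name used in the steps above


def recreate_commands(iface=IFACE):
    """Return the three `ip link` commands from step 4 as argv lists."""
    return [
        shlex.split(f"ip link delete {iface}"),
        shlex.split(f"ip link add {iface} type dummy"),
        shlex.split(f"ip link set {iface} up"),
    ]


def flap(iface=IFACE, cycles=2, settle_seconds=10.0, run=subprocess.run):
    """Delete and recreate `iface` `cycles` times.

    `settle_seconds` is an assumed pause that gives keepalived time to
    return to MASTER between cycles; `run` is injectable so the command
    sequence can be inspected without touching the network stack.
    """
    for cycle in range(cycles):
        for argv in recreate_commands(iface):
            run(argv, check=True)
        if cycle + 1 < cycles:
            time.sleep(settle_seconds)
```

Calling `flap()` as root after step 3 performs step 4 and, after the pause, step 5. Passing `cycles=1` and then stopping keepalived corresponds to step 6.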

**Expected behavior**
Keepalived should not segfault.

**Keepalived version**
Verified on latest master (commit 82b15f1568e1d4aae4e5c5095e99eba28df187f1), but also verified on some previous versions. Output of `keepalived -v`:

Keepalived v2.0.20 (unknown)

Copyright(C) 2001-2020 Alexandre Cassen, acassen@gmail.com

Built with kernel headers for Linux 4.18.0
Running on Linux 4.18.0-147.8.1.el8_1.x86_64 #1 SMP Thu Apr 9 13:49:54 UTC 2020

configure options: --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --enable-snmp --enable-snmp-rfc --enable-sha1 --with-init=systemd build_alias=x86_64-redhat-linux-gnu host_alias=x86_64-redhat-linux-gnu PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig CFLAGS=-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection LDFLAGS=-Wl,-z,relro -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld

Config options: LIBIPSET_DYNAMIC LVS VRRP VRRP_AUTH OLD_CHKSUM_COMPAT FIB_ROUTING SNMP_V3_FOR_V2 SNMP_VRRP SNMP_CHECKER SNMP_RFCV2 SNMP_RFCV3

System options: PIPE2 SIGNALFD INOTIFY_INIT1 VSYSLOG EPOLL_CREATE1 IPV4_DEVCONF IPV6_ADVANCED_API LIBNL3 RTA_ENCAP RTA_EXPIRES RTA_NEWDST RTA_PREF FRA_SUPPRESS_PREFIXLEN FRA_SUPPRESS_IFGROUP FRA_TUN_ID RTAX_CC_ALGO RTAX_QUICKACK RTEXT_FILTER_SKIP_STATS FRA_L3MDEV FRA_UID_RANGE RTAX_FASTOPEN_NO_COOKIE RTA_VIA FRA_OIFNAME FRA_PROTOCOL FRA_IP_PROTO FRA_SPORT_RANGE FRA_DPORT_RANGE RTA_TTL_PROPAGATE IFA_FLAGS IP_MULTICAST_ALL LWTUNNEL_ENCAP_MPLS LWTUNNEL_ENCAP_ILA IPTABLES NET_LINUX_IF_H_COLLISION LIBIPVS_NETLINK IPVS_DEST_ATTR_ADDR_FAMILY IPVS_SYNCD_ATTRIBUTES IPVS_64BIT_STATS VRRP_VMAC VRRP_IPVLAN IFLA_LINK_NETNSID CN_PROC SOCK_NONBLOCK SOCK_CLOEXEC O_PATH GLOB_BRACE INET6_ADDR_GEN_MODE VRF SO_MARK SCHED_RESET_ON_FORK


**Distro (please complete the following information):**
 - Name: CentOS
 - Version: 8
 - Architecture: x86_64

**Details of any containerisation or hosted service (e.g. AWS)**
N/A

**Configuration file:**
Please see above.

**Notify and track scripts**
Not used.

**System Log entries**

    Fri May 29 13:37:39 2020: Running on Linux 4.18.0-147.8.1.el8_1.x86_64 #1 SMP Thu Apr 9 13:49:54 UTC 2020 (built for Linux 4.18.0)
    Fri May 29 13:37:39 2020: Command line: 'keepalived' '-D' '-n' '-l'
    Fri May 29 13:37:39 2020: Opening file '/etc/keepalived/keepalived.conf'.
    Fri May 29 13:37:39 2020: NOTICE: setting config option max_auto_priority should result in better keepalived performance
    Fri May 29 13:37:39 2020: Starting VRRP child process, pid=3190
    Fri May 29 13:37:39 2020: Registering Kernel netlink reflector
    Fri May 29 13:37:39 2020: Registering Kernel netlink command channel
    Fri May 29 13:37:39 2020: Opening file '/etc/keepalived/keepalived.conf'.
    Fri May 29 13:37:39 2020: Assigned address 192.168.122.125 for interface enp1s0
    Fri May 29 13:37:39 2020: Assigned address fe80::5c39:3778:f2a7:5f02 for interface enp1s0
    Fri May 29 13:37:39 2020: Registering gratuitous ARP shared channel
    Fri May 29 13:37:39 2020: (vrrp1) removing VIPs.
    Fri May 29 13:37:39 2020: (vrrp1) Entering BACKUP STATE (init)
    Fri May 29 13:37:39 2020: VRRP sockpool: [ifindex(2), family(IPv4), proto(51), unicast(0), fd(11,12)]
    Fri May 29 13:37:43 2020: (vrrp1) Receive advertisement timeout
    Fri May 29 13:37:43 2020: (vrrp1) Entering MASTER STATE
    Fri May 29 13:37:43 2020: (vrrp1) setting VIPs.
    Fri May 29 13:37:43 2020: (vrrp1) Sending/queueing gratuitous ARPs on enp1s0 for 172.28.29.254
    Fri May 29 13:37:43 2020: Sending gratuitous ARP on enp1s0 for 172.28.29.254
    Fri May 29 13:37:43 2020: (vrrp1) Sending/queueing gratuitous ARPs on dummy0 for 10.123.123.254
    Fri May 29 13:37:43 2020: Sending gratuitous ARP on enp1s0 for 172.28.29.254
    Fri May 29 13:37:43 2020: Sending gratuitous ARP on enp1s0 for 172.28.29.254
    Fri May 29 13:37:43 2020: Sending gratuitous ARP on enp1s0 for 172.28.29.254
    Fri May 29 13:37:43 2020: Sending gratuitous ARP on enp1s0 for 172.28.29.254

If doing step 5, later on:

    Fri May 29 13:39:04 2020: Netlink reports dummy0 down
    Fri May 29 13:39:04 2020: (vrrp1) Entering FAULT STATE
    Fri May 29 13:39:04 2020: (vrrp1) sent 0 priority
    Fri May 29 13:39:04 2020: (vrrp1) removing VIPs.
    Fri May 29 13:39:04 2020: Deassigned address fe80::5cc9:f8ff:fe58:1223 from interface dummy0
    Fri May 29 13:39:04 2020: Interface dummy0 deleted
    Fri May 29 13:39:04 2020: pid 3190 exited due to segmentation fault (SIGSEGV).

If instead doing step 6 (stopping without doing step 5):

    Fri May 29 13:53:06 2020: Stopping
    Fri May 29 13:53:06 2020: (vrrp1) sent 0 priority
    Fri May 29 13:53:06 2020: (vrrp1) removing VIPs.
    Fri May 29 13:53:07 2020: Keepalived_vrrp exited due to segmentation fault (SIGSEGV).
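
To compare a log against the expected MASTER->FAULT->BACKUP->MASTER sequence, the "Entering ... STATE" entries can be pulled out mechanically. This is a minimal Python sketch, not part of the original report: the regular expression and the `state_transitions` helper are assumptions based only on the log wording shown above, and the sample lines are copied from those logs.

```python
import re

# Matches entries such as "(vrrp1) Entering MASTER STATE".
STATE_RE = re.compile(r"\((?P<instance>[^)]+)\) Entering (?P<state>[A-Z]+) STATE")


def state_transitions(log_text, instance="vrrp1"):
    """Return the states entered by `instance`, in log order."""
    return [
        m.group("state")
        for m in STATE_RE.finditer(log_text)
        if m.group("instance") == instance
    ]


SAMPLE = """\
Fri May 29 13:37:39 2020: (vrrp1) Entering BACKUP STATE (init)
Fri May 29 13:37:43 2020: (vrrp1) Entering MASTER STATE
Fri May 29 13:39:04 2020: Netlink reports dummy0 down
Fri May 29 13:39:04 2020: (vrrp1) Entering FAULT STATE
"""

print(" -> ".join(state_transitions(SAMPLE)))  # prints: BACKUP -> MASTER -> FAULT
```

In the step 5 log the sequence ends at FAULT because the VRRP child process crashes right after the interface is deleted.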


**Did keepalived coredump?**
Yes.

When doing the aforementioned step 5:

    Core was generated by `keepalived -D -n -l'.
    Program terminated with signal SIGSEGV, Segmentation fault.
    #0  if_get_by_ifindex (ifindex=23) at vrrp_if.c:96
    96          if (ifp->ifindex == ifindex)
    (gdb) bt
    #0  if_get_by_ifindex (ifindex=23) at vrrp_if.c:96
    #1  0x0000565006ee7d79 in netlink_link_filter (h=0x565007e7e750, snl=) at keepalived_netlink.c:1972
    #2  0x0000565006ee88bb in netlink_broadcast_filter (snl=, h=) at keepalived_netlink.c:2304
    #3  netlink_broadcast_filter (snl=, h=) at keepalived_netlink.c:2290
    #4  0x0000565006ee5215 in netlink_parse_info (filter=filter@entry=0x565006ee8520 , nl=nl@entry=0x56500716e430 , n=n@entry=0x0,
        read_all=read_all@entry=true) at keepalived_netlink.c:1364
    #5  0x0000565006ee5725 in kernel_netlink (thread=) at keepalived_netlink.c:2334
    #6  0x0000565006f3b45e in thread_call (thread=0x565007e7bcf0) at scheduler.c:1923
    #7  process_threads (m=0x565007e7b870) at scheduler.c:1923
    #8  0x0000565006f3bbc5 in launch_thread_scheduler (m=) at scheduler.c:2030
    #9  0x0000565006f05261 in start_vrrp_child () at vrrp_daemon.c:1130
    #10 start_vrrp_child () at vrrp_daemon.c:1000
    #11 0x0000565006edbbe2 in start_keepalived (thread=) at main.c:530
    #12 0x0000565006f3b45e in thread_call (thread=0x565007e7b7f0) at scheduler.c:1923
    #13 process_threads (m=0x565007e7b230) at scheduler.c:1923
    #14 0x0000565006f3bbc5 in launch_thread_scheduler (m=) at scheduler.c:2030
    #15 0x0000565006eddf9f in keepalived_main (argc=4, argv=) at main.c:2392
    #16 0x00007faf614ad873 in __libc_start_main () from /lib64/libc.so.6
    #17 0x0000565006edba9e in _start ()
    (gdb) print ifp
    $1 = (interface_t *) 0xfffffffffffffef0


If instead doing the aforementioned step 6:

    Core was generated by `keepalived -D -n -l'.
    Program terminated with signal SIGSEGV, Segmentation fault.
    #0  free_interface_queue () at vrrp_if.c:791
    791         list_for_each_entry_safe(ifp, ifp_tmp, &if_queue, e_list)
    (gdb) bt
    #0  free_interface_queue () at vrrp_if.c:791
    #1  0x000055d57594e3f2 in vrrp_terminate_phase2 (exit_status=exit_status@entry=0) at vrrp_daemon.c:262
    #2  0x000055d57594f268 in start_vrrp_child () at vrrp_daemon.c:1137
    #3  start_vrrp_child () at vrrp_daemon.c:1000
    #4  0x000055d575925be2 in start_keepalived (thread=) at main.c:530
    #5  0x000055d57598545e in thread_call (thread=0x55d575bdc7f0) at scheduler.c:1923
    #6  process_threads (m=0x55d575bdc230) at scheduler.c:1923
    #7  0x000055d575985bc5 in launch_thread_scheduler (m=) at scheduler.c:2030
    #8  0x000055d575927f9f in keepalived_main (argc=4, argv=) at main.c:2392
    #9  0x00007fe621c2f873 in __libc_start_main () from /lib64/libc.so.6
    #10 0x000055d575925a9e in _start ()
    (gdb) print ifp
    $1 = (interface_t *) 0xfffffffffffffef0
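
Both cores show the same `ifp` value. One possible reading, offered as an inference rather than something stated in this report: `0xfffffffffffffef0` is what a `container_of`-style list macro would produce from a NULL list pointer if the `e_list` member sat 0x110 bytes into `interface_t`. The Python sketch below only checks that arithmetic for 64-bit pointers; the `as_signed` and `container_of` helpers are hypothetical illustrations, and the 0x110 offset is derived from the pointer value, not taken from the keepalived source.

```python
POINTER_BITS = 64  # the report is from an x86_64 system

ifp = 0xFFFFFFFFFFFFFEF0  # value printed by gdb in both cores


def as_signed(value, bits=POINTER_BITS):
    """Interpret an unsigned pointer-sized integer as two's complement."""
    return value - (1 << bits) if value >> (bits - 1) else value


def container_of(member_ptr, member_offset, bits=POINTER_BITS):
    """Mimic the C idiom: subtract the member offset, wrapping at `bits`."""
    return (member_ptr - member_offset) % (1 << bits)


# If ifp came from a NULL member pointer, the offset is simply -ifp.
offset = -as_signed(ifp)
print(hex(offset))  # prints: 0x110

# A NULL list pointer with that offset reproduces the observed value.
print(container_of(0, offset) == ifp)  # prints: True
```

If this reading is right, both crashes would come from walking `if_queue` after one of its link pointers had been zeroed, which would fit the two backtraces ending in list iteration.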



**Additional context**
Explained above.

pqarmitage commented 4 years ago

Commit 3c37ce1 resolves the issue. @araujorm Many thanks for your detailed analysis.

araujorm commented 4 years ago

Seems to be working great now, great work guys.