acassen / keepalived

Keepalived
https://www.keepalived.org
GNU General Public License v2.0

VRRP segmentation fault when creating a bridge with Docker, attaching a container and starting it #1232

Closed WRMSRwasTaken closed 5 years ago

WRMSRwasTaken commented 5 years ago

Describe the bug

After creating a bridge with docker network create, attaching a container to it with docker network connect, and starting the container, the VRRP child process segfaults, even though keepalived is not configured to use that interface.

To Reproduce

See above.

Expected behavior

keepalived does not segfault.

Keepalived version

Keepalived v2.0.13 (02/18,2019), git commit v2.0.12-53-ga9ed1993+

Copyright(C) 2001-2019 Alexandre Cassen, <acassen@gmail.com>

Built with kernel headers for Linux 4.17.11
Running on Linux 4.20.16.a-1-hardened #1 SMP PREEMPT Wed Mar 13 23:54:29 CET 2019

configure options: --prefix=/usr --sysconfdir=/etc --sbindir=/usr/bin --localstatedir=/var --runstatedir=/run CFLAGS=-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -g -fvar-tracking-assignments -fdebug-prefix-map=/home/aur/keepalived/src=/usr/src/debug LDFLAGS=-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now CPPFLAGS=-D_FORTIFY_SOURCE=2

Config options:  LIBIPTC LIBIPSET_DYNAMIC NFTABLES LVS VRRP VRRP_AUTH OLD_CHKSUM_COMPAT FIB_ROUTING EINTR_CHECK

System options:  PIPE2 SIGNALFD INOTIFY_INIT1 VSYSLOG EPOLL_CREATE1 IPV4_DEVCONF IPV6_ADVANCED_API LIBNL3 RTA_ENCAP RTA_EXPIRES RTA_NEWDST RTA_PREF FRA_SUPPRESS_PREFIXLEN FRA_SUPPRESS_IFGROUP FRA_TUN_ID RTAX_CC_ALGO RTAX_QUICKACK RTEXT_FILTER_SKIP_STATS FRA_L3MDEV FRA_UID_RANGE RTAX_FASTOPEN_NO_COOKIE RTA_VIA FRA_OIFNAME FRA_PROTOCOL FRA_IP_PROTO FRA_SPORT_RANGE FRA_DPORT_RANGE RTA_TTL_PROPAGATE IFA_FLAGS IP_MULTICAST_ALL LWTUNNEL_ENCAP_MPLS LWTUNNEL_ENCAP_ILA LIBIPTC NET_LINUX_IF_H_COLLISION LIBIPVS_NETLINK IPVS_DEST_ATTR_ADDR_FAMILY IPVS_SYNCD_ATTRIBUTES IPVS_64BIT_STATS VRRP_VMAC CN_PROC SOCK_NONBLOCK SOCK_CLOEXEC O_PATH GLOB_BRACE INET6_ADDR_GEN_MODE VRF SO_MARK SCHED_RT SCHED_RESET_ON_FORK

Distro (please complete the following information):

Details of any containerisation or hosted service (e.g. AWS) N/A

Configuration file: https://git.archlinux.org/svntogit/community.git/tree/trunk/PKGBUILD?h=packages/keepalived ... with debug symbols enabled in /etc/makepkg.conf so that a stack trace could be supplied.

System Log entries

Keepalived_vrrp[28229]: Interface tony3 added
systemd-udevd[30405]: Using default interface naming scheme 'v240'.
systemd-udevd[30405]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Keepalived_vrrp[28229]: (tony3) MAC Address changed from 0a:64:ee:6e:1a:35 to 02:42:c7:96:38:d1
systemd-udevd[30405]: Could not generate persistent MAC address for tony3: No such file or directory
dockerd[18417]: time="2019-04-20T19:49:40.485177435+02:00" level=error msg="Could not add route to IPv6 network fd00:aa:1::1/64 via device tony3"
kernel: IPv6: ADDRCONF(NETDEV_UP): tony3: link is not ready
Keepalived_vrrp[28229]: Interface vetha34176b added
systemd-udevd[30743]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
systemd-udevd[30743]: Using default interface naming scheme 'v240'.
Keepalived_vrrp[28229]: Interface vethcf50df0 added
systemd-udevd[30743]: Could not generate persistent MAC address for vetha34176b: No such file or directory
audit: ANOM_PROMISCUOUS dev=vethcf50df0 prom=256 old_prom=0 auid=4294967295 uid=0 gid=0 ses=4294967295
kernel: tony3: port 1(vethcf50df0) entered blocking state
kernel: tony3: port 1(vethcf50df0) entered disabled state
kernel: device vethcf50df0 entered promiscuous mode
kernel: IPv6: ADDRCONF(NETDEV_UP): vethcf50df0: link is not ready
systemd-udevd[30745]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
systemd-udevd[30745]: Using default interface naming scheme 'v240'.
systemd-udevd[30745]: Could not generate persistent MAC address for vethcf50df0: No such file or directory
dockerd[18417]: time="2019-04-20T19:50:22.604421628+02:00" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/5bab7ae11e48eb185159aebacd48c86689337ac075f10712b2fff4bac67b353a/shim.sock" debug=false pid=30749
systemd[1]: run-docker-runtime\x2drunc-moby-5bab7ae11e48eb185159aebacd48c86689337ac075f10712b2fff4bac67b353a-runc.Fa28Gc.mount: Succeeded.
systemd[12152]: run-docker-runtime\x2drunc-moby-5bab7ae11e48eb185159aebacd48c86689337ac075f10712b2fff4bac67b353a-runc.Fa28Gc.mount: Succeeded.
Keepalived_vrrp[28229]: Interface vetha34176b deleted
kernel: eth0: renamed from vetha34176b
kernel: IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
systemd-networkd[18423]: vethcf50df0: Gained carrier
kernel: IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethcf50df0: link becomes ready
kernel: tony3: port 1(vethcf50df0) entered blocking state
kernel: tony3: port 1(vethcf50df0) entered forwarding state
kernel: IPv6: ADDRCONF(NETDEV_CHANGE): tony3: link becomes ready
systemd-networkd[18423]: tony3: Gained carrier
systemd-networkd[18423]: tony3: Gained IPv6LL
kernel: keepalived[28229]: segfault at 8 ip 000004154ded88c7 sp 000073d165fd3730 error 4 in keepalived[4154de8d000+53000]
kernel: Code: 58 20 48 83 c4 08 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 f5 53 48 89 fb bf 18 00 00 00 48 83 ec 08 67 e8 f9 9f ff ff <48> 8b 53 08 48 83 3b 00 48 89 68 10 48 89 50 08 48 c7 00 00 00 00
systemd[1]: Started Process Core Dump (PID 31081/UID 0).
systemd-networkd[18423]: vethcf50df0: Gained IPv6LL
Keepalived[15355]: Keepalived_vrrp exited due to segmentation fault (SIGSEGV).
Keepalived[15355]:   Please report a bug at https://github.com/acassen/keepalived/issues
Keepalived[15355]:   and include this log from when keepalived started, a description
Keepalived[15355]:   of what happened before the crash, your configuration file and the details below.
Keepalived[15355]:   Also provide the output of keepalived -v, what Linux distro and version
Keepalived[15355]:   you are running on, and whether keepalived is being run in a container or VM.
Keepalived[15355]:   A failure to provide all this information may mean the crash cannot be investigated.
Keepalived[15355]:   If you are able to provide a stack backtrace with gdb that would really help.
Keepalived[15355]:   Source version 2.0.13 , git commit v2.0.12-53-ga9ed1993+
Keepalived[15355]:   Built with kernel headers for Linux 4.17.11
Keepalived[15355]:   Running on Linux 4.20.16.a-1-hardened #1 SMP PREEMPT Wed Mar 13 23:54:29 CET 2019
Keepalived[15355]:   Command line: '/usr/bin/keepalived' '-D'
Keepalived[15355]:   configure options: --prefix=/usr --sysconfdir=/etc --sbindir=/usr/bin --localstatedir=/var
Keepalived[15355]:                      --runstatedir=/run CFLAGS=-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -g
Keepalived[15355]:                      -fvar-tracking-assignments
Keepalived[15355]:                      -fdebug-prefix-map=/home/aur/keepalived/src=/usr/src/debug
Keepalived[15355]:                      LDFLAGS=-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now
Keepalived[15355]:                      CPPFLAGS=-D_FORTIFY_SOURCE=2
Keepalived[15355]:   Config options: LIBIPTC LIBIPSET_DYNAMIC NFTABLES LVS VRRP VRRP_AUTH OLD_CHKSUM_COMPAT FIB_ROUTING
Keepalived[15355]:                   EINTR_CHECK
Keepalived[15355]:   System options: PIPE2 SIGNALFD INOTIFY_INIT1 VSYSLOG EPOLL_CREATE1 IPV4_DEVCONF IPV6_ADVANCED_API
Keepalived[15355]:                   LIBNL3 RTA_ENCAP RTA_EXPIRES RTA_NEWDST RTA_PREF FRA_SUPPRESS_PREFIXLEN
Keepalived[15355]:                   FRA_SUPPRESS_IFGROUP FRA_TUN_ID RTAX_CC_ALGO RTAX_QUICKACK RTEXT_FILTER_SKIP_STATS
Keepalived[15355]:                   FRA_L3MDEV FRA_UID_RANGE RTAX_FASTOPEN_NO_COOKIE RTA_VIA FRA_OIFNAME FRA_PROTOCOL
Keepalived[15355]:                   FRA_IP_PROTO FRA_SPORT_RANGE FRA_DPORT_RANGE RTA_TTL_PROPAGATE IFA_FLAGS
Keepalived[15355]:                   IP_MULTICAST_ALL LWTUNNEL_ENCAP_MPLS LWTUNNEL_ENCAP_ILA LIBIPTC
Keepalived[15355]:                   NET_LINUX_IF_H_COLLISION LIBIPVS_NETLINK IPVS_DEST_ATTR_ADDR_FAMILY
Keepalived[15355]:                   IPVS_SYNCD_ATTRIBUTES IPVS_64BIT_STATS VRRP_VMAC CN_PROC SOCK_NONBLOCK SOCK_CLOEXEC
Keepalived[15355]:                   O_PATH GLOB_BRACE INET6_ADDR_GEN_MODE VRF SO_MARK SCHED_RT SCHED_RESET_ON_FORK
Keepalived[15355]: VRRP child process(28229) died: Respawning
Keepalived[15355]: Starting VRRP child process, pid=31086
Keepalived_vrrp[31086]: Registering Kernel netlink reflector
Keepalived_vrrp[31086]: Registering Kernel netlink command channel
Keepalived_vrrp[31086]: Opening file '/etc/keepalived/keepalived.conf'.
Keepalived_vrrp[31086]: SECURITY VIOLATION - scripts are being executed but script_security not enabled.
Keepalived_vrrp[31086]: (VI_1) VRRP version 3 does not support authentication. Ignoring.
Keepalived_vrrp[31086]: Assigned address 10.5.0.1 for interface ens6
Keepalived_vrrp[31086]: Assigned address fe80::e4fd:acff:fe18:dbdc for interface ens6
Keepalived_vrrp[31086]: Registering gratuitous ARP shared channel
Keepalived_vrrp[31086]: (VI_1) removing VIPs.
systemd-coredump[31082]: Process 28229 (keepalived) of user 0 dumped core.

                                                       Stack trace of thread 28229:
                                                       #0  0x000004154ded88c7 __list_add (keepalived)
                                                       #1  0x000004154de9688e netlink_if_address_filter (keepalived)
                                                       #2  0x000004154de97c98 netlink_if_address_filter (keepalived)
                                                       #3  0x000004154de94fd5 netlink_parse_info (keepalived)
                                                       #4  0x000004154de953b1 netlink_parse_info (keepalived)
                                                       #5  0x000004154ded7eb3 thread_call (keepalived)
                                                       #6  0x000004154deab209 start_vrrp_child (keepalived)
                                                       #7  0x000004154deab243 vrrp_respawn_thread (keepalived)
                                                       #8  0x000004154ded7eb3 thread_call (keepalived)
                                                       #9  0x000004154de8ef2d keepalived_main (keepalived)
                                                       #10 0x00006e7aff263223 __libc_start_main (libc.so.6)
                                                       #11 0x000004154de8d09e _start (keepalived)
Keepalived_vrrp[31086]: (VI_1) Entering BACKUP STATE (init)
audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=systemd-coredump@4-31081-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
systemd[1]: systemd-coredump@4-31081-0.service: Succeeded.
Keepalived_vrrp[31086]: VRRP sockpool: [ifindex(3), family(IPv4), proto(112), unicast(0), fd(11,12)]

[Not exactly when keepalived started, but from before creating the Docker bridge]

Did keepalived coredump?

#0  __list_add (e=<optimized out>, l=<optimized out>) at list.c:115
No locals.
#1  list_add_r (l=0x0, data=0x41560f12b90) at list.c:117
        e = 0x41560f0e9f0
#2  0x000004154de9688e in netlink_if_address_filter (h=0x41560f0a6e0, snl=<optimized out>) at keepalived_netlink.c:1053
        ifa = 0x41560f0a6f0
        tb = {0x0, 0x41560f0a6f8, 0x41560f0a6f8, 0x0, 0x0, 0x0, 0x41560f0a70c, 0x0, 0x41560f0a720}
        ifp = 0x41560f0fb60
        ipaddr = <optimized out>
        len = <optimized out>
        addr = {addr = 0x41560f0a6fc, in = 0x41560f0a6fc, in6 = 0x41560f0a6fc}
        addr_p = <optimized out>
        addr6_p = <optimized out>
        addr_str = "\360\226\361`\025\004\000\000\000\276\275\062 \255\314x*\000\000\000\000\000\000\000\000T\361`\025\004\000\000\360\226\361`\025\004\000\000P\240\361`\025\004"
        addr_chg = <optimized out>
        e = <optimized out>
        vrrp = <optimized out>
        address_vrrp = <optimized out>
        tvp = <optimized out>
        is_tracking_saddr = <optimized out>
#3  0x000004154de97c98 in netlink_if_address_filter (h=<optimized out>, snl=<optimized out>) at keepalived_netlink.c:2322
        ipaddr = <optimized out>
        len = <optimized out>
        addr_chg = false
        e = <optimized out>
        is_tracking_saddr = <optimized out>
        addr = <optimized out>
        addr6_p = <optimized out>
        address_vrrp = <optimized out>
        tvp = <optimized out>
        tb = <optimized out>
        ifp = <optimized out>
        addr_p = <optimized out>
        vrrp = <optimized out>
        ifa = <optimized out>
        addr_str = <optimized out>
        ifa = <optimized out>
        tb = <optimized out>
        ifp = <optimized out>
        ipaddr = <optimized out>
        len = <optimized out>
        addr = <optimized out>
        addr_p = <optimized out>
        addr6_p = <optimized out>
        addr_str = <optimized out>
        addr_chg = <optimized out>
        e = <optimized out>
        vrrp = <optimized out>
        address_vrrp = <optimized out>
        tvp = <optimized out>
        is_tracking_saddr = <optimized out>
#4  netlink_broadcast_filter (snl=<optimized out>, h=<optimized out>) at keepalived_netlink.c:2322
No locals.
#5  0x000004154de94fd5 in netlink_parse_info (filter=filter@entry=0x4154de97b30 <netlink_broadcast_filter>, nl=nl@entry=0x4154df053f0 <nl_kernel>, n=n@entry=0x0, read_all=read_all@entry=true) at keepalived_netlink.c:1425
        iov = {iov_base = 0x41560f0a6e0, iov_len = 72}
        snl = {nl_family = 16, nl_pad = 0, nl_pid = 0, nl_groups = 256}
        msg = {msg_name = 0x73d165fd39b4, msg_namelen = 12, msg_iov = 0x73d165fd39c0, msg_iovlen = 1, msg_control = 0x0, msg_controllen = 0, msg_flags = 0}
        h = 0x41560f0a6e0
        len = 72
        ret = 0
        error = <optimized out>
        nlmsg_buf = 0x41560f0a6e0 "H"
        nlmsg_buf_size = 72
#6  0x000004154de953b1 in netlink_parse_info (read_all=true, n=0x0, nl=0x4154df053f0 <nl_kernel>, filter=0x4154de97b30 <netlink_broadcast_filter>) at keepalived_netlink.c:2347
        len = <optimized out>
        ret = 0
        nlmsg_buf = 0x0
        error = <optimized out>
        nlmsg_buf_size = 0
        len = <optimized out>
        ret = <optimized out>
        error = <optimized out>
        nlmsg_buf = <optimized out>
        nlmsg_buf_size = <optimized out>
        iov = <optimized out>
        snl = <optimized out>
        msg = <optimized out>
        h = <optimized out>
        err = <optimized out>
#7  kernel_netlink (thread=<optimized out>) at keepalived_netlink.c:2347
        nl = 0x4154df053f0 <nl_kernel>
#8  0x000004154ded7eb3 in thread_call (thread=0x41560f18720) at scheduler.c:1793
No locals.
#9  process_threads (m=0x41560f18820) at scheduler.c:1793
        thread = 0x41560f18720
        thread_list = <optimized out>
        thread_type = <optimized out>
#10 0x000004154ded84e2 in launch_thread_scheduler (m=<optimized out>) at scheduler.c:1890
No locals.
#11 0x000004154deab209 in start_vrrp_child () at vrrp_daemon.c:1033
        pid = <optimized out>
        syslog_ident = <optimized out>
        pid = <optimized out>
        syslog_ident = <optimized out>
#12 start_vrrp_child () at vrrp_daemon.c:903
        pid = <optimized out>
        syslog_ident = <optimized out>
#13 0x000004154deab243 in vrrp_respawn_thread (thread=<optimized out>) at vrrp_daemon.c:845
No locals.
#14 0x000004154ded7eb3 in thread_call (thread=0x41560f186a0) at scheduler.c:1793
No locals.
#15 process_threads (m=0x41560f185d0) at scheduler.c:1793
        thread = 0x41560f186a0
        thread_list = <optimized out>
        thread_type = <optimized out>
#16 0x000004154ded84e2 in launch_thread_scheduler (m=<optimized out>) at scheduler.c:1890
No locals.
#17 0x000004154de8ef2d in keepalived_main (argc=2, argv=<optimized out>) at main.c:1953
        report_stopped = true
        uname_buf = {sysname = "Linux", '\000' <repeats 59 times>, nodename = "zarkon.mcl.gg", '\000' <repeats 51 times>, release = "4.20.16.a-1-hardened", '\000' <repeats 44 times>, version = "#1 SMP PREEMPT Wed Mar 13 23:54:29 CET 2019", '\000' <repeats 21 times>, machine = "x86_64", '\000' <repeats 58 times>, domainname = "(none)", '\000' <repeats 58 times>}
        end = 0x73d165fd3cb7 ".mcl.gg"
#18 0x00006e7aff263223 in __libc_start_main () from /usr/lib/libc.so.6
No symbol table info available.
#19 0x000004154de8d09e in _start ()
No symbol table info available.


pqarmitage commented 5 years ago

Can you try building keepalived from the current master branch (or at least from a tree that includes commit 6f41772) and see whether that resolves your problem? That commit was made to resolve issue #1215, which sounds as though it may be a similar issue.

pqarmitage commented 5 years ago

I have added a further commit (128bfe6), which resolves additional issues that arise when an interface has its parent interface in a different network namespace.

Further investigation has revealed that list_add was being called with a NULL list pointer from netlink_if_address_filter() by the following code:

 else {
     addr6_p = MALLOC(sizeof(*addr.in6));
     *addr6_p = *addr.in6;
     list_add(ifp->sin6_addr_l, addr6_p);
 }

So ifp->sin6_addr_l was NULL at that point. This happened when an interface that had previously existed was deleted and then created again.

Commit 09d90db resolves the issue.
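
For readers following along, here is a minimal sketch of the failure mode described above. It is not keepalived's actual list.c, and it does not reproduce what commit 09d90db changes. All names prefixed with sketch_ are hypothetical, and the struct layout is an assumption. The assumption is chosen to match the backtrace (list_add_r called with l=0x0) and the kernel's "segfault at 8", which is consistent with reading a field at offset 8 of a NULL list pointer on x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified list types; NOT keepalived's real definitions. */
typedef struct sketch_elem {
    struct sketch_elem *next;
    struct sketch_elem *prev;
    void *data;
} sketch_elem_t;

typedef struct sketch_list {
    sketch_elem_t *head;   /* offset 0 */
    sketch_elem_t *tail;   /* offset 8 on x86-64: reading this via a NULL
                              list pointer faults at address 8 */
    size_t count;
} sketch_list_t;

/* Allocate an empty, zero-initialised list (NULL on allocation failure). */
sketch_list_t *sketch_list_alloc(void)
{
    return calloc(1, sizeof(sketch_list_t));
}

/* Append data to the list. Returns 0 on success, -1 if the list pointer
 * is NULL or the element cannot be allocated. An unguarded version would
 * dereference l->tail directly and crash when l is NULL. */
int sketch_list_add(sketch_list_t *l, void *data)
{
    sketch_elem_t *e;

    if (l == NULL)
        return -1;

    e = calloc(1, sizeof(*e));
    if (e == NULL)
        return -1;

    e->data = data;
    e->prev = l->tail;
    if (l->head == NULL)
        l->head = e;
    else
        l->tail->next = e;
    l->tail = e;
    l->count++;
    return 0;
}

/* Hypothetical interface record: only the IPv6 address list is modelled. */
typedef struct sketch_if {
    sketch_list_t *sin6_addr_l;
} sketch_if_t;

/* Record an IPv6 address on the interface, creating the list first if it
 * is missing (for example after the interface was deleted and recreated). */
int sketch_if_add_addr6(sketch_if_t *ifp, void *addr)
{
    if (ifp->sin6_addr_l == NULL) {
        ifp->sin6_addr_l = sketch_list_alloc();
        if (ifp->sin6_addr_l == NULL)
            return -1;
    }
    return sketch_list_add(ifp->sin6_addr_l, addr);
}
```

In this sketch, calling sketch_list_add() with a NULL list returns -1 instead of faulting, and sketch_if_add_addr6() creates the list on demand. Whether the real fix guards the call, re-creates the list when the interface reappears, or does something else entirely is not stated in this thread, so treat the above purely as an illustration of the bug class.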

pqarmitage commented 5 years ago

Commits 3207f5c, 83a6a6a and 2c58e35 are also worth incorporating.

WRMSRwasTaken commented 5 years ago

Sorry for the late answer. I can confirm that version v2.0.15-57-gab920c8e (commit https://github.com/acassen/keepalived/commit/ab920c8e82e0eba64887d4b3bf0178b8dd5e530e) no longer segfaults for me.