[v2.0.20] segfault in ipvs_group_cmd

wdauchy commented 4 years ago

Describe the bug

We are hitting a regular segfault on v2.0.20

Program terminated with signal 7, Bus error.
#0 ipvs_group_cmd (rs=0x0, vs=0x55788c6d2340, drule=0x7ffcc4b83670, srule=0x7ffcc4b836b0, cmd=1154) at ipvswrapper.c:356
356 LIST_FOREACH(vsg->addr_range, vsg_entry, e) {
(gdb) thread apply all bt
Thread 1 (Thread 0x7f540dff4840 (LWP 13102)):
#0 ipvs_group_cmd (rs=0x0, vs=0x55788c6d2340, drule=0x7ffcc4b83670, srule=0x7ffcc4b836b0, cmd=1154) at ipvswrapper.c:356
#1 ipvs_cmd (cmd=cmd@entry=1154, vs=vs@entry=0x55788c6d2340, rs=rs@entry=0x0) at ipvswrapper.c:492
#2 0x000055788c30eb7c in init_service_vs (vs=0x55788c6d2340) at ipwrapper.c:564
#3 init_services () at ipwrapper.c:606
#4 0x000055788c3000b5 in start_check (old_checkers_queue=0x0, prev_global_data=<optimized out>) at check_daemon.c:364
#5 0x000055788c3004a7 in start_check_child () at check_daemon.c:657
#6 0x000055788c300630 in check_respawn_thread (thread=<optimized out>) at check_daemon.c:502
#7 0x000055788c319fbb in thread_call (thread=0x55788c654d70) at scheduler.c:1776
#8 process_threads (m=0x55788c654ea0) at scheduler.c:1834
#9 0x000055788c31a541 in launch_thread_scheduler (m=<optimized out>) at scheduler.c:1942
#10 0x000055788c2fc7ab in keepalived_main (argc=<optimized out>, argv=<optimized out>) at main.c:2220
#11 0x00007f540c9c4505 in __libc_start_main () from /lib64/libc.so.6
#12 0x000055788c2facae in _start ()

To Reproduce

For now there is no clear reproducer, but the config below is hitting the segfault very regularly

Keepalived version

Keepalived v2.0.20 (01/22,2020)

Copyright(C) 2001-2020 Alexandre Cassen, <acassen@gmail.com>

Built with kernel headers for Linux 5.4.17
Running on Linux 5.4.30-1.el7.x86_64 #1 SMP Fri Apr 3 12:47:34 UTC 2020

configure options: --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --disable-vrrp --enable-sha1 --enable-regex --with-init=systemd build_alias=x86_64-redhat-linux-gnu host_alias=x86_64-redhat-linux-gnu PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig CFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic LDFLAGS=-Wl,-z,relro -specs=/usr/lib/rpm/redhat/redhat-hardened-ld

Config options:  LVS REGEX OLD_CHKSUM_COMPAT

System options:  PIPE2 SIGNALFD INOTIFY_INIT1 VSYSLOG EPOLL_CREATE1 IPV6_ADVANCED_API LIBNL3 RTA_ENCAP RTA_EXPIRES RTA_NEWDST RTA_PREF FRA_SUPPRESS_PREFIXLEN FRA_SUPPRESS_IFGROUP FRA_TUN_ID RTAX_CC_ALGO RTAX_QUICKACK RTEXT_FILTER_SKIP_STATS FRA_L3MDEV FRA_UID_RANGE RTAX_FASTOPEN_NO_COOKIE RTA_VIA FRA_OIFNAME FRA_PROTOCOL FRA_IP_PROTO FRA_SPORT_RANGE FRA_DPORT_RANGE RTA_TTL_PROPAGATE IFA_FLAGS IP_MULTICAST_ALL LWTUNNEL_ENCAP_MPLS LWTUNNEL_ENCAP_ILA NET_LINUX_IF_H_COLLISION LIBIPVS_NETLINK IPVS_DEST_ATTR_ADDR_FAMILY IPVS_SYNCD_ATTRIBUTES IPVS_64BIT_STATS IPVS_TUN_TYPE IPVS_TUN_CSUM IPVS_TUN_GRE SOCK_NONBLOCK SOCK_CLOEXEC O_PATH GLOB_BRACE SO_MARK SCHED_RESET_ON_FORK

Distro

centos 7
Architecture x86_64

Details of any containerisation or hosted service

bare metal machine

Configuration file: A full copy of the configuration file, obfuscated if necessary to protect passwords and IP addresses

global_defs {
    enable_script_security
}

$HEALTHCHECK_DEFINITION=\
    HTTP_GET { \
        connect_port 8500 \
        url { \
            path /v1/agent/health/service/name/${SERVICE_NAME}?format=text \
            status_code 200 \
            regex passing \
        } \
    }

$SERVICE_NAME=foo0-http
virtual_server_group foo0-4 {
    x.x.x.0 80
    x.x.x.x 443
}
virtual_server group foo0-4 {
    $VS_COMMON_OPTIONS
    protocol tcp
    lvs_method TUN type ipip
    lvs_sched mh
    connect_timeout 10
    retry 3
    warmup 5
    real_server x.x.x.x {
        weight 1
        $HEALTHCHECK_DEFINITION
    }
    real_server x.x.x.x {
        weight 1
        $HEALTHCHECK_DEFINITION
    }
    real_server x.x.x.x {
        weight 1
        $HEALTHCHECK_DEFINITION
    }
}
virtual_server_group foo0-6 {
    x:x:x:x::xx 80
    x:x:x:x::11 443
}
virtual_server group foo0-6 {
    $VS_COMMON_OPTIONS
    protocol tcp
    ip_family inet6
    lvs_method TUN type ipip
    lvs_sched mh
    connect_timeout 10
    retry 3
    warmup 5
    real_server x.x.x.x {
        weight 1
        $HEALTHCHECK_DEFINITION
    }
    real_server x.x.x.x {
        weight 1
        $HEALTHCHECK_DEFINITION
    }
    real_server x.x.x.x {
        weight 1
        $HEALTHCHECK_DEFINITION
    }
}

[...]
(many similar configs, with more than 200 virtual server definitions across both IPv4 and IPv6)

pqarmitage commented 4 years ago

@wdauchy I think this is going to be a difficult one to track down.

SIGBUS is a strange signal to receive, especially on x86 architecture. Does the fault consistently occur at ipvswrapper.c: line 356?

I think the only way I have any hope of tracking this down is to have a copy of your full configuration. In terms of obfuscating it, could you please just change the first octet of each IPv4 address to 10, and change the first word of IPv6 addresses to fd00; this will mean that I won't have to go through the whole configuration replacing xs with numbers.

It looks to me as though you have spun your own kernel. Are there any kernel configuration settings that you have changed?

wdauchy commented 4 years ago

Thank you for your answer.

@wdauchy I think this is going to be a difficult one to track down. SIGBUS is a strange signal to receive, especially on x86 architecture. Does the fault consistently occur at ipvswrapper.c: line 356?

yes I can also provide a core privately

I think the only way I have any hope of tracking this down is to have a copy of your full configuration. In terms of obfuscating it, could you please just change the first octet of each IPv4 address to 10, and change the first word of IPv6 addresses to fd00; this will mean that I won't have to go through the whole configuration replacing xs with numbers.

will send you that privately as github does not allow me to send it

It looks to me as though you have spun your own kernel. Are there any kernel configuration settings that you have changed?

yes we have our own build. also sent privately.

pqarmitage commented 4 years ago

I think a copy of the core file would be helpful, along with the matching non-stripped keepalived executable (if you installed keepalived from rpms then the keepalived and keepalived-debug rpms would be fine).

pqarmitage commented 4 years ago

@wdauchy The core dump that you have sent me is reporting that keepalived terminated due to signal 11, in function weigh_live_realservers, which doesn't match what you listed above:

[New LWP 8497]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/keepalived -D -m'.
Program terminated with signal 11, Segmentation fault.
#0  0x000056349a4449c0 in weigh_live_realservers (vs=0x56349b8af410) at ipwrapper.c:90
90      LIST_FOREACH(vs->rs, svr, e) {
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_7.2.x86_64 libcom_err-1.42.9-16.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 pcre-8.32-17.el7.x86_64 pcre2-10.23-2.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0  0x000056349a4449c0 in weigh_live_realservers (vs=0x56349b8af410) at ipwrapper.c:90
#1  set_quorum_states () at ipwrapper.c:442
#2  0x000056349a43916b in validate_check_config () at check_data.c:1129
#3  0x000056349a435f13 in start_check (old_checkers_queue=0x0, prev_global_data=0x0) at check_daemon.c:306
#4  0x000056349a4364a7 in start_check_child () at check_daemon.c:657
#5  0x000056349a436630 in check_respawn_thread (thread=<optimized out>) at check_daemon.c:502
#6  0x000056349a44ffbb in thread_call (thread=0x56349b88de50) at scheduler.c:1776
#7  process_threads (m=0x56349b88ed30) at scheduler.c:1834
#8  0x000056349a450541 in launch_thread_scheduler (m=<optimized out>) at scheduler.c:1942
#9  0x000056349a4327ab in keepalived_main (argc=<optimized out>, argv=<optimized out>) at main.c:2220
#10 0x00007f6d92dad505 in __libc_start_main (main=0x56349a430c80 <main>, argc=3, argv=0x7ffdaaebc638, init=<optimized out>, 
    fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffdaaebc628) at ../csu/libc-start.c:266
#11 0x000056349a430cae in _start ()

I can certainly look at this, but it isn't the issue you reported.

wdauchy commented 4 years ago

wow I probably mixed them, let me double check that

wdauchy commented 4 years ago

there is probably another issue I did not see with weigh_live_realservers; I can open a new bug about it. In the meantime, I've sent you the core about ipvs_group_cmd

sorry for the confusion

wdauchy commented 4 years ago

@pqarmitage in fact we just realised that removing "ip_family inet6" somehow works around the issue (but it disables IPv6 in the context of mixed virtual servers, i.e. IPv6 VIP + IPv4 RS)

pqarmitage commented 4 years ago

The new core file looks better, many thanks.

For both the core files you have provided, it looks like a register has got a rather strange value. For the original problem, where the code is:

   0x0000564d80bdb75e <+286>: je 0x564d80bdb798 <ipvs_cmd+344>
   0x0000564d80bdb760 <+288>: cmp $0x482,%ebx
=> 0x0000564d80bdb766 <+294>: mov 0x10(%rbp),%r12
   0x0000564d80bdb76a <+298>: jne 0x564d80bdb6f0 <ipvs_cmd+176>
   0x0000564d80bdb76c <+300>: lea 0x224dc5(%rip),%rax # 0x564d80e00538

the registers are:

rax 0xfffffffb 4294967291
rbx 0x482 1154
rcx 0x7f9da5a349f0 140315065534960
rdx 0x0 0
rsi 0x564d81bc9404 94890889090052
rdi 0x7ffe3d648e50 140729928420944
rbp 0x800000564d81bc8b 0x800000564d81bc8b
rsp 0x7ffe3d648db0 0x7ffe3d648db0
r8 0x0 0
r9 0x7ffe3d648e1f 140729928420895
r10 0x3 3
r11 0x7f9da5b1da40 140315066489408
r12 0x564d81bc9340 94890889089856
r13 0x564d81bc9340 94890889089856
r14 0x0 0
r15 0x7ffe3d648e10 140729928420880
rip 0x564d80bdb766 0x564d80bdb766 <ipvs_cmd+294>
eflags 0x10246 [ PF ZF IF RF ]

so data is around about 0x564d81bc0000. However, rbp is 0x800000564d81bc8b which rather looks as though the register has been shifted right 8 bits, and the MSB set; an alternative is that the register was restored from the stack with an offset of 1 byte from where it should have been.
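The "shifted right 8 bits, and the MSB set" reading can be checked with a little arithmetic. The sketch below is illustrative only: the low byte of the presumed original pointer is shifted out and therefore unknown, so 0x00 is an assumed placeholder, the helper names are hypothetical, and the canonical-address check assumes 48-bit virtual addresses.

```python
MASK64 = (1 << 64) - 1


def shift_right_8_set_msb(value: int) -> int:
    """Shift a 64-bit value right by 8 bits and set bit 63."""
    return ((value >> 8) | (1 << 63)) & MASK64


def is_canonical_48(addr: int) -> bool:
    """True if bits 63..47 are all equal (assumes 48-bit virtual addresses)."""
    top = addr >> 47  # the 17 most significant bits
    return top == 0 or top == 0x1FFFF


# Assumed original pointer in the 0x564d81bc.... data region; low byte unknown.
presumed = 0x0000564D81BC8B00
# rbp as shown in the register dump above.
observed_rbp = 0x800000564D81BC8B

corrupted = shift_right_8_set_msb(presumed)
print(hex(corrupted))
print(is_canonical_48(presumed), is_canonical_48(observed_rbp))
```

Under these assumptions the transformed value matches the observed rbp, and the observed value is not a canonical address, so the load through rbp at <+294> cannot succeed.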

In the problem in the original core dump (i.e. the one that didn't match the original issue report), the code is:

   0x000056349a4449b8 <+56>: je 0x56349a4449df <set_quorum_states+95>
   0x000056349a4449ba <+58>: nopw 0x0(%rax,%rax,1)
=> 0x000056349a4449c0 <+64>: mov 0x10(%rax),%rdx
   0x000056349a4449c4 <+68>: cmpb $0x0,0xe4(%rdx)
   0x000056349a4449cb <+75>: je 0x56349a4449d7 <set_quorum_states+87>

with data around 0x56349b000000 but rax is 0x56009b8b0180, so the 0x34 byte has been set to 0x00 (this is the least significant 8 bits of the most significant 32 bits).
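The single-byte difference described above can be made explicit. This is a small sketch, assuming the intended pointer was the observed rax with the 0x34 byte restored (0x56349b8b0180); that value is inferred from the surrounding data addresses, not read from the core file, and the helper names are hypothetical.

```python
def byte_at(value: int, byte_index: int) -> int:
    """Return byte number byte_index of value (0 = least significant)."""
    return (value >> (8 * byte_index)) & 0xFF


def clear_byte(value: int, byte_index: int) -> int:
    """Return value with byte number byte_index set to 0x00."""
    return value & ~(0xFF << (8 * byte_index))


presumed_rax = 0x56349B8B0180  # assumed intended pointer (not from the core)
observed_rax = 0x56009B8B0180  # rax as reported above

# Byte 4 covers bits 32..39: the least significant byte of the upper 32 bits.
print(hex(byte_at(presumed_rax, 4)))
print(hex(clear_byte(presumed_rax, 4)))
print(hex(presumed_rax ^ observed_rax))
```

The XOR of the two values has only bits 32..39 set, which is what "the 0x34 byte has been set to 0x00" means here.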

Whilst I can remember Z80 assembler pretty well, I don't think I have ever gone beyond 80286 assembler on Intel, so it will take a bit of time to get the hang of x86_64 assembler, but I will try and follow this through to see if this really comes down to register corruption. I will also test your configuration on my Fedora 31 system, which has gcc 9.3.1 as opposed to gcc 9.2.1 which you appear to be using.

wdauchy commented 4 years ago

@pqarmitage have you checked the scenario mentioned above, i.e. IPv6 VIP + IPv4 RS? It seems to be the root cause of our issues.

pqarmitage commented 4 years ago

@wdauchy You stated removing "ip_family inet6" works around the issue; did you simply comment out the ip_family inet6 line, or completely remove the IPv6 virtual servers with IPv4 real servers?

wdauchy commented 4 years ago

we simply commented ip_family inet6 line.

pqarmitage commented 4 years ago

That shouldn't make any difference. ip_family inet6 is only needed where the virtual server or virtual server group is specified by a fwmark and not all the virtual servers are IPv6. I will nevertheless have a look at this.

pqarmitage commented 4 years ago

There are a number of configuration errors being logged. I think it would be helpful to resolve all of those first to see if that resolves the problem. One of them is that there is no definition of $VS_COMMON_OPTIONS, which would most simply be resolved by adding a line at the top of the config file: $VS_COMMON_OPTIONS=

The remaining errors I am seeing are:

Wed Apr  8 17:30:19 2020: (Line 390) Address family of virtual server and real server 10.236.136.20 don't match - skipping real server.
Wed Apr  8 17:30:19 2020: (Line 394) Address family of virtual server and real server 10.236.18.26 don't match - skipping real server.
Wed Apr  8 17:30:19 2020: (Line 409) Address family of virtual server and real server 10.236.136.20 don't match - skipping real server.
Wed Apr  8 17:30:19 2020: (Line 413) Address family of virtual server and real server 10.236.18.26 don't match - skipping real server.
Wed Apr  8 17:30:20 2020: (Line 723) Address family of virtual server and real server 10.236.100.20 don't match - skipping real server.
Wed Apr  8 17:30:20 2020: (Line 727) Address family of virtual server and real server 10.236.100.20 don't match - skipping real server.
Wed Apr  8 17:30:20 2020: (Line 3011) Address family of virtual server and real server 10.236.104.14 don't match - skipping real server.
Wed Apr  8 17:30:20 2020: (Line 3433) Address family of virtual server and real server 10.236.102.30 don't match - skipping real server.
Wed Apr  8 17:30:20 2020: (Line 3437) Address family of virtual server and real server 10.236.121.25 don't match - skipping real server.
Wed Apr  8 17:30:20 2020: (Line 3441) Address family of virtual server and real server 10.236.68.11 don't match - skipping real server.
Wed Apr  8 17:30:20 2020: (Line 3859) Address family of virtual server and real server 10.236.102.30 don't match - skipping real server.
Wed Apr  8 17:30:20 2020: (Line 3863) Address family of virtual server and real server 10.236.139.34 don't match - skipping real server.
Wed Apr  8 17:30:20 2020: (Line 3899) Address family of virtual server and real server 10.236.18.15 don't match - skipping real server.
Wed Apr  8 17:30:20 2020: (Line 4259) Address family of virtual server and real server 10.236.102.10 don't match - skipping real server.
Wed Apr  8 17:30:20 2020: Virtual server []:0 has no real servers - ignoring
Wed Apr  8 17:30:20 2020: Virtual server []:1 has no real servers - ignoring
Wed Apr  8 17:30:20 2020: VS []:0: real server [10.236.100.20]:tcp:0 is duplicated - removing second rs
Wed Apr  8 17:30:20 2020: Virtual server []:0 has no real servers - ignoring
Wed Apr  8 17:30:20 2020: Virtual server []:0 has no real servers - ignoring
Wed Apr  8 17:30:20 2020: Virtual server []:0 has no real servers - ignoring
Wed Apr  8 17:30:20 2020: Virtual server []:0 has no real servers - ignoring
Wed Apr  8 17:30:20 2020: Virtual server []:0 has no real servers - ignoring
Wed Apr  8 17:30:20 2020: Virtual server []:0 has no real servers - ignoring
Wed Apr  8 17:30:20 2020: Virtual server []:0 has no real servers - ignoring
Wed Apr  8 17:30:20 2020: Virtual server []:0 has no real servers - ignoring
Wed Apr  8 17:30:20 2020: Virtual server []:0 has no real servers - ignoring
Wed Apr  8 17:30:20 2020: Virtual server []:0 has no real servers - ignoring
Wed Apr  8 17:30:20 2020: Virtual server []:0 has no real servers - ignoring
Wed Apr  8 17:30:20 2020: Virtual server []:0 has no real servers - ignoring
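With a configuration of several thousand lines, it can help to group the "don't match" messages by real server before fixing them. The following is a minimal sketch that parses log lines in the format shown above; the function name is hypothetical and the regular expression assumes the message wording stays exactly as logged here.

```python
import re
from collections import defaultdict

# Matches e.g. "(Line 390) Address family of virtual server and real server
# 10.236.136.20 don't match - skipping real server."
MISMATCH = re.compile(
    r"\(Line (\d+)\) Address family of virtual server and real server "
    r"(\S+) don't match"
)


def mismatches_by_real_server(log_lines):
    """Map each real server address to the config line numbers that reported it."""
    by_rs = defaultdict(list)
    for line in log_lines:
        match = MISMATCH.search(line)
        if match:
            by_rs[match.group(2)].append(int(match.group(1)))
    return dict(by_rs)


sample = [
    "Wed Apr  8 17:30:19 2020: (Line 390) Address family of virtual server "
    "and real server 10.236.136.20 don't match - skipping real server.",
    "Wed Apr  8 17:30:19 2020: (Line 394) Address family of virtual server "
    "and real server 10.236.18.26 don't match - skipping real server.",
    "Wed Apr  8 17:30:19 2020: (Line 409) Address family of virtual server "
    "and real server 10.236.136.20 don't match - skipping real server.",
    "Wed Apr  8 17:30:20 2020: Virtual server []:0 has no real servers - ignoring",
]

result = mismatches_by_real_server(sample)
for real_server, lines in result.items():
    print(real_server, lines)
```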

wdauchy commented 4 years ago

Thanks for the details. I noticed those errors and will have a deeper look at them in the following days. I can however confirm the issue is gone after removing the ip_family inet6 line (and defining VS_COMMON_OPTIONS), but we can give it another try after fixing them.

pqarmitage commented 4 years ago

@wdauchy I can now reproduce the problem. As you say, removing the ip_family inet6 stops the problem occurring; defining VS_COMMON_OPTIONS does not make any difference.

I can now work on this and should be able to identify the problem. It looks as though keepalived's internal data structures are becoming corrupted.

wdauchy commented 4 years ago

Good news! (for the reproducer)

pqarmitage commented 4 years ago

To summarise, so far there are three problems I have identified:

1. keepalived crashes when ip_family inet6 is specified.
2. If ip_family inet6 is not specified, keepalived reports that the address family does not match the virtual server.
3. Can a virtual_server_group have a mixture of IPv4 and IPv6 addresses if all the real servers are tunnelled? And what happens if the virtual_server_group also has fwmark entries?

I will update this entry as we progress.

pierrecdn commented 4 years ago

@pqarmitage (I'm working with @wdauchy)

Can a virtual_server_group have a mixture of IPv4 and IPv6 addresses if all the real servers are tunnelled?

This was our first intent. We switched to v4 + v6 dedicated vs_group when noticing that all v6 vips were discarded due to mixed families.

With this setup (v4 and v6 separated) the kernel is programmed (when checked using a decent ipvsadm version, entries are correct) but from what we understand the healthchecking part is going crazy.

We consequently thought about implementing this config first then discuss this feature with the community. Good to know that it's already identified with this issue ;)

pqarmitage commented 4 years ago

@pierrecdn @wdauchy

Update

The problem is caused by the virtual servers where the virtual_server_group has IPv6 addresses, the real servers of the virtual server have IPv4 addresses, and the lvs_method is NAT. Changing the lvs_method to be TUN for these instances (i.e. at lines 382, 401, 715, 3003, 3425, 3851, 3891 and 4251 of your configuration) stops the problem occurring. Alternatively, removing the ip_family inet6 from those virtual servers and leaving the lvs_method as NAT also stops the problem occurring.
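The triggering combination described above can be written down as a simple predicate for auditing a configuration by hand. This is a hypothetical check, not keepalived code: the function names and parameters are assumptions, and it only encodes the conditions stated in this comment.

```python
import ipaddress


def address_family(addr: str) -> int:
    """Return 4 or 6 for an IPv4 or IPv6 address string."""
    return ipaddress.ip_address(addr).version


def matches_crash_conditions(vsg_addresses, real_servers, lvs_method,
                             ip_family_inet6):
    """True if a virtual server matches the combination reported to crash.

    Conditions taken from the comment above: ip_family inet6 is specified,
    the virtual_server_group has IPv6 addresses, the real servers have IPv4
    addresses, and the lvs_method is NAT.
    """
    has_v6_vip = any(address_family(a) == 6 for a in vsg_addresses)
    has_v4_rs = any(address_family(a) == 4 for a in real_servers)
    return (ip_family_inet6 and has_v6_vip and has_v4_rs
            and lvs_method.upper() == "NAT")


# Documentation-range addresses, used purely for illustration.
vips = ["2001:db8::1"]
rses = ["192.0.2.1"]

print(matches_crash_conditions(vips, rses, "NAT", True))
print(matches_crash_conditions(vips, rses, "TUN", True))
print(matches_crash_conditions(vips, rses, "NAT", False))
```

Both workarounds mentioned above (switching to TUN, or dropping ip_family inet6) make the predicate false.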

I have also resolved point 2 above, so that it will not be necessary to specify ip_family inet6 at all.

Now to identify what is causing the problem explained above.

pqarmitage commented 4 years ago

Commit a4668f6 resolves the original issue reported.

Commit eb4a4b3 sets the address family from the virtual_server_group if all the real servers of a virtual server are tunnelled and the address family is not specified.

Commit eb4a4b3 means that you will no longer need to specify ip_family inet6 at all, but it would be good if you could test the updated code with your original configuration to ensure that it really is fixed for you.

pierrecdn commented 4 years ago

I have also resolved point 2 above, so that it will not be necessary to specify ip_family inet6 at all.

That's pretty cool and fast!

Trying to give you more context to explain why one would use one IP family for RS instead of having everything dual-stacked and using only ip4-ip4 or ip6-ip6 (outer-inner scheme).

(As you probably guessed) our goal is to be able to transparently perform migrations of the internal infrastructure (RSes).

The first step for that is to support mixed VIP families for a VS (here mostly IPv4 and IPv6 VIPs forwarded to v4 RSes). Sounds achievable, hence our workaround to split VS and VS group per IP families, and the discovery of the current issue. Would be the same for v6 RSes.

Now let's imagine we have mixed families for a given RS pool, within a VS. The worst case being something like:

virtual_server_group my_vip {
    198.51.100.1 80
    198.51.100.1 443
    2001:db8::1 80
    2001:db8::1 443
}
virtual_server group my_vip {
    # common stuff
    lvs_method TUN type ipip
    real_server 2001:db8::ff:1 {
        weight 1
        $HEALTHCHECK_DEFINITION
    }
    real_server 192.0.2.1 {
        weight 1
        $HEALTHCHECK_DEFINITION
    }
}

e.g. by using correct modules on the RS (ipip + sit + ip6_tunnel for ip-ip based protocols, or ip_gre + ip6_gre for gre families), we can encapsulate using the appropriate address family, and let RS migrate to dual-stack or mono-stack ipv6 autonomously (thanks to an automation we have in place).

This is perfectly supported by ipvs:

$ sudo ipvsadm -Ln -t 198.51.100.1:443
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  198.51.100.1:443 mh
  -> 192.0.2.1:443                Tunnel  1      0          0              
  -> [2001:db8::ff:1]:443         Tunnel  1      0          0

This would also mean for us the ability for keepalived to support it. I think it's worth creating a specific issue, what do you think?

pqarmitage commented 4 years ago

@pierrecdn I do agree that allowing mixed IPv4/6 in a virtual server group should be a separate issue. Once you have confirmed that commit a4668f6 resolves your original issue I think this issue should be closed.

I will wait for you to create the new issue and add any thoughts about mixed IPv4/6 virtual server groups to that issue.

wdauchy commented 4 years ago

@pqarmitage I confirm the mentioned commits are fixing the segfault, thanks a lot for the quick fixes! I believe we can consider this issue as fixed and open a new one for the feature request.

pqarmitage commented 4 years ago

@wdauchy @pierrecdn Commit fa1eaf7 should allow what you want to achieve with having both IPv4 and IPv6 addresses in a virtual server group. This is only permitted where all real and sorry servers of all virtual servers using the virtual server group are tunnelled, and also no fwmarks are configured on the virtual server group.

I will have a look later at whether I can relax the restriction of not allowing fwmarks (it would mean that the fwmark configuration in the virtual server group would need to specify the address family for each fwmark).
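The restriction stated for commit fa1eaf7 can be summarised as a predicate. This is only a sketch of the rule as worded in this comment, not the code from the commit; the VirtualServer class, its fields, and the function name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VirtualServer:
    """Hypothetical model: forwarding method per real server, plus sorry server."""
    real_server_methods: List[str] = field(default_factory=list)
    sorry_server_method: Optional[str] = None


def mixed_families_allowed(virtual_servers, group_has_fwmarks: bool) -> bool:
    """True if a virtual_server_group may mix IPv4 and IPv6 addresses.

    Rule as described above: every real and sorry server of every virtual
    server using the group must be tunnelled, and the group has no fwmarks.
    """
    if group_has_fwmarks:
        return False
    for vs in virtual_servers:
        methods = list(vs.real_server_methods)
        if vs.sorry_server_method is not None:
            methods.append(vs.sorry_server_method)
        if any(method.upper() != "TUN" for method in methods):
            return False
    return True


all_tunnelled = [VirtualServer(["TUN", "TUN"], "TUN"), VirtualServer(["TUN"])]
one_nat = [VirtualServer(["TUN", "NAT"])]

print(mixed_families_allowed(all_tunnelled, group_has_fwmarks=False))
print(mixed_families_allowed(one_nat, group_has_fwmarks=False))
print(mixed_families_allowed(all_tunnelled, group_has_fwmarks=True))
```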

pierrecdn commented 4 years ago

@pqarmitage so fast, I only had time to write the corresponding issue :smile_cat: :arrow_up: Testing this one as well.

acassen / keepalived

[v2.0.20] segfault in ipvs_group_cmd #1536