
Segv on rbtree when process_child_termination() #2468

Closed Rtoax closed 1 month ago

Rtoax commented 2 months ago

Keepalived version

$ keepalived -v
Keepalived v2.2.8 (04/04,2023), git commit v2.2.7-154-g292b299e+

Copyright(C) 2001-2023 Alexandre Cassen, <acassen@gmail.com>

Built with kernel headers for Linux 5.15.131
Running on Linux 5.15.131-10.x86_64 #1 SMP Tue Jun 25 15:54:27 CST 2024
Distro: Custom OS

configure options: --build=x86_64-cestc-linux-gnu --host=x86_64-cestc-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --enable-snmp --enable-snmp-rfc --enable-nftables --disable-iptables --enable-sha1 --enable-json --with-init=systemd build_alias=x86_64-cestc-linux-gnu host_alias=x86_64-cestc-linux-gnu PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig CC=gcc CFLAGS=-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/cestc/cestc-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/cestc/cestc-annobin-cc1 -m64 -march=x86-64-v2 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection LDFLAGS=-Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/cestc/cestc-hardened-ld -specs=/usr/lib/rpm/cestc/cestc-annobin-cc1 

Config options:  NFTABLES LVS VRRP VRRP_AUTH VRRP_VMAC JSON OLD_CHKSUM_COMPAT SNMP_V3_FOR_V2 SNMP_VRRP SNMP_CHECKER SNMP_RFCV2 SNMP_RFCV3 INIT=systemd SYSTEMD_NOTIFY

System options:  VSYSLOG MEMFD_CREATE IPV6_MULTICAST_ALL IPV4_DEVCONF LIBNL3 RTA_ENCAP RTA_EXPIRES RTA_NEWDST RTA_PREF FRA_SUPPRESS_PREFIXLEN FRA_SUPPRESS_IFGROUP FRA_TUN_ID RTAX_CC_ALGO RTAX_QUICKACK RTEXT_FILTER_SKIP_STATS FRA_L3MDEV FRA_UID_RANGE RTAX_FASTOPEN_NO_COOKIE RTA_VIA FRA_PROTOCOL FRA_IP_PROTO FRA_SPORT_RANGE FRA_DPORT_RANGE RTA_TTL_PROPAGATE IFA_FLAGS LWTUNNEL_ENCAP_MPLS LWTUNNEL_ENCAP_ILA NET_LINUX_IF_H_COLLISION LIBIPTC_LINUX_NET_IF_H_COLLISION LIBIPVS_NETLINK IPVS_DEST_ATTR_ADDR_FAMILY IPVS_SYNCD_ATTRIBUTES IPVS_64BIT_STATS IPVS_TUN_TYPE IPVS_TUN_CSUM IPVS_TUN_GRE VRRP_IPVLAN IFLA_LINK_NETNSID GLOB_BRACE GLOB_ALTDIRFUNC INET6_ADDR_GEN_MODE VRF SO_MARK

gdb backtrace at the segv:

(gdb) bt
#0  rb_set_parent_color (color=1, p=0x56316f901b00, rb=0x0) at ../lib/rbtree_augmented.h:165
#1  ____rb_erase_color (augment_rotate=<optimized out>, root=<optimized out>, parent=0x56316f901b00) at ../lib/rbtree.c:372
#2  rb_erase (node=0x56316f937a30, root=0x56316f8fde88) at ../lib/rbtree.c:458
#3  0x000056316e188d4c in process_child_termination (status=256, pid=<optimized out>) at ../lib/scheduler.c:2145
#4  thread_child_handler (v=<optimized out>, unused=<optimized out>) at ../lib/scheduler.c:2181
#5  0x000056316e1874f8 in signal_run_callback (thread=0x56316f924cb0) at ../lib/signals.c:264
#6  0x000056316e1882ad in thread_call (thread=0x56316f924cb0) at ../lib/scheduler.c:2019
#7  process_threads (m=0x56316f8fde10) at ../lib/scheduler.c:2086
#8  0x000056316e193f7a in start_vrrp_child.isra.0 () at vrrp/vrrp_daemon.c:1165
#9  0x000056316e1882ad in thread_call (thread=0x56316f8feab0) at ../lib/scheduler.c:2019
#10 process_threads (m=0x56316f90be30) at ../lib/scheduler.c:2086
#11 0x000056316e11ca17 in keepalived_main (argc=<optimized out>, argv=<optimized out>) at core/main.c:2777
#12 0x00007f6830f42eb0 in __libc_start_call_main () from /lib64/libc.so.6
#13 0x00007f6830f42f60 in __libc_start_main_impl () from /lib64/libc.so.6
#14 0x000056316e113c65 in _start ()
pqarmitage commented 2 months ago

Could you please provide a copy of your keepalived configuration file? I'm going to need all the help I can get to sort this one out.

I think it might also help if you could post the output of the following gdb commands:

frame 5
print *thread

The problem seems to have occurred when a child process of the VRRP process exited, with an exit status of 1. I presume that this is some vrrp track script or notify script.
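
For reference, status=256 in frame #3 is the raw wait status; decoding it with the standard sys/wait.h macros does give an exit status of 1. A minimal sketch:

    #include <stdio.h>
    #include <sys/wait.h>

    int main(void) {
        int status = 256;                      /* raw wait status from frame #3 */
        if (WIFEXITED(status))
            printf("exit status %d\n", WEXITSTATUS(status));  /* prints "exit status 1" */
        return 0;
    }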

I also note that the backtrace lists a function start_vrrp_child.isra.0. I have never seen this before - it is normally start_vrrp_child. Do you know what is causing that?

pqarmitage commented 2 months ago

@Rtoax Is this problem repeatable or has it just occurred the once?

Zhiqiang-Lin commented 2 months ago

This problem has been reproduced in versions 2.2.8 and 2.3.1, but never in version 2.2.4. Reproduction method: use this command to start keepalived:

nohup /usr/sbin/keepalived -f /etc/keepalived/keepalived.conf --dont-fork --vrrp -D -S 0 &

A script then sends a SIGHUP signal to the main keepalived process every second, so that keepalived keeps reloading its configuration:

#!/bin/bash

TARGET_PID=$1

while true; do
    kill -SIGHUP "$TARGET_PID"
    sleep 1
done
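
For example (script name hypothetical), assuming it is saved as send_hup.sh and the oldest matching process is the parent keepalived:

    chmod +x send_hup.sh
    ./send_hup.sh "$(pgrep -o keepalived)"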

After a day or two of this, the null-pointer access reproduces.

To speed up reproduction: adding the following logging to process_child_termination(), on top of the method above, makes reproduction far more efficient; the null-pointer access can then be triggered within a few minutes of running the program.

    if (!thread_node)
        return;

    /* the two log_message() calls are the debugging prints added for reproduction */
    log_message(LOG_INFO, "%s(pid %d): rb_erase(t=%p pid=%d)\n", __func__, getpid(), thread, pid);
    rb_erase(&thread->rb_data, &master->child_pid);
    log_message(LOG_INFO, "%s(pid %d): rb_erase(t=%p pid=%d) done\n", __func__, getpid(), thread, pid);

    thread->u.c.status = status;
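
As a hedged aside (assuming syslog ends up in the systemd journal and keepalived's default Keepalived_vrrp log identifier is in use): the two messages bracket the rb_erase() call, so a crash that logs the first line but never the "done" line pinpoints the failure, e.g.:

    journalctl -t Keepalived_vrrp -f | grep rb_erase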

Configuration file:

global_defs {
    enable_script_security
    script_user root
    max_auto_priority -1
    vrrp_garp_master_refresh 60
}

# These are separate checks to provide the following behavior:
# If the loadbalanced endpoint is responding then all is well regardless
# of what the local api status is. Both checks will return success and
# we'll have the maximum priority. This means as long as there is a node
# with a functional loadbalancer it will get the VIP.
# If all of the loadbalancers go down but the local api is still running,
# the _both check will still succeed and allow any node with a functional
# api to take the VIP. This isn't preferred because it means all api
# traffic will go through one node, but at least it keeps the api available.

vrrp_script chk_ovs_alive_1 {
    script "/usr/bin/timeout 6 ps -ef"
    interval 2
    weight 49
    rise 3
    fall 3
}

vrrp_script chk_ovs_alive_2 {
    script "/usr/bin/timeout 5 ll"
    interval 1
    weight 11
    rise 4
    fall 2
}

vrrp_script chk_ovs_alive_3 {
    script "/usr/bin/timeout 4.9 ping 127.0.0.1 -c 3 -i 1"
    interval 1
    weight 13
    rise 3
    fall 4
}

vrrp_script chk_ovs_alive_4 {
    script "/usr/bin/timeout 4.9 ping 127.0.0.1 -c 3 -i 1"
    interval 1
    weight 50
    rise 3
    fall 2
}

vrrp_script chk_ovs_alive_5 {
    script "/usr/bin/timeout 4.9 ping 127.0.0.1 -c 3 -i 1"
    interval 1
    weight 50
    rise 3
    fall 2
}

vrrp_script chk_ovs_alive_6 {
    script "/usr/bin/timeout 4.9 ping 127.0.0.1 -c 3 -i 1"
    interval 1
    weight 50
    rise 3
    fall 2
}

vrrp_script chk_ovs_alive_7 {
    script "/usr/bin/timeout 4.9 ping 127.0.0.1 -c 3 -i 1"
    interval 1
    weight 50
    rise 3
    fall 2
}

vrrp_script chk_ovs_alive_8 {
    script "/usr/bin/timeout 4.9 ping 127.0.0.1 -c 3 -i 1"
    interval 1
    weight 50
    rise 3
    fall 2
}

vrrp_script chk_ovs_alive_9 {
    script "/usr/bin/timeout 4.9 ping 127.0.0.1 -c 3 -i 1"
    interval 1
    weight 50
    rise 3
    fall 2
}

vrrp_script chk_ovs_alive_10 {
    script "/usr/bin/timeout 4.9 ping 127.0.0.1 -c 3 -i 1"
    interval 1
    weight 50
    rise 3
    fall 2
}

vrrp_instance cluster24 {
    state BACKUP
    interface enp1s0
    virtual_router_id 2
    priority 40
    advert_int 1

    unicast_src_ip *.*.*.*
    unicast_peer {
        *.*.*.*
    }

    authentication {
        auth_type PASS
        auth_pass cluster24
    }
    virtual_ipaddress {
        *.*.*.*/32
    }
    track_script {
        chk_ovs_alive_1
        chk_ovs_alive_2
        chk_ovs_alive_3
        chk_ovs_alive_4
        chk_ovs_alive_5
        chk_ovs_alive_6
        chk_ovs_alive_7
        chk_ovs_alive_8
        chk_ovs_alive_9
        chk_ovs_alive_10
    }
}
pqarmitage commented 1 month ago

@Zhiqiang-Lin Many thanks for the information above. I have found the cause of the problem, and now just need to work out a solution. The problem is caused by a script having terminated (or timed out), and the thread for processing the termination not having been run before the thread for processing the reload is run.
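
A minimal sketch of that ordering problem, as an analogy only (hypothetical names, a plain pointer standing in for the rb-tree node; this is not keepalived code): a deferred termination event keeps a pointer into a container, the reload tears the container down, and the deferred event then runs against the stale pointer.

    #include <stdio.h>
    #include <stdlib.h>

    struct child { int pid; };             /* stands in for a thread in master->child_pid */

    static struct child *registry;         /* the container rebuilt on reload */
    static struct child *pending;          /* deferred child-termination event */

    static void reload(void) {
        free(registry);                    /* reload frees the old entries... */
        registry = NULL;                   /* ...and starts from an empty container */
    }

    int main(void) {
        registry = calloc(1, sizeof *registry);
        registry->pid = 1234;
        pending = registry;                /* script exits; handling the exit is deferred */
        reload();                          /* SIGHUP is processed first */
        printf("pid %d\n", pending->pid);  /* use-after-free, like rb_erase() on a stale node */
        return 0;
    }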

I have also identified a further problem while investigating this, which is that threads queued for running scripts are still queued after the reload, but they have pointers to the old script details which have been freed during the reload.

Rtoax commented 1 month ago

repeatable

Sorry for the late reply; it is a repeatable problem, I think.

pqarmitage commented 1 month ago

This was a very difficult problem to track down. It only occurred when a track_script had timed out, the thread relating to the timeout had not yet been processed, and keepalived was signalled to reload. This caused a red-black tree to be corrupted, and subsequent use of that red-black tree could cause a segfault.

Commit 7e04261d resolves this issue.