acassen / keepalived

Keepalived
https://www.keepalived.org
GNU General Public License v2.0

Crash at startup when some interfaces are created before vrrp_startup_delay timeout #1803

Closed louis-6wind closed 3 years ago

louis-6wind commented 3 years ago

Describe the bug: The process crashes at startup with vrrp_startup_delay when some interfaces are not yet created.

To Reproduce: no procedure yet

Expected behavior: no crash

Keepalived version Keepalived v2.1.5 (12/05,2020), git commit v2.1.5-237-g7b8b40ef

Copyright(C) 2001-2020 Alexandre Cassen, acassen@gmail.com

Built with kernel headers for Linux 4.15.18
Running on Linux 5.3.0-46-generic #38~18.04.1 SMP Wed May 13 11:59:56 CEST 2020
Distro: Ubuntu 18.04.1 LTS

configure options: --prefix=/usr --sysconfdir=/etc --with-extra-cflags=-I/usr/include/libnl3 --with-extra-ldflags= --with-extra-libs=-lnl-genl-3 --disable-lvs --with-init=systemd --host=x86_64-linux-gnu host_alias=x86_64-linux-gnu

Config options: VRRP VRRP_AUTH OLD_CHKSUM_COMPAT FIB_ROUTING

System options: PIPE2 SIGNALFD INOTIFY_INIT1 VSYSLOG EPOLL_CREATE1 MEMFD_CREATE IPV4_DEVCONF IPV6_ADVANCED_API RTA_ENCAP RTA_EXPIRES RTA_NEWDST RTA_PREF FRA_SUPPRESS_PREFIXLEN FRA_SUPPRESS_IFGROUP FRA_TUN_ID RTAX_CC_ALGO RTAX_QUICKACK RTEXT_FILTER_SKIP_STATS FRA_L3MDEV FRA_UID_RANGE RTAX_FASTOPEN_NO_COOKIE RTA_VIA FRA_OIFNAME RTA_TTL_PROPAGATE IFA_FLAGS IP_MULTICAST_ALL LWTUNNEL_ENCAP_MPLS LWTUNNEL_ENCAP_ILA NET_LINUX_IF_H_COLLISION LIBIPTC_LINUX_NET_IF_H_COLLISION VRRP_VMAC VRRP_IPVLAN IFLA_LINK_NETNSID CN_PROC SOCK_NONBLOCK SOCK_CLOEXEC O_PATH GLOB_BRACE INET6_ADDR_GEN_MODE VRF SO_MARK SCHED_RESET_ON_FORK

Distro (please complete the following information):

Details of any containerisation or hosted service (e.g. AWS): no

Configuration file: A full copy of the configuration file, obfuscated if necessary to protect passwords and IP addresses

Notify and track scripts: no

System Log entries

cat >/etc/keepalived/keepalived.conf <<\EOF
global_defs {
    router_id gw-master
    enable_script_security
    script_user root
    dynamic_interfaces
    vrrp_startup_delay 10
}

vrrp_sync_group mygroup {
    group {
        vrrp.1
        vrrp.wan
        vrrp.8
    }
}

vrrp_instance vrrp.1 {
    version 2
    state BACKUP
    interface vlan1
    use_vmac vrrp.1
    track_file {
    }
    garp_master_delay 5
    virtual_router_id 1
    priority 200
    advert_int 1.0
    virtual_ipaddress {
        10.10.1.254/24
    }
    preempt_delay 60
}

vrrp_instance vrrp.wan {
    version 2
    state BACKUP
    interface wan
    use_vmac vrrp.wan
    track_file {
    }
    garp_master_delay 5
    virtual_router_id 100
    priority 200
    advert_int 1.0
    virtual_ipaddress {
        20.20.20.254/24
    }
    preempt_delay 60
}

vrrp_instance vrrp-ha {
    version 2
    state BACKUP
    interface ha
    use_vmac vrrp-ha
    track_file {
    }
    garp_master_delay 5
    virtual_router_id 1
    priority 200
    advert_int 1.0
    virtual_ipaddress {
        192.168.20.1/24
    }
    preempt_delay 60
}

vrrp_instance vrrp.8 {
    version 2
    state BACKUP
    interface vlan8
    use_vmac vrrp.8
    track_file {
    }
    garp_master_delay 5
    virtual_router_id 8
    priority 200
    advert_int 1.0
    virtual_ipaddress {
        10.10.8.254/24
    }
    preempt_delay 60
}
EOF

Did keepalived coredump?

#0  0x0000559c38465d14 in setup_interface (vrrp=0x559c39124ec0) at vrrp_if.c:1485
1485            if (vrrp->sockets->fd_in == -1) {
(gdb) bt
#0  0x0000559c38465d14 in setup_interface (vrrp=0x559c39124ec0) at vrrp_if.c:1485
#1  0x0000559c38465f62 in recreate_vmac_thread (thread=0x559c39120060) at vrrp_if.c:1537
#2  0x0000559c38484d8b in thread_call (thread=0x559c39120060) at scheduler.c:1909
#3  0x0000559c38484ec3 in process_threads (m=0x559c39119550) at scheduler.c:1973
#4  0x0000559c38485264 in launch_thread_scheduler (m=0x559c39119550) at scheduler.c:2080
#5  0x0000559c38445449 in start_vrrp_child () at vrrp_daemon.c:1112
#6  0x0000559c3843052d in start_keepalived (thread=0x559c391194d0) at main.c:552
#7  0x0000559c38484d8b in thread_call (thread=0x559c391194d0) at scheduler.c:1909
#8  0x0000559c38484ec3 in process_threads (m=0x559c39119550) at scheduler.c:1973
#9  0x0000559c38485264 in launch_thread_scheduler (m=0x559c39119550) at scheduler.c:2080
#10 0x0000559c384338b5 in keepalived_main (argc=2, argv=0x7ffd50a614b8) at main.c:2695
#11 0x0000559c3842ff1a in main (argc=2, argv=0x7ffd50a614b8) at main.c:29

(gdb) print *vrrp
$1 = {family = 2, iname = 0x559c39123f60 "vrrp.8", sync = 0x559c3911c5a0, stats = 0x559c39125350, ifp = 0x559c3911f900, dont_track_primary = false, linkbeat_use_polling = false, skip_check_adv_addr = false, strict_mode = 0, vmac_flags = 3, vmac_ifname = "vrrp.8\000\000\000\000\000\000\000\000\000", duplicate_vrid_fault = false, ipvlan_addr = 0x0, ipvlan_type = 0, configured_ifp = 0x559c391253c0, track_ifp = {next = 0x559c39124f28, prev = 0x559c39124f28}, track_script = {next = 0x559c39124f38, prev = 0x559c39124f38}, track_file = {next = 0x559c39124f48, prev = 0x559c39124f48}, track_process = {next = 0x559c39124f58, prev = 0x559c39124f58}, num_script_if_fault = 5, num_script_init = 0, notifies_sent = true, saddr = {ss_family = 0, __ss_padding = '\000' <repeats 117 times>, __ss_align = 0}, saddr_from_config = false, track_saddr = false, pkt_saddr = {ss_family = 0, __ss_padding = '\000' <repeats 117 times>, __ss_align = 0}, rx_ttl_hop_limit = 0, multicast_pkt = false, unicast_peer = {next = 0x559c39125088, prev = 0x559c39125088}, ttl = 255, check_unicast_src = false, unicast_chksum_compat = CHKSUM_COMPATIBILITY_NONE, master_saddr = {ss_family = 0, __ss_padding = '\000' <repeats 117 times>, __ss_align = 0}, master_priority = 0 '\000', last_transition = {tv_sec = 1607342769, tv_usec = 570101}, garp_delay = 5000000, garp_refresh = {tv_sec = 0, tv_usec = 0}, garp_refresh_timer = {tv_sec = 0, tv_usec = 0}, garp_rep = 5, garp_refresh_rep = 1, garp_lower_prio_delay = 5000000, garp_pending = false, gna_pending = false, garp_lower_prio_rep = 5, lower_prio_no_advert = 0, higher_prio_send_advert = 0, vmac_garp_intvl = {tv_sec = 0, tv_usec = 0}, vmac_garp_all_if = false, vmac_garp_timer = {tv_sec = 0, tv_usec = 0}, vrid = 8 '\b', base_priority = 200 '\310', effective_priority = 200 '\310', total_priority = 200, vipset = false, vip = {next = 0x559c391255d8, prev = 0x559c391255d8}, vip_cnt = 1, evip = {next = 0x559c391251d8, prev = 0x559c391251d8}, promote_secondaries = false, evip_other_family = false, vroutes = {next = 0x559c391251f0, prev = 0x559c391251f0}, vrules = {next = 0x559c39125200, prev = 0x559c39125200}, adver_int = 1000000, master_adver_int = 1000000, kernel_rx_buf_size = 0, nopreempt = false, preempt_delay = 60000000, preempt_time = {tv_sec = 0, tv_usec = 0}, state = 3, wantstate = 1, reload_master = false, sockets = 0x0, debug = 0, version = 2, smtp_alert = 0, last_email_state = 0, notify_exec = false, notify_deleted = false, script_backup = 0x0, script_master = 0x0, script_fault = 0x0, script_stop = 0x0, script_deleted = 0x0, script_master_rx_lower_pri = 0x0, script = 0x0, notify_priority_changes = 0, ms_down_timer = 0, sands = {tv_sec = 0, tv_usec = 0}, send_buffer = 0x559c3911b750 "E\300", send_buffer_size = 40, ipv4_csum = 0, auth_type = 0 '\000', auth_data = "\000\000\000\000\000\000\000", ipsecah_counter = {cycle = false, seq_number = 0}, ip_id = 0, rb_vrid = {__rb_parent_color = 0, rb_right = 0x0, rb_left = 0x0}, rb_sands = {__rb_parent_color = 0, rb_right = 0x0, rb_left = 0x0}, s_list = {next = 0x559c3911c5b0, prev = 0x559c39124c60}, e_list = {next = 0x559c39125a60, prev = 0x559c39124c70}}

louis-6wind commented 3 years ago

@pqarmitage thank you for your help and your work on the other issues.

I have tested with this small patch.

With this patch applied, vrrp.wan, for example, is never initialized.

diff --git a/keepalived/vrrp/vrrp_if.c b/keepalived/vrrp/vrrp_if.c
index 5dfa4faa..4c49b7ad 100644
--- a/keepalived/vrrp/vrrp_if.c
+++ b/keepalived/vrrp/vrrp_if.c
@@ -1482,7 +1482,7 @@ setup_interface(vrrp_t *vrrp)
 #endif

        /* Find the sockpool entry. If none, then we open the socket */
-       if (vrrp->sockets->fd_in == -1) {
+       if (vrrp->sockets && (vrrp->sockets->fd_in == -1)) {
                /* If the MTU has changed we may need to recalculate the socket receive buffer size */
                if (global_data->vrrp_rx_bufs_policy & RX_BUFS_POLICY_MTU) {
                        vrrp->sockets->rx_buf_size = 0;
@@ -1502,6 +1502,8 @@ setup_interface(vrrp_t *vrrp)
                        vrrp_thread_add_read(vrrp);
                }
        }
+       if (!vrrp->sockets)
+               log_message(LOG_INFO, "NULL vrrp socket on %s", vrrp->iname);

        return;
 }
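
The guard in the patch above can be illustrated in isolation. This is only a minimal sketch: `sock_sketch_t`, `vrrp_sketch_t` and `needs_socket_open()` are hypothetical, cut-down stand-ins and are not keepalived's real types or functions. It shows why testing the pointer before `fd_in` avoids the crash seen at vrrp_if.c:1485 when `sockets` is still `0x0`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket pool entry (not keepalived's type). */
typedef struct {
    int fd_in;              /* receive socket, -1 while not yet opened */
} sock_sketch_t;

/* Hypothetical stand-in for a VRRP instance (not keepalived's vrrp_t). */
typedef struct {
    const char *iname;      /* instance name, e.g. "vrrp.8" */
    sock_sketch_t *sockets; /* NULL until a socket pool entry is assigned */
} vrrp_sketch_t;

/* Returns true only when a pool entry exists and its receive socket still
 * has to be opened. The pointer is tested first, so a NULL 'sockets'
 * is never dereferenced. */
static bool needs_socket_open(const vrrp_sketch_t *vrrp)
{
    if (vrrp == NULL || vrrp->sockets == NULL)
        return false;
    return vrrp->sockets->fd_in == -1;
}
```

As the log below suggests, a guard of this kind only avoids the dereference; an instance whose `sockets` pointer stays NULL still has no socket to receive adverts on.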

Dec 07 12:14:29 ipsecgw-master-patch Keepalived_vrrp[1529]: Registering Kernel netlink reflector
Dec 07 12:14:29 ipsecgw-master-patch Keepalived_vrrp[1529]: Registering Kernel netlink command channel
Dec 07 12:14:29 ipsecgw-master-patch Keepalived_vrrp[1529]: "vrrp_track_file" is deprecated, please use "track_file"
Dec 07 12:14:29 ipsecgw-master-patch Keepalived_vrrp[1529]: Configuration specifies interface vlan1 which doesn't currently exist - will use if created
Dec 07 12:14:29 ipsecgw-master-patch Keepalived_vrrp[1529]: Configuration specifies interface wan which doesn't currently exist - will use if created
Dec 07 12:14:29 ipsecgw-master-patch Keepalived_vrrp[1529]: Delaying startup for 10 seconds
Dec 07 12:14:29 ipsecgw-master-patch Keepalived_vrrp[1529]: (vrrp.8): Success creating VMAC interface vrrp.8
Dec 07 12:14:29 ipsecgw-master-patch Keepalived_vrrp[1529]: (switch-lag) MAC Address changed from 0c:23:ac:96:67:01 to da:fd:4f:60:13:4a
Dec 07 12:14:29 ipsecgw-master-patch Keepalived_vrrp[1529]: (switch-lag) MAC Address changed from da:fd:4f:60:13:4a to 0c:23:ac:96:67:01
Dec 07 12:14:29 ipsecgw-master-patch Keepalived_vrrp[1529]: Interface wan added
Dec 07 12:14:29 ipsecgw-master-patch Keepalived_vrrp[1529]: (vrrp.wan) interface wan is down
Dec 07 12:14:29 ipsecgw-master-patch Keepalived_vrrp[1529]: (toswitch2) MAC Address changed from 0c:23:ac:96:67:02 to 0c:23:ac:96:67:01
Dec 07 12:14:29 ipsecgw-master-patch Keepalived_vrrp[1529]: Assigned address 192.168.20.2 for interface ha
Dec 07 12:14:29 ipsecgw-master-patch Keepalived_vrrp[1529]: Assigned address fe80::e23:acff:fe96:6703 for interface ha
Dec 07 12:14:29 ipsecgw-master-patch Keepalived_vrrp[1529]: (vrrp-ha): Success creating VMAC interface vrrp-ha
Dec 07 12:14:29 ipsecgw-master-patch Keepalived_vrrp[1529]: (vrrp.8): entering FAULT state (interface vlan8 down)
Dec 07 12:14:29 ipsecgw-master-patch Keepalived_vrrp[1529]: (vrrp.1): entering FAULT state (interface vlan1 down)
Dec 07 12:14:29 ipsecgw-master-patch Keepalived_vrrp[1529]: (vrrp.wan): entering FAULT state (interface wan down)
Dec 07 12:14:29 ipsecgw-master-patch Keepalived_vrrp[1529]: (vrrp.wan): entering FAULT state (interface wan down)
Dec 07 12:14:29 ipsecgw-master-patch Keepalived_vrrp[1529]: (vrrp.1): entering FAULT state (interface vrrp.1 down)
Dec 07 12:14:29 ipsecgw-master-patch Keepalived_vrrp[1529]: (vrrp.wan): entering FAULT state (interface vrrp.wan down)
Dec 07 12:14:29 ipsecgw-master-patch Keepalived_vrrp[1529]: (vrrp.8): entering FAULT state (interface vrrp.8 down)
Dec 07 12:14:29 ipsecgw-master-patch Keepalived_vrrp[1529]: (vrrp.1) entering FAULT state (no IPv4 address for interface)
Dec 07 12:14:29 ipsecgw-master-patch Keepalived_vrrp[1529]: VRRP_Group(mygroup): Syncing vrrp.1 to FAULT state
Dec 07 12:14:29 ipsecgw-master-patch Keepalived_vrrp[1529]: (vrrp.wan) entering FAULT state (no IPv4 address for interface)
Dec 07 12:14:29 ipsecgw-master-patch Keepalived_vrrp[1529]: (vrrp.8) entering FAULT state (no IPv4 address for interface)
Dec 07 12:14:29 ipsecgw-master-patch Keepalived_vrrp[1529]: (vrrp.1) entering FAULT state
Dec 07 12:14:29 ipsecgw-master-patch Keepalived_vrrp[1529]: (vrrp.wan) entering FAULT state
Dec 07 12:14:29 ipsecgw-master-patch Keepalived_vrrp[1529]: (vrrp.8) entering FAULT state
Dec 07 12:14:29 ipsecgw-master-patch Keepalived_vrrp[1529]: Registering gratuitous ARP shared channel
Dec 07 12:14:29 ipsecgw-master-patch Keepalived_vrrp[1529]: (vrrp.wan): Success creating VMAC interface vrrp.wan
Dec 07 12:14:29 ipsecgw-master-patch Keepalived_vrrp[1529]: NULL vrrp socket on vrrp.wan
Dec 07 12:14:29 ipsecgw-master-patch Keepalived_vrrp[1529]: Interface vlan1 added
Dec 07 12:14:29 ipsecgw-master-patch Keepalived_vrrp[1529]: (vrrp.1) interface vlan1 is down
Dec 07 12:14:29 ipsecgw-master-patch Keepalived_vrrp[1529]: (vrrp.1): Success creating VMAC interface vrrp.1
Dec 07 12:14:29 ipsecgw-master-patch Keepalived_vrrp[1529]: NULL vrrp socket on vrrp.1
Dec 07 12:14:30 ipsecgw-master-patch Keepalived_vrrp[1529]: Assigned address 20.20.20.252 for interface wan
Dec 07 12:14:30 ipsecgw-master-patch Keepalived_vrrp[1529]: Assigned address 10.10.1.252 for interface vlan1
Dec 07 12:14:30 ipsecgw-master-patch Keepalived_vrrp[1529]: Assigned address 10.10.8.252 for interface vlan8
Dec 07 12:14:31 ipsecgw-master-patch Keepalived_vrrp[1529]: Netlink reports wan up
Dec 07 12:14:31 ipsecgw-master-patch Keepalived_vrrp[1529]: Netlink reports vrrp.wan up
Dec 07 12:14:31 ipsecgw-master-patch Keepalived_vrrp[1529]: Netlink reports vlan8 up
Dec 07 12:14:31 ipsecgw-master-patch Keepalived_vrrp[1529]: Netlink reports vrrp.8 up
Dec 07 12:14:31 ipsecgw-master-patch Keepalived_vrrp[1529]: Netlink reports vlan1 up
Dec 07 12:14:31 ipsecgw-master-patch Keepalived_vrrp[1529]: Netlink reports vrrp.1 up
Dec 07 12:14:33 ipsecgw-master-patch Keepalived_vrrp[1529]: Assigned address fe80::d8fd:4fff:fe60:134a for interface vlan8
Dec 07 12:14:33 ipsecgw-master-patch Keepalived_vrrp[1529]: Assigned address fe80::d8fd:4fff:fe60:134a for interface wan
Dec 07 12:14:33 ipsecgw-master-patch Keepalived_vrrp[1529]: Assigned address fe80::d8fd:4fff:fe60:134a for interface vlan1
Dec 07 12:14:39 ipsecgw-master-patch Keepalived_vrrp[1529]: (vrrp-ha) Entering BACKUP STATE (init)
Dec 07 12:14:39 ipsecgw-master-patch Keepalived_vrrp[1529]: VRRP sockpool: [ifindex( 18), family(IPv4), proto(112), fd(12,13)]
Dec 07 12:14:39 ipsecgw-master-patch Keepalived_vrrp[1529]: VRRP sockpool: [ifindex( 16), family(IPv4), proto(112), fd(14,15)]
Dec 07 12:14:39 ipsecgw-master-patch Keepalived_vrrp[1529]: VRRP sockpool: [ifindex( 14), family(IPv4), proto(112), fd(16,17)]
Dec 07 12:14:39 ipsecgw-master-patch Keepalived_vrrp[1529]: VRRP sockpool: [ifindex( 15), family(IPv4), proto(112), fd(18,19)]
Dec 07 12:14:40 ipsecgw-master-patch Keepalived_vrrp[1529]: (vrrp-ha) start preempt delay (60.000000)
Dec 07 12:15:40 ipsecgw-master-patch Keepalived_vrrp[1529]: (vrrp-ha) received lower priority (150) advert from 192.168.20.3 - discarding
Dec 07 12:15:41 ipsecgw-master-patch Keepalived_vrrp[1529]: (vrrp-ha) received lower priority (150) advert from 192.168.20.3 - discarding
Dec 07 12:15:42 ipsecgw-master-patch Keepalived_vrrp[1529]: (vrrp-ha) received lower priority (150) advert from 192.168.20.3 - discarding
Dec 07 12:15:42 ipsecgw-master-patch Keepalived_vrrp[1529]: (vrrp-ha) Receive advertisement timeout
Dec 07 12:15:42 ipsecgw-master-patch Keepalived_vrrp[1529]: (vrrp-ha) Entering MASTER STATE
Dec 07 12:15:42 ipsecgw-master-patch Keepalived_vrrp[1529]: (vrrp-ha) setting VIPs.
Dec 07 12:15:42 ipsecgw-master-patch Keepalived_vrrp[1529]: (vrrp-ha) Sending/queueing gratuitous ARPs on vrrp-ha for 192.168.20.1
Dec 07 12:15:42 ipsecgw-master-patch Keepalived_vrrp[1529]: Sending gratuitous ARP on vrrp-ha for 192.168.20.1
Dec 07 12:15:42 ipsecgw-master-patch Keepalived_vrrp[1529]: Sending gratuitous ARP on vrrp-ha for 192.168.20.1
Dec 07 12:15:42 ipsecgw-master-patch Keepalived_vrrp[1529]: Sending gratuitous ARP on vrrp-ha for 192.168.20.1
Dec 07 12:15:42 ipsecgw-master-patch Keepalived_vrrp[1529]: Sending gratuitous ARP on vrrp-ha for 192.168.20.1
Dec 07 12:15:42 ipsecgw-master-patch Keepalived_vrrp[1529]: Sending gratuitous ARP on vrrp-ha for 192.168.20.1
Dec 07 12:15:47 ipsecgw-master-patch Keepalived_vrrp[1529]: (vrrp-ha) Sending/queueing gratuitous ARPs on vrrp-ha for 192.168.20.1
Dec 07 12:15:47 ipsecgw-master-patch Keepalived_vrrp[1529]: Sending gratuitous ARP on vrrp-ha for 192.168.20.1
Dec 07 12:15:47 ipsecgw-master-patch Keepalived_vrrp[1529]: Sending gratuitous ARP on vrrp-ha for 192.168.20.1
Dec 07 12:15:47 ipsecgw-master-patch Keepalived_vrrp[1529]: Sending gratuitous ARP on vrrp-ha for 192.168.20.1
Dec 07 12:15:47 ipsecgw-master-patch Keepalived_vrrp[1529]: Sending gratuitous ARP on vrrp-ha for 192.168.20.1
Dec 07 12:15:47 ipsecgw-master-patch Keepalived_vrrp[1529]: Sending gratuitous ARP on vrrp-ha for 192.168.20.1

pqarmitage commented 3 years ago

I have rethought how vrrp_startup_delay works, since there could be other areas of the code where segfaults may occur.

Rather than adding code to handle specifically the case where the sockets have not been created, the code now creates the sockets at startup as normal, but extends the initial timeout for not receiving an advert by the vrrp_startup_delay. This greatly simplifies the handling of vrrp_startup_delay. I have reverted the previous commit (1d92217) and added this commit; the segfaults you (@louis-oui) previously identified and this segfault are all resolved by the new approach.
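
To make the arithmetic of the extended timeout concrete, here is a rough sketch. It is not keepalived's code: the function names are hypothetical, and it assumes that the initial "no advert received" timeout is the VRRPv2 Master_Down_Interval from RFC 3768 (3 × Advertisement_Interval + Skew_Time, with Skew_Time = (256 − Priority) / 256 seconds), to which the startup delay is simply added.

```c
#include <stdint.h>

#define USEC_PER_SEC 1000000ULL

/* VRRPv2 (RFC 3768): Skew_Time = (256 - Priority) / 256 seconds,
 * expressed here in microseconds. */
static uint64_t skew_time_usec(uint8_t priority)
{
    return (256ULL - priority) * USEC_PER_SEC / 256ULL;
}

/* VRRPv2: Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time. */
static uint64_t master_down_usec(uint64_t adver_int_usec, uint8_t priority)
{
    return 3ULL * adver_int_usec + skew_time_usec(priority);
}

/* Hypothetical helper: the first advert timeout after startup, assumed to
 * be the normal master-down interval extended by the startup delay. */
static uint64_t initial_advert_timeout_usec(uint64_t adver_int_usec,
                                            uint8_t priority,
                                            uint64_t startup_delay_usec)
{
    return master_down_usec(adver_int_usec, priority) + startup_delay_usec;
}
```

With the values from the configuration in this issue (advert_int 1, priority 200, vrrp_startup_delay 10), the normal interval under these assumptions is 3.21875 s and the extended initial timeout is 13.21875 s, so the sockets exist from the start and only the first timer is longer.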

louis-6wind commented 3 years ago

Thank you @pqarmitage. No more segfaults.