acassen / keepalived

Keepalived
https://www.keepalived.org
GNU General Public License v2.0

keepalived from snap fails to assign address only at boot (failed 99) #2514

Open zbugrkx opened 6 hours ago

zbugrkx commented 6 hours ago

**Describe the bug**

**To Reproduce**

**Expected behavior**

I would expect the VRRP instance to move to BACKUP or MASTER state at boot, without requiring a service restart.

**Keepalived version**

Output of `keepalived -v`
```
Keepalived v2.3.2 (11/03,2024)

Copyright(C) 2001-2024 Alexandre Cassen, <acassen@gmail.com>

Built with kernel headers for Linux 5.8.18
Running on Linux 6.6.62+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.62-1+rpt1 (2024-11-25)
Distro: Debian GNU/Linux 12 (bookworm)

configure options: --prefix=/usr --with-samples-dir=$(docdir) --enable-bfd --enable-dbus --enable-json --enable-regex --enable-snmp --enable-snmp-rfc --disable-libipset-dynamic LDFLAGS=-L/build/keepalived/stage/lib -L/build/keepalived/stage/usr/lib -L/build/keepalived/stage/lib/aarch64-linux-gnu -L/build/keepalived/stage/usr/lib/aarch64-linux-gnu

Config options:  LIBIPSET NFTABLES LVS REGEX VRRP VRRP_AUTH VRRP_VMAC JSON BFD OLD_CHKSUM_COMPAT SNMP_V3_FOR_V2 SNMP_VRRP SNMP_CHECKER SNMP_RFCV2 SNMP_RFCV3 DBUS IPROUTE_ETC_DIR=/etc/iproute2 IPROUTE_USR_DIR=/usr/share/iproute2 INIT=systemd

System options:  VSYSLOG MEMFD_CREATE IPV6_MULTICAST_ALL LIBKMOD IPV4_DEVCONF LIBNL3 RTA_ENCAP RTA_EXPIRES RTA_NEWDST RTA_PREF FRA_SUPPRESS_PREFIXLEN FRA_SUPPRESS_IFGROUP FRA_TUN_ID RTAX_CC_ALGO RTAX_QUICKACK RTEXT_FILTER_SKIP_STATS FRA_L3MDEV FRA_UID_RANGE RTAX_FASTOPEN_NO_COOKIE RTA_VIA FRA_PROTOCOL FRA_IP_PROTO FRA_SPORT_RANGE FRA_DPORT_RANGE RTA_TTL_PROPAGATE IFA_FLAGS LWTUNNEL_ENCAP_MPLS LWTUNNEL_ENCAP_ILA IPTABLES NET_LINUX_IF_H_COLLISION LIBIPVS_NETLINK IPVS_DEST_ATTR_ADDR_FAMILY IPVS_SYNCD_ATTRIBUTES IPVS_64BIT_STATS IPVS_TUN_TYPE IPVS_TUN_CSUM IPVS_TUN_GRE VRRP_IPVLAN IFLA_LINK_NETNSID GLOB_BRACE GLOB_ALTDIRFUNC INET6_ADDR_GEN_MODE VRF SO_MARK
```

**Distro (please complete the following information):**
 - Name: Raspbian Bookworm
 - Version: 12 - Linux 6.6.62+rpt-rpi-v8
 - Architecture: arm64

**Configuration file:**

```
global_defs {
    router_id vrrphome
    vrrp_startup_delay 10
    script_user root
    enable_script_security
    max_auto_priority 99
    vrrp_rt_priority 99
    checker_rt_priority 99
    vrrp_no_swap
    checker_no_swap
    vrrp_rlimit_rttime 100000
}

vrrp_sync_group vrrpV4V6 {
    group {
        vrrp4
        vrrp6
    }
}

vrrp_instance vrrp4 {
    state BACKUP
    interface eth0
    virtual_router_id 53
    use_vmac
    vmac_xmit_base
    accept
    priority 100
    advert_int 1
    unicast_src_ip 1....
    unicast_peer {
        1....
    }

    authentication {
        auth_type PASS
        auth_pass secret
    }

    virtual_ipaddress {
        1.../32
        1..../32
    }
}

vrrp_instance vrrp6 {
    state BACKUP
    interface eth0
    virtual_router_id 53
    use_vmac
    vmac_xmit_base
    accept
    priority 100
    advert_int 1
    unicast_src_ip fe80....
    unicast_peer {
        fe80....
    }
    virtual_ipaddress {
        fe80...../128
        fe80...../128
        2....../128
        2....../128
    }
}
```


**Notify and track scripts**

If any notify or track scripts are in use, please provide copies of them


**System Log entries**

```
Dec 04 22:21:27 raspi systemd[1]: Starting snap.keepalived.daemon.service - Service for snap application keepalived.daemon...
Dec 04 22:21:29 raspi Keepalived[938]: Starting Keepalived v2.3.2 (11/03,2024)
Dec 04 22:21:29 raspi Keepalived[938]: Running on Linux 6.6.62+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.62-1+rpt1 (2024-11-25) (built for Linux 5.8.18)
Dec 04 22:21:29 raspi Keepalived[938]: Command line: '/snap/keepalived/2877/usr/sbin/keepalived-508'
Dec 04 22:21:29 raspi Keepalived[938]: Configuration file /usr/etc/keepalived/keepalived.conf
Dec 04 22:21:29 raspi systemd[1]: Started snap.keepalived.daemon.service - Service for snap application keepalived.daemon.
Dec 04 22:21:29 raspi Keepalived[1108]: Starting VRRP child process, pid=1109
Dec 04 22:21:29 raspi Keepalived_vrrp[1109]: Delaying startup for 10 seconds
Dec 04 22:21:29 raspi Keepalived_vrrp[1109]: vrrp_ipsets has been specified but not vrrp_iptables - vrrp_ipsets will be ignored
Dec 04 22:21:29 raspi Keepalived_vrrp[1109]: Unable to load magic database
Dec 04 22:21:29 raspi Keepalived_vrrp[1109]: use_vmac or no_accept/strict specified, but no firewall configured - using nftables
Dec 04 22:21:29 raspi Keepalived_vrrp[1109]: (vrrp4): entering FAULT state (interface eth0 down)
Dec 04 22:21:29 raspi Keepalived_vrrp[1109]: (vrrp6): entering FAULT state (interface eth0 down)
Dec 04 22:21:29 raspi Keepalived_vrrp[1109]: (vrrp4): entering FAULT state (interface vrrp.53 down)
Dec 04 22:21:29 raspi Keepalived_vrrp[1109]: (vrrp6): entering FAULT state (interface vrrp6.53 down)
Dec 04 22:21:29 raspi Keepalived_vrrp[1109]: VRRP_Group(vrrpV4V6): Syncing vrrp4 to FAULT state
Dec 04 22:21:29 raspi Keepalived_vrrp[1109]: (vrrp4) entering FAULT state
Dec 04 22:21:29 raspi Keepalived_vrrp[1109]: (vrrp6) entering FAULT state
Dec 04 22:21:29 raspi Keepalived[1108]: Startup complete
Dec 04 22:21:29 raspi Keepalived_vrrp[1109]: bind unicast_src x.x.x.x.x failed 99 - Cannot assign requested address
Dec 04 22:21:29 raspi Keepalived_vrrp[1109]: bind unicast_src x.x.x.x.x failed 99 - Cannot assign requested address
Dec 04 22:21:29 raspi Keepalived_vrrp[1109]: Script check_failoverv4 now returning 1
Dec 04 22:21:29 raspi Keepalived_vrrp[1109]: VRRP_Script(check_failoverv4) failed (exited with status 1)
Dec 04 22:21:29 raspi Keepalived_vrrp[1109]: Script check_failoverv6 now returning 1
Dec 04 22:21:29 raspi Keepalived_vrrp[1109]: VRRP_Script(check_failoverv6) failed (exited with status 1)
Dec 04 22:21:31 raspi Keepalived_vrrp[1109]: Netlink reports eth0 up
Dec 04 22:21:31 raspi Keepalived_vrrp[1109]: Netlink reports vrrp.53 up
Dec 04 22:21:31 raspi Keepalived_vrrp[1109]: Netlink reports vrrp6.53 up
Dec 04 22:21:33 raspi Keepalived_vrrp[1109]: Track script check_failoverv6 is already running, expect idle - skipping run
Dec 04 22:21:39 raspi Keepalived_vrrp[1109]: Delayed start completed
Dec 04 22:22:14 raspi Keepalived_vrrp[1109]: Script check_failoverv4 now returning 0
Dec 04 22:22:14 raspi Keepalived_vrrp[1109]: Script check_failoverv6 now returning 0
Dec 04 22:22:15 raspi Keepalived_vrrp[1109]: VRRP_Script(check_failoverv4) succeeded
Dec 04 22:22:15 raspi Keepalived_vrrp[1109]: VRRP_Script(check_failoverv6) succeeded
```

pqarmitage commented 5 hours ago

From the log entries above the network interfaces are down, and network addresses have not been assigned to the interfaces.

You need to delay the startup of the snap so that keepalived doesn't start running until the network interfaces are up and configured.
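A minimal sketch of that kind of delay, assuming the goal is to hold keepalived back until its IPv4 unicast_src_ip can actually be bound. It is not part of keepalived or the snap: the function names are hypothetical, an unprivileged UDP socket stands in for keepalived's VRRP socket, and 192.0.2.1 (TEST-NET-1) is only a placeholder. It also assumes `net.ipv4.ip_nonlocal_bind` is left at its default of 0 (with it set to 1 every bind succeeds and the check is meaningless). Link-local IPv6 sources such as the fe80 address in the configuration above additionally need an interface scope, which this sketch does not handle.

```python
import socket
import time


def address_ready(addr: str) -> bool:
    """Return True if an IPv4 socket can bind to addr right now (i.e. it is assigned locally)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            s.bind((addr, 0))  # port 0: let the kernel pick any free port
        except OSError:
            return False
    return True


def wait_for_address(addr: str, timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Poll until addr is bindable or timeout seconds pass; return whether it became ready."""
    deadline = time.monotonic() + timeout
    while True:
        if address_ready(addr):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# 127.0.0.1 is always present; 192.0.2.1 normally is not assigned to any interface.
print(wait_for_address("127.0.0.1", timeout=1.0))
print(wait_for_address("192.0.2.1", timeout=0.3, interval=0.1))
```

Wrapped in a small script, a check like this could in principle be run from an `ExecStartPre=` line in a systemd drop-in for `snap.keepalived.daemon.service`; that wiring is an assumption and has not been tried in this thread.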

I am currently working on a patch that will handle unicast_src_ip addresses not being available when keepalived starts, but that is not ready yet, and it is rather more complicated than I anticipated.

Rather than using a snap, it would probably be simpler to build the current version of keepalived on a Raspberry Pi. I do so and it works without any problems.

zbugrkx commented 5 hours ago


Thank you for the response!

I have tried delaying startup with `vrrp_startup_delay 60`, but it still failed. My two Pis are Pi 5s with not much running on them, so they boot quite fast; in fact, even with a 30 s delay I am back on SSH (so the IP address and interface are up) before keepalived has started. This is where I'm confused. Or are you referring to some other snap service that I need to delay, rather than keepalived itself?

As for building it, I'll admit I'm not "that" skilled, and being able to install from a store makes things a lot simpler. I tried to read the install instructions in the Git repository, but between the long list of requirements and all the configuration parameters to pass, I got a bit overwhelmed.

But if building it would avoid these problems, I can give it a try. My first question: if I follow the install instructions line by line, will I end up with a working keepalived with all the options I need (importantly, IPv6), or will I need to change some things?

Thanks!

pqarmitage commented 4 hours ago

There isn't a problem with keepalived starting when the network interfaces are down. The problem is the error messages:

```
Dec 04 22:21:29 raspi Keepalived_vrrp[1109]: bind unicast_src x.x.x.x.x failed 99 - Cannot assign requested address
Dec 04 22:21:29 raspi Keepalived_vrrp[1109]: bind unicast_src x.x.x.x.x failed 99 - Cannot assign requested address
```
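Error 99 is Linux's `EADDRNOTAVAIL`: the kernel refuses to bind a socket to a source address that is not (yet) assigned to any local interface. A small sketch that reproduces the same error without root, assuming a UDP socket is an acceptable stand-in for keepalived's VRRP socket, that the TEST-NET-1 address 192.0.2.1 is a placeholder for the unassigned unicast_src_ip, and that `net.ipv4.ip_nonlocal_bind` is at its default of 0. The helper name is hypothetical.

```python
import errno
import os
import socket


def bind_errno(addr: str) -> int:
    """Try to bind an IPv4 UDP socket to addr; return 0 on success, otherwise the errno."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            s.bind((addr, 0))
        except OSError as e:
            return e.errno
    return 0


code = bind_errno("192.0.2.1")  # placeholder for a not-yet-assigned unicast_src_ip
print(code, errno.errorcode.get(code), os.strerror(code))
```

On Linux with glibc this should report the same number and message as the log lines above.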

I do have a very simple patch that resolves this particular issue, but resolving the wider problem of unicast_src_ip addresses being added and deleted is rather more complicated, and that is what I am currently working on.

In the morning I will consider whether I can just push the simple patch in the hope that it should resolve your problem.

I would be interested in how you are implementing the 60 or 30 second delay to starting keepalived, and also to see the keepalived logs that are produced when you do delay keepalived starting.

It seems to me that there is a problem with the systemd service file handling: keepalived is configured with After=network-online.target, and yet it is running before the network is fully up.
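systemd's NETWORK_ONLINE documentation points out that `After=network-online.target` only orders a unit: the ordering matters only if something actually pulls the target into the boot transaction, which is why that document recommends pairing it with `Wants=network-online.target`. A rough sketch of a checker for a unit file's `[Unit]` section, assuming a simple text parse that ignores drop-ins, line continuations and specifiers; the function name and sample unit are hypothetical.

```python
from typing import Tuple


def network_online_deps(unit_text: str, target: str = "network-online.target") -> Tuple[bool, bool]:
    """Return (ordered_after, pulled_in) for target in the [Unit] section of a unit file."""
    ordered_after = pulled_in = False
    section = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section != "Unit" or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        units = value.split()  # these directives take space-separated unit lists
        if key == "After" and target in units:
            ordered_after = True
        elif key in ("Wants", "Requires") and target in units:
            pulled_in = True
    return ordered_after, pulled_in


# Hypothetical unit that orders itself after the target but never pulls it in.
SAMPLE = """\
[Unit]
Description=Example service
After=network-online.target

[Service]
ExecStart=/bin/true
"""
print(network_online_deps(SAMPLE))
```

On a running system, `systemctl show -p Wants -p After snap.keepalived.daemon.service` shows the effective values, including anything added by drop-ins.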

A description of how systemd handles network-online.target is at https://systemd.io/NETWORK_ONLINE/, and I will look at it further in the morning. I note that one suggestion there is to use IP_FREEBIND, which is what my simple patch mentioned above does. Once I have applied the patch and the snap has been rebuilt, you would need to use the latest/edge snap, since we don't promote intermediate patch releases to the beta or more stable channels.
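To illustrate what IP_FREEBIND changes, here is a minimal sketch (not the keepalived patch itself) that tries to bind to an address which is not assigned locally, with and without the option. It assumes an unprivileged UDP socket is a fair stand-in for keepalived's VRRP socket, falls back to the Linux value 15 from `<linux/in.h>` in case Python's `socket` module does not export `IP_FREEBIND`, and uses 192.0.2.1 (TEST-NET-1) as a placeholder for a not-yet-configured unicast_src_ip. IPv6 sockets have a separate `IPV6_FREEBIND` option, which is not shown.

```python
import socket

# Linux-only: IP_FREEBIND is 15 in <linux/in.h>; some Python builds may not export the constant.
IP_FREEBIND = getattr(socket, "IP_FREEBIND", 15)


def can_bind(addr: str, freebind: bool) -> bool:
    """Return True if an IPv4 UDP socket can bind to addr, optionally with IP_FREEBIND set."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        if freebind:
            # Allow binding to an address that is not (yet) assigned to any interface.
            s.setsockopt(socket.IPPROTO_IP, IP_FREEBIND, 1)
        try:
            s.bind((addr, 0))
        except OSError:
            return False
    return True


ADDR = "192.0.2.1"  # placeholder for a unicast_src_ip that is not configured yet
print("plain bind:", can_bind(ADDR, freebind=False))
print("with IP_FREEBIND:", can_bind(ADDR, freebind=True))
```

On a default Linux system the first call should return False and the second True, which is the behaviour that lets a daemon bind its source address before the network configuration has finished.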

It does seem to me that network-online.target or NetworkManager.service is not working properly, because keepalived is being allowed to start before network-online.target is reached, unless of course that is an effect of running it as a snap. It would be really helpful if you could run keepalived outside a snap and post the resulting keepalived log output, so we can see what happens then.

If you feel it is best to build keepalived yourself, I can provide detailed instructions tomorrow on what is needed for a Raspberry Pi running Bookworm.