
Keepalived_healthcheckers - epoll_wait error: Invalid argument #1809

Closed: derrickhayashi closed this issue 3 years ago

derrickhayashi commented 3 years ago

Describe the bug
keepalived appears to stop communicating with ipvs after Keepalived_healthcheckers encounters an epoll_wait error. When a new virtual server is added and keepalived is reloaded, it appears to read the config file, but the changes are not applied to ipvs (verified by comparing ipvsadm output against what is in the config file). The healthchecker appears either to have stopped working or to have stopped writing to logs.

Dec 06 13:56:16 ipvs101 Keepalived_healthcheckers[6793]: scheduler: epoll_wait error: Invalid argument
Dec 09 12:01:35 ipvs101 systemd[1]: Reloading Keepalive Daemon (LVS).
Dec 09 12:01:35 ipvs101 Keepalived[6790]: Reloading ...
Dec 09 12:01:35 ipvs101 Keepalived[6790]: Opening file '/etc/keepalived/keepalived.conf'.
Dec 09 12:01:35 ipvs101 systemd[1]: Reloaded Keepalive Daemon (LVS).
Dec 09 12:01:35 ipvs101 Keepalived[6790]: Opening file '/etc/keepalived/configs/keepalived.conf'.
Dec 09 12:01:35 ipvs101 Keepalived[6790]: Opening file '/etc/keepalived/configs/v2-keepalived.conf'.

To Reproduce
Unclear. This was resolved by restarting the keepalived process, and it has not occurred again since. It happened on multiple servers within a short span of time. They have the same configuration but are otherwise unrelated, in the sense that no state is shared by either ipvs (no sync) or keepalived (no vrrp).

However, this has not happened on other servers running keepalived with a similar configuration but different virtual and real servers.

Expected behavior

Keepalived version

Keepalived v2.0.20 (unknown)

Copyright(C) 2001-2020 Alexandre Cassen, <acassen@gmail.com>

Built with kernel headers for Linux 4.15.18
Running on Linux 5.4.0-54-generic #60~18.04.1-Ubuntu SMP Fri Nov 6 17:25:16 UTC 2020

configure options: --prefix=/

Config options:  LIBIPTC LIBIPSET_DYNAMIC NFTABLES LVS VRRP VRRP_AUTH OLD_CHKSUM_COMPAT FIB_ROUTING

System options:  PIPE2 SIGNALFD INOTIFY_INIT1 VSYSLOG EPOLL_CREATE1 IPV4_DEVCONF IPV6_ADVANCED_API LIBNL3 RTA_ENCAP RTA_EXPIRES RTA_NEWDST RTA_PREF FRA_SUPPRESS_PREFIXLEN FRA_SUPPRESS_IFGROUP FRA_TUN_ID RTAX_CC_ALGO RTAX_QUICKACK RTEXT_FILTER_SKIP_STATS FRA_L3MDEV FRA_UID_RANGE RTAX_FASTOPEN_NO_COOKIE RTA_VIA FRA_OIFNAME RTA_TTL_PROPAGATE IFA_FLAGS IP_MULTICAST_ALL LWTUNNEL_ENCAP_MPLS LWTUNNEL_ENCAP_ILA LIBIPTC LIBIPSET_PRE_V7 NET_LINUX_IF_H_COLLISION LIBIPVS_NETLINK IPVS_DEST_ATTR_ADDR_FAMILY IPVS_SYNCD_ATTRIBUTES IPVS_64BIT_STATS VRRP_VMAC VRRP_IPVLAN IFLA_LINK_NETNSID CN_PROC SOCK_NONBLOCK SOCK_CLOEXEC O_PATH GLOB_BRACE INET6_ADDR_GEN_MODE VRF SO_MARK SCHED_RESET_ON_FORK

Distro (please complete the following information):

Details of any containerisation or hosted service (e.g. AWS)
None; it runs directly on on-premises bare-metal servers.

Configuration file: The configuration file generally follows this format.

virtual_server 192.0.2.1 80 {
        delay_loop 3
        lvs_sched mh
        mh-port
        mh-fallback
        lvs_method TUN
        protocol TCP
        real_server 198.51.100.1 80 {
                weight 10
                HTTP_GET {
                        url {
                                path "/ping"
                                status_code 200
                        }
                        connect_ip 198.51.100.1
                        connect_port 9001
                        connect_timeout 2
                }
        }
        real_server 198.51.100.2 80 {
                weight 10
                HTTP_GET {
                        url {
                                path "/ping"
                                status_code 200
                        }
                        connect_ip 198.51.100.2
                        connect_port 9001
                        connect_timeout 2
                }
        }
        [... and more ...]
}
[... and more ...]

Notify and track scripts
None

System Log entries

Dec 06 13:56:16 ipvs101 Keepalived_healthcheckers[6793]: scheduler: epoll_wait error: Invalid argument

Subsequently, when reloaded, this output is shown.

Dec 06 13:56:16 ipvs101 Keepalived_healthcheckers[6793]: scheduler: epoll_wait error: Invalid argument
Dec 09 12:01:35 ipvs101 systemd[1]: Reloading Keepalive Daemon (LVS).
Dec 09 12:01:35 ipvs101 Keepalived[6790]: Reloading ...
Dec 09 12:01:35 ipvs101 Keepalived[6790]: Opening file '/etc/keepalived/keepalived.conf'.
Dec 09 12:01:35 ipvs101 systemd[1]: Reloaded Keepalive Daemon (LVS).
Dec 09 12:01:35 ipvs101 Keepalived[6790]: Opening file '/etc/keepalived/configs/keepalived.conf'.
Dec 09 12:01:35 ipvs101 Keepalived[6790]: Opening file '/etc/keepalived/configs/v2-keepalived.conf'.
Dec 10 09:32:36 ipvs101 systemd[1]: Reloading Keepalive Daemon (LVS).
Dec 10 09:32:36 ipvs101 Keepalived[6790]: Reloading ...
Dec 10 09:32:36 ipvs101 Keepalived[6790]: Opening file '/etc/keepalived/keepalived.conf'.
Dec 10 09:32:36 ipvs101 systemd[1]: Reloaded Keepalive Daemon (LVS).
Dec 10 09:32:36 ipvs101 Keepalived[6790]: Opening file '/etc/keepalived/configs/keepalived.conf'.
Dec 10 09:32:36 ipvs101 Keepalived[6790]: Opening file '/etc/keepalived/configs/v2-keepalived.conf'.

Under normal circumstances, when reloaded, the output shows ipvs being reinitialized as well as the healthcheckers being activated.

Dec 11 06:48:56 ipvs101 systemd[1]: Reloading Keepalive Daemon (LVS).
Dec 11 06:48:56 ipvs101 Keepalived[11617]: Reloading ...
Dec 11 06:48:56 ipvs101 Keepalived[11617]: Opening file '/etc/keepalived/keepalived.conf'.
Dec 11 06:48:56 ipvs101 systemd[1]: Reloaded Keepalive Daemon (LVS).
Dec 11 06:48:56 ipvs101 Keepalived[11617]: Opening file '/etc/keepalived/configs/keepalived.conf'.
Dec 11 06:48:56 ipvs101 Keepalived[11617]: Opening file '/etc/keepalived/configs/v2-keepalived.conf'.
Dec 11 06:48:56 ipvs101 Keepalived_healthcheckers[11618]: Reloading
Dec 11 06:48:56 ipvs101 Keepalived_healthcheckers[11618]: Got SIGHUP, reloading checker configuration
Dec 11 06:48:56 ipvs101 Keepalived_healthcheckers[11618]: Opening file '/etc/keepalived/keepalived.conf'.
Dec 11 06:48:56 ipvs101 Keepalived_healthcheckers[11618]: Opening file '/etc/keepalived/configs/keepalived.conf'.
Dec 11 06:48:56 ipvs101 Keepalived_healthcheckers[11618]: Opening file '/etc/keepalived/configs/v2-keepalived.conf'.
Dec 11 06:48:56 ipvs101 Keepalived_healthcheckers[11618]: Reinitializing ipvs
Dec 11 06:48:56 ipvs101 Keepalived_healthcheckers[11618]: Activating healthchecker for service [198.51.100.1]:tcp:80 for VS [192.0.2.1]:tcp:80
Dec 11 06:48:56 ipvs101 Keepalived_healthcheckers[11618]: Activating healthchecker for service [198.51.100.2]:tcp:80 for VS [192.0.2.1]:tcp:80

Did keepalived coredump? No

Additional context
The kernel was upgraded to 5.4.0 via Ubuntu HWE, and I'm wondering whether keepalived needs to be recompiled after that. I noticed the following in the keepalived -v output and wonder whether it is significant to this issue in some way.

Built with kernel headers for Linux 4.15.18
Running on Linux 5.4.0-54-generic #60~18.04.1-Ubuntu SMP Fri Nov 6 17:25:16 UTC 2020
derrickhayashi commented 3 years ago

An update to my comment under Additional context: I found the same issue on a server without the HWE upgrade, so I don't think that is the cause.

keepalived -v on this other server:

Keepalived v2.0.20 (unknown)

Copyright(C) 2001-2020 Alexandre Cassen, <acassen@gmail.com>

Built with kernel headers for Linux 4.15.18
Running on Linux 4.15.0-42-generic #45-Ubuntu SMP Thu Nov 15 19:32:57 UTC 2018

configure options: --prefix=/

Config options:  LIBIPTC LIBIPSET_DYNAMIC NFTABLES LVS VRRP VRRP_AUTH OLD_CHKSUM_COMPAT FIB_ROUTING

System options:  PIPE2 SIGNALFD INOTIFY_INIT1 VSYSLOG EPOLL_CREATE1 IPV4_DEVCONF IPV6_ADVANCED_API LIBNL3 RTA_ENCAP RTA_EXPIRES RTA_NEWDST RTA_PREF FRA_SUPPRESS_PREFIXLEN FRA_SUPPRESS_IFGROUP FRA_TUN_ID RTAX_CC_ALGO RTAX_QUICKACK RTEXT_FILTER_SKIP_STATS FRA_L3MDEV FRA_UID_RANGE RTAX_FASTOPEN_NO_COOKIE RTA_VIA FRA_OIFNAME RTA_TTL_PROPAGATE IFA_FLAGS IP_MULTICAST_ALL LWTUNNEL_ENCAP_MPLS LWTUNNEL_ENCAP_ILA LIBIPTC LIBIPSET_PRE_V7 NET_LINUX_IF_H_COLLISION LIBIPVS_NETLINK IPVS_DEST_ATTR_ADDR_FAMILY IPVS_SYNCD_ATTRIBUTES IPVS_64BIT_STATS VRRP_VMAC VRRP_IPVLAN IFLA_LINK_NETNSID CN_PROC SOCK_NONBLOCK SOCK_CLOEXEC O_PATH GLOB_BRACE INET6_ADDR_GEN_MODE VRF SO_MARK SCHED_RESET_ON_FORK
pqarmitage commented 3 years ago

I have never seen this before and cannot see an obvious reason why it might happen.

The epoll_wait error was logged about 3 days before the reload. What were the log entries from the healthchecker process before it logged the epoll_wait error?

Since you state that all the servers this occurred on had the same configuration, and that none of the servers running with a different configuration suffered the same problem, it sounds as though this relates to the specific configuration. I think we will need to see the full configuration that keepalived was running with when the problem occurred, and then we can try it ourselves to see if we can reproduce the problem. epoll_wait returns EINVAL if the file descriptor passed to it is not a valid epoll file descriptor. This makes me think that somehow there has been some memory corruption in keepalived.

Following the epoll_wait error, and assuming that keepalived continues to get the same error every time it calls epoll_wait (it only logs the first error to avoid filling up log files), keepalived will just be sitting in a loop calling epoll_wait and sleeping for 1 second. The healthchecker process will therefore not respond to the signal from its parent process to reload, since it receives the signal via a signalfd, and that is notified via the epoll_wait call. I can certainly add some code to make the keepalived process exit if the epoll_wait error is persistent, and it will then be reloaded by the parent process; this would provide some form of recovery. However, I would like to attempt to identify what triggered the epoll_wait error in the first place, and so I think seeing the full configurations is going to be necessary.

derrickhayashi commented 3 years ago

Hello, thanks for looking into this and explaining how keepalived uses epoll_wait.

There was nothing unusual before it logged the epoll_wait error, just the logs from an earlier reload. The log includes the previous successful reload, followed by the epoll_wait error, the subsequent failed reloads, and then, on Dec 11 05:09:54 (line 1051), my intervention restarting keepalived, which resolved the issue. The full log is attached here: ipvs101-log.txt.

I have a simple agent on the server that fetches the config and reloads keepalived when there is a change. I'm not able to get the exact configuration from when this occurred, since it has since been overwritten. However, all configurations follow a template, so the only difference between then and now would be the number of virtual servers and real servers configured; likewise, that is the only difference between the servers this occurred on and those it didn't. I'll attach the current config and hope it helps.

keepalived-conf.txt configs-v2-keepalived-conf.txt configs-keepalived-conf.txt

Please let me know if there is anything else I can provide you with.

pqarmitage commented 3 years ago

Since this problem has only been seen with one specific configuration, and has not occurred with similar configurations (if I have understood the situation correctly), I don't think we are going to be able to resolve the underlying issue without that configuration being available. (It would probably have been difficult to find the problem even if the configuration had been available, since keepalived had been running for 6 1/2 days since the previous reload without the error occurring.)

Commit 9c4c93f changes what keepalived does if epoll_wait() repeatedly returns an unrecoverable error. Previously the code sat in a loop calling epoll_wait(), getting the EINVAL error, and then sleeping for 1 second.

With this commit, keepalived will now try epoll_wait() 5 times, and if it gets the same error each time the process (in this case the healthchecker process) will clean up and exit, and will then be restarted by its parent process. This isn't perfect, of course, since we should still resolve the cause of EINVAL being returned, but at least it ensures that the healthchecker process will not sit there for three days doing nothing.

derrickhayashi commented 3 years ago

Thank you very much for the fix.

We'll add monitoring/alerting for the error, so if this does happen again we can capture the configuration at the time of occurrence, instead of only realizing a few days later, after that config has already been overwritten. Perhaps then it will be possible to find the real cause of the EINVAL being returned.
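A rough sketch of what that monitoring could look like, run periodically from cron or a systemd timer (the journalctl usage, paths, and snapshot location are illustrative assumptions, not our actual agent):

#!/bin/bash
# Illustrative only: look for the epoll_wait error in the recent journal and,
# if seen, snapshot the current config before the next reload overwrites it.

PATTERN='scheduler: epoll_wait error'
SNAP_DIR=/var/tmp/keepalived-snapshots

if journalctl -u keepalived --since '5 minutes ago' | grep -q "$PATTERN"; then
    ts=$(date +%Y%m%dT%H%M%S)
    mkdir -p "$SNAP_DIR/$ts"
    cp -a /etc/keepalived/. "$SNAP_DIR/$ts/"
    logger -t keepalived-monitor "epoll_wait error detected; config saved to $SNAP_DIR/$ts"
    # hook an alert (mail, pager, etc.) in here as well
fi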

pqarmitage commented 3 years ago

I think we will close this issue for now, since there is nothing more we can do at the moment. If the problem recurs we can reopen this issue with further information.

derrickhayashi commented 3 years ago

Hello @pqarmitage, this issue has occurred a number of times since, and I now have a better idea of when it is triggered. It has been happening across different servers with varied configurations. However, the problem often occurs over a weekend, when there are no configuration changes taking place and hence no keepalived reloads. On workdays, virtual servers are frequently added and removed, and keepalived is reloaded each time that happens. I suspect the regular reloads are hiding the issue.

In short, this issue happens after keepalived has been left alone for some time.

I was able to reproduce this by simply copying the configs from a production server to a spare server and checking it a few days later. During this period the server was left untouched and did not forward any traffic.

Jan 28 13:00:17 ipvs-host Keepalived_healthcheckers[7372]: Activating healthchecker for service [100.99.80.14]:tcp:80 for VS [100.99.126.109]:tcp:80
Jan 28 13:00:17 ipvs-host Keepalived_healthcheckers[7372]: Activating healthchecker for service [100.99.80.2]:tcp:80 for VS [100.99.126.109]:tcp:80
Jan 31 01:53:33 ipvs-host systemd-journald[568]: Suppressed 4110 messages from keepalived.service
Jan 31 01:53:33 ipvs-host Keepalived_healthcheckers[7372]: scheduler: epoll_wait error: Invalid argument

Once again, I've included the configuration files, and hopefully the issue can be replicated by someone else too.

0202-configs-keepalived-conf.txt 0202-configs-v2-keepalived-conf.txt 0202-keepalived-conf.txt

I'm wondering what the cause is. A bad library? Are the configs just too long?

pqarmitage commented 3 years ago

@sheepbrew I doubt that a bad library is the cause of the problem, and the configs aren't too long - we have users with considerably larger configs, for example over 1500 virtual servers and 100,000 real servers.

It would be helpful to know whether you have merged commit 9c4c93f into the keepalived code you are running.

The log entry "Jan 31 01:53:33 ipvs-host systemd-journald[568]: Suppressed 4110 messages from keepalived.service" is interesting. In my experience of syslog, a number of messages are logged before the 'Suppressed nnn messages' entry, so that one knows the type of message being suppressed. If all 4110 messages were sent at 01:53:33, that would seem to indicate something very strange happening.

This is now sounding very similar to issue #1756, with a large number of HTTP_GET checkers (I think approximately 2450 in your configuration, each being run every 3 seconds) and reloads being done. Issue #1756 resulted in a number of patches, one of which fixed a file descriptor leak that occurred when using HTTP_GET checkers and reloading. You are almost certainly experiencing this problem.

In order to incorporate those commits it would be sensible to upgrade to keepalived v2.2.1 rather than attempting to backport them into v2.1.5, especially since there were one or two subsequent patches. I can't say for certain that this will resolve your problem, but it will eliminate some known problems that occur with v2.1.5 in situations similar to yours.
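For reference, a minimal build of v2.2.1 from the release tarball might look like the following (a sketch; the download URL and steps are assumptions to adapt to your own packaging process, keeping the same configure options as the existing build):

# Sketch only: build and install keepalived v2.2.1 from the release tarball.
wget https://www.keepalived.org/software/keepalived-2.2.1.tar.gz
tar xzf keepalived-2.2.1.tar.gz
cd keepalived-2.2.1
./configure --prefix=/        # match the configure options reported by keepalived -v
make
sudo make install
keepalived -v                 # confirm the new version is installed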

If you could try this with v2.2.1 and report back, then we will know how to proceed. One thing that would be interesting would be to run the following script:

#!/bin/bash

PID_DIR=${1:-/run}

for p in keepalived checkers vrrp bfd; do
    [[ ! -f ${PID_DIR}/$p.pid ]] && continue;

    echo; echo $p
    lsof -p $(cat ${PID_DIR}/$p.pid) 2>/dev/null | grep -v " mem "
done

and capture the output. I am particularly interested in whether the number of file descriptors in the CLOSE_WAIT state increases over time as reloads are done (both with your current v2.1.5 and with v2.2.1).
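A small sketch for tracking just those counts over time, building on the script above (same assumed pid-file locations; the interval is arbitrary):

#!/bin/bash
# Illustrative: log the number of CLOSE_WAIT sockets held by each keepalived
# process every 10 minutes, so any growth across reloads becomes visible.

PID_DIR=${1:-/run}

while true; do
    for p in keepalived checkers vrrp bfd; do
        [[ ! -f ${PID_DIR}/$p.pid ]] && continue
        n=$(lsof -p "$(cat ${PID_DIR}/$p.pid)" 2>/dev/null | grep -c CLOSE_WAIT)
        echo "$(date -Is) $p CLOSE_WAIT=$n"
    done
    sleep 600
done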

derrickhayashi commented 3 years ago

@pqarmitage I haven't incorporated the patch yet. I'll try v2.2.1 and let you know how it works out.

My current version is actually v2.0.20, and I ran your script on a number of existing servers. Generally, there are some sockets stuck in the CLOSE_WAIT state, but not very many (between 0 and 5). The following output is from one of the servers. Keepalived has been running on this server for some time, is frequently reloaded, and has not yet encountered the epoll error.

I then tried reloading keepalived a couple of times, but the number of CLOSE_WAIT sockets did not increase.

keepalived
COMMAND     PID USER   FD      TYPE             DEVICE SIZE/OFF       NODE NAME
keepalive 17505 root  cwd       DIR                8,1     4096          2 /
keepalive 17505 root  rtd       DIR                8,1     4096          2 /
keepalive 17505 root  txt       REG                8,1  2599872      64292 /sbin/keepalived
keepalive 17505 root  DEL       REG                8,1                2084 /lib/x86_64-linux-gnu/libm-2.27.so
keepalive 17505 root  DEL       REG                8,1                2183 /lib/x86_64-linux-gnu/libpthread-2.27.so
keepalive 17505 root  DEL       REG                8,1                2080 /lib/x86_64-linux-gnu/libc-2.27.so
keepalive 17505 root  DEL       REG                8,1                2083 /lib/x86_64-linux-gnu/libdl-2.27.so
keepalive 17505 root  DEL       REG                8,1                2076 /lib/x86_64-linux-gnu/ld-2.27.so
keepalive 17505 root    0u      CHR                1,3      0t0          6 /dev/null
keepalive 17505 root    1u      CHR                1,3      0t0          6 /dev/null
keepalive 17505 root    2u      CHR                1,3      0t0          6 /dev/null
keepalive 17505 root    3u     unix 0xffff9a069f6b0800      0t0 1343625695 type=DGRAM
keepalive 17505 root    4u  netlink                         0t0 1343625697 CONNECTOR
keepalive 17505 root    5u  a_inode               0,14        0      11437 [eventpoll]
keepalive 17505 root    6u  a_inode               0,14        0      11437 [timerfd]
keepalive 17505 root    7u  a_inode               0,14        0      11437 [signalfd]

checkers
COMMAND     PID USER   FD      TYPE             DEVICE SIZE/OFF       NODE NAME
keepalive 17506 root  cwd       DIR                8,1     4096          2 /
keepalive 17506 root  rtd       DIR                8,1     4096          2 /
keepalive 17506 root  txt       REG                8,1  2599872      64292 /sbin/keepalived
keepalive 17506 root  DEL       REG                8,1                2084 /lib/x86_64-linux-gnu/libm-2.27.so
keepalive 17506 root  DEL       REG                8,1                2183 /lib/x86_64-linux-gnu/libpthread-2.27.so
keepalive 17506 root  DEL       REG                8,1                2080 /lib/x86_64-linux-gnu/libc-2.27.so
keepalive 17506 root  DEL       REG                8,1                2083 /lib/x86_64-linux-gnu/libdl-2.27.so
keepalive 17506 root  DEL       REG                8,1                2076 /lib/x86_64-linux-gnu/ld-2.27.so
keepalive 17506 root    0u      CHR                1,3      0t0          6 /dev/null
keepalive 17506 root    1u      CHR                1,3      0t0          6 /dev/null
keepalive 17506 root    2u      CHR                1,3      0t0          6 /dev/null
keepalive 17506 root    3u     unix 0xffff9a069f6b0800      0t0 1343625695 type=DGRAM
keepalive 17506 root    4u  a_inode               0,14        0      11437 [eventpoll]
keepalive 17506 root    5u  a_inode               0,14        0      11437 [timerfd]
keepalive 17506 root    6u  a_inode               0,14        0      11437 [signalfd]
keepalive 17506 root    7u     IPv4         1711923358      0t0        TCP foo-ipvs201:49922->haproxy101:9001 (CLOSE_WAIT)
keepalive 17506 root    9u     IPv4         2382783452      0t0        TCP foo-ipvs201:45778->haproxy102:9001 (CLOSE_WAIT)
keepalive 17506 root   10u     IPv4         2382783449      0t0        TCP foo-ipvs201:45772->haproxy102:9001 (CLOSE_WAIT)
foo-ipvs201:~# date
Mon Feb  8 04:09:33 UTC 2021
root@jpe1z-priv-ipvs201:~# stat /proc/17506/fd/7
  File: /proc/17506/fd/7 -> socket:[1711923358]
  Size: 64          Blocks: 0          IO Block: 1024   symbolic link
Device: 5h/5d   Inode: 2019598262  Links: 1
Access: (0700/lrwx------)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2021-02-07 07:02:47.827701297 +0000
Modify: 2021-01-26 06:46:38.488746230 +0000
Change: 2021-01-26 06:46:38.488746230 +0000
 Birth: -

foo-ipvs201:~# stat /proc/17506/fd/9
  File: /proc/17506/fd/9 -> socket:[2382783452]
  Size: 64          Blocks: 0          IO Block: 1024   symbolic link
Device: 5h/5d   Inode: 2297393638  Links: 1
Access: (0700/lrwx------)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2021-02-07 09:02:55.149691909 +0000
Modify: 2021-02-04 03:16:13.239489702 +0000
Change: 2021-02-04 03:16:13.239489702 +0000
 Birth: -

foo-ipvs201:~# stat /proc/17506/fd/10
  File: /proc/17506/fd/10 -> socket:[2382783449]
  Size: 64          Blocks: 0          IO Block: 1024   symbolic link
Device: 5h/5d   Inode: 2380986465  Links: 1
Access: (0700/lrwx------)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2021-02-07 17:57:27.613671559 +0000
Modify: 2021-02-06 16:31:55.428669844 +0000
Change: 2021-02-06 16:31:55.428669844 +0000
 Birth: -
foo-ipvs201:~# systemctl status keepalived
● keepalived.service - Keepalive Daemon
   Loaded: loaded (/etc/systemd/system/keepalived.service; disabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/keepalived.service.d
           └─ipvsadm-set.conf
   Active: active (running) since Fri 2021-01-01 15:45:59 UTC; 1 months 7 days ago
derrickhayashi commented 3 years ago

Hello, I ran v2.2.1 and v2.0.20 on different servers over the weekend with the same configuration. v2.2.1 ran fine and did not encounter any epoll_wait error, while v2.0.20 encountered the error as expected. v2.2.1 did not have any sockets stuck in CLOSE_WAIT, while v2.0.20 had 6 sockets stuck.

It's clear that this issue does not happen on v2.2.1, and I'll upgrade all our servers to use that instead.

Thank you very much for your help and guidance with this issue.

pqarmitage commented 3 years ago

@sheepbrew Many thanks for your update; I'm glad your problem is resolved. It is always very helpful to get feedback that a problem has been fixed, and to know the solution (in this case, upgrading to v2.2.1).