Closed: JonathanO closed this issue 6 years ago
I would like to mention that the absence of TCP KEEPALIVE has caused problems in production. Let me describe a real scenario that we had.
Envoy runs on a host with PID 22139:

kubernetes-monitor-8001 $ ps aux | grep usr/bin/envoy
nobody 5372 0.0 0.0 125320 6316 ? Ss Mar12 0:01 python /usr/bin/envoy-hot-restarter.py /usr/bin/envoy-start.sh
nobody 22139 0.1 0.0 97932 20148 ? Sl Mar20 42:56 /usr/bin/envoy --restart-epoch 2 --config-path /etc/envoy/envoy.yaml --concurrency 4 --v2-config-only
According to lsof, Envoy has a TCP connection in the ESTABLISHED state with an upstream server:

kubernetes-monitor--8001 $ sudo lsof -np 22139 | grep 40522
envoy 22139 nobody 39u IPv4 46618582 0t0 TCP X.X.13.5:40522->X.X.11.9:https (ESTABLISHED)
The kernel also confirms this:
kubernetes-monitor--8001$ sudo ss -toie | grep -A1 40522
ESTAB 0 0 X.X.13.5:40522 X.X.11.9:https uid:99
ino:46618582 sk:11954 <->
ts sack cubic wscale:11,11 rto:204 rtt:3.233/6.071 ato:40 mss:1448 cwnd:10
bytes_acked:4530352 bytes_received:689649713 segs_out:403373 segs_in:820948 send 35.8Mbps
lastsnd:41425218 lastrcv:41424988 lastack:41424988 pacing_rate 71.7Mbps retrans:0/13 rcv_rtt:661.963
rcv_space:442082
But the remote system does not have that connection (the command returns nothing):
kubernetes--8001 $ sudo ss -t | grep 40522
The connection from the local Envoy client was still alive but not functioning, as it was not receiving any data from the upstream server. I should mention that after the client establishes a connection to Envoy and Envoy opens a connection to an upstream server, that server pushes data back to the client, not the other way around. There are long periods of idleness because the client makes an HTTP API call and then waits for the server to send data whenever a change happens on etcd. So it is a typical long-lived connection where data is pushed by the server when an event occurs on the server side.
In this particular case, the server crashed with a kernel panic, followed by a reboot. Both Envoy and the server have TCP keepalive configured at the kernel level:
net.ipv4.tcp_keepalive_intvl = 1
net.ipv4.tcp_keepalive_probes = 2
net.ipv4.tcp_keepalive_time = 30
With these settings a dead peer would be detected after roughly keepalive_time + keepalive_probes * keepalive_intvl = 30 + 2 * 1 = 32 seconds of idleness. But Envoy does not instruct the kernel to enable keepalive on the connections it opens, and as a result we ended up in a situation where the client was not getting any data and Envoy was holding a zombie connection. Thus, adding support for SO_KEEPALIVE is a must for Envoy, and really for any proxy system.
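For illustration, enabling keepalive on a socket comes down to a few setsockopt calls. A minimal sketch in Python, assuming Linux (TCP_KEEPIDLE, TCP_KEEPINTVL, and TCP_KEEPCNT are Linux-specific and are the per-socket equivalents of the sysctls above):

```python
import socket

# Create a TCP socket and turn keepalive on for it.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

# Per-socket overrides of the net.ipv4.tcp_keepalive_* sysctls (Linux only):
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 30)  # idle seconds before the first probe
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 1)  # seconds between probes
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 2)    # unanswered probes before the kernel drops the connection
```

With values like these, a connection whose peer has vanished is torn down by the kernel after roughly 32 seconds instead of lingering forever.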
Moreover, if you want to handle long-lived connections better without increasing the time it takes to detect a broken connection, you may want to read about TCP_USER_TIMEOUT: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?id=dca43c75.
This option would let Envoy configure, per socket, how long transmitted data may remain unacknowledged before the kernel forcibly closes the connection. It is especially useful on long-lived connections with long idle periods, such as the one mentioned above. In those cases client and server timeouts[1] must remain high to allow the long periods of idleness, but at the same time it is important to detect that the connection has disappeared.
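As a sketch (again Python on Linux; TCP_USER_TIMEOUT takes milliseconds and was added in kernel 2.6.37 by the commit linked above):

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Abort the connection if transmitted data stays unacknowledged for 10 seconds.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, 10000)
```

When used together with keepalive, the user timeout also overrides the probe count in deciding when a keepalive-failing connection is declared dead, so the two mechanisms complement each other.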
[1] I don't think we have client and server timeouts in Envoy, only tcp-proxy-v2-tcpproxy-idle-timeout.
Description: TCP keepalives should be configurable per cluster.
This allows detection of middleboxes silently dropping long-lived but mostly idle connections, which could otherwise go unnoticed by one side. It is especially important for the Envoy -> *DS connection, where a silently dropped connection will probably go unnoticed by the Envoy instance (and it will receive no further discovery updates until it is restarted!)
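For reference, per-cluster keepalive configuration along these lines could look like the following sketch. The field names follow Envoy's upstream_connection_options.tcp_keepalive cluster option; treat the exact shape and the cluster name as assumptions to check against your Envoy version:

```yaml
clusters:
- name: xds_cluster            # hypothetical name for the *DS management cluster
  connect_timeout: 1s
  type: STRICT_DNS
  upstream_connection_options:
    tcp_keepalive:
      keepalive_time: 30       # idle seconds before the first probe
      keepalive_interval: 1    # seconds between probes
      keepalive_probes: 2      # failed probes before dropping the connection
```

These values mirror the sysctls from the scenario above, so a silently dead upstream would be detected in about 32 seconds.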