envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0
25.02k stars 4.82k forks

Add support for TCP Keepalives #3028

Closed JonathanO closed 6 years ago

JonathanO commented 6 years ago

Description: TCP keepalives should be configurable per cluster.

This allows detection of middleboxes silently dropping long-lived but mostly idle connections, which could otherwise go unnoticed by one side. This is especially important for the Envoy -> *DS connection, where a silently dropped connection will probably go unnoticed by the Envoy instance (and it will receive no further discovery updates until it is restarted!)
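For illustration, a per-cluster keepalive knob could look something like the sketch below. The field names follow the shape this feature eventually took in Envoy's cluster API (upstream_connection_options.tcp_keepalive); the cluster name and values are placeholders:

```yaml
clusters:
- name: xds_cluster            # hypothetical cluster name
  upstream_connection_options:
    tcp_keepalive:
      keepalive_time: 30       # seconds of idle before the first probe
      keepalive_interval: 5    # seconds between probes
      keepalive_probes: 2      # unanswered probes before the connection is dropped
```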

unixsurfer commented 6 years ago

I would like to mention that the absence of TCP KEEPALIVE has caused problems in production. Let me describe a real scenario that we had.

envoy runs on a host and has PID 22139

kubernetes-monitor-8001 $ ps aux | grep usr/bin/envoy
nobody    5372  0.0  0.0 125320  6316 ?        Ss   Mar12   0:01 python /usr/bin/envoy-hot-restarter.py /usr/bin/envoy-start.sh
nobody   22139  0.1  0.0  97932 20148 ?        Sl   Mar20  42:56 /usr/bin/envoy --restart-epoch 2 --config-path /etc/envoy/envoy.yaml --concurrency 4 --v2-config-only

According to lsof, envoy has a TCP connection in the ESTABLISHED state with an upstream server:

kubernetes-monitor--8001 $ sudo lsof -np 22139 | grep 40522
envoy   22139 nobody   39u     IPv4           46618582      0t0      TCP X.X.13.5:40522->X.X.11.9:https (ESTABLISHED)

The kernel also confirms this:

kubernetes-monitor--8001$ sudo ss -toie | grep -A1 40522
ESTAB      0      0      X.X.13.5:40522                X.X.11.9:https                 uid:99
ino:46618582 sk:11954 <->
         ts sack cubic wscale:11,11 rto:204 rtt:3.233/6.071 ato:40 mss:1448 cwnd:10
bytes_acked:4530352 bytes_received:689649713 segs_out:403373 segs_in:820948 send 35.8Mbps
lastsnd:41425218 lastrcv:41424988 lastack:41424988 pacing_rate 71.7Mbps retrans:0/13 rcv_rtt:661.963
rcv_space:442082

But on the remote system we don't have that connection:

kubernetes--8001 $ sudo ss -t | grep 40522

The connection from the local envoy client was still alive but not functioning, since it was not receiving any data from the upstream server. I should mention that after the client establishes a connection to envoy and envoy opens a connection to an upstream server, the server pushes data back to the client, not the other way around. There is a long period of idleness because the client makes an HTTP API call and then waits for the server to send data after a change happens on etcd. So it is a typical long-lived connection where data is pushed by the server when an event occurs on the server side.

In this particular case, the server crashed with a kernel panic followed by a reboot. Both envoy and the server have TCP keepalive configured on the kernel side:

net.ipv4.tcp_keepalive_intvl = 1
net.ipv4.tcp_keepalive_probes = 2
net.ipv4.tcp_keepalive_time = 30

but envoy doesn't instruct the kernel to use it on the connections it opens, and as a result we ended up in a situation where the client was not getting any data and envoy was holding a zombie connection. Thus, adding support for SO_KEEPALIVE is a must for envoy and any proxy system.
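To make the point concrete: the net.ipv4.tcp_keepalive_* sysctls above only take effect on sockets that have SO_KEEPALIVE enabled, which is exactly what envoy is not doing here. A minimal Linux-specific sketch of what enabling it per connection looks like (host/port and the timing values are placeholders, mirroring the sysctls above):

```python
import socket

def open_keepalive_connection(host, port):
    """Connect and enable TCP keepalive on the socket (Linux-specific options)."""
    s = socket.create_connection((host, port))
    # SO_KEEPALIVE alone activates the kernel's net.ipv4.tcp_keepalive_* defaults;
    # the TCP_KEEP* options below override them for this socket only.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 30)  # idle seconds before first probe
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 1)  # seconds between probes
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 2)    # unanswered probes before reset
    return s
```

With these values the kernel would declare a dead peer roughly 32 seconds after the last exchanged segment, instead of holding the zombie connection indefinitely.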

Moreover, if you want to handle long-lived connections better without increasing the time it takes to detect a broken connection, you may want to read about TCP User Timeout: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?id=dca43c75.

This setting would allow envoy to configure, per socket, the maximum time that transmitted data may remain unacknowledged before the kernel forcibly closes the connection. This is especially useful for long-lived connections that experience long idle periods, such as the one I described above. In those cases client and server timeouts[1] must remain high to allow those long periods of idleness, but at the same time it is important to detect that the connection has disappeared.

[1] I don't think we have client and server timeouts in envoy, only tcp-proxy-v2-tcpproxy-idle-timeout.
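The TCP_USER_TIMEOUT option mentioned above can also be set from userspace on Linux (kernel 2.6.37+). A minimal sketch, with host/port and the 10-second value as placeholders:

```python
import socket

def open_with_user_timeout(host, port, timeout_ms=10_000):
    """Connect with TCP_USER_TIMEOUT (Linux >= 2.6.37): if transmitted data
    stays unacknowledged for timeout_ms milliseconds, the kernel aborts the
    connection instead of retransmitting indefinitely."""
    s = socket.create_connection((host, port))
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, timeout_ms)
    return s
```

Unlike keepalive, which probes an idle connection, this bounds how long in-flight data can go unacknowledged, so the two mechanisms complement each other on long-lived, mostly idle connections.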