Closed: JonathanO closed this issue 6 years ago
I would like to mention that the absence of TCP KEEPALIVE has caused problems in production. Let me describe a real scenario that we had.
Envoy runs on a host with PID 22139:

kubernetes-monitor-8001 $ ps aux | grep usr/bin/envoy
nobody 5372 0.0 0.0 125320 6316 ? Ss Mar12 0:01 python /usr/bin/envoy-hot-restarter.py /usr/bin/envoy-start.sh
nobody 22139 0.1 0.0 97932 20148 ? Sl Mar20 42:56 /usr/bin/envoy --restart-epoch 2 --config-path /etc/envoy/envoy.yaml --concurrency 4 --v2-config-only
According to lsof, Envoy has a TCP connection in the ESTABLISHED state with an upstream server:

kubernetes-monitor--8001 $ sudo lsof -np 22139 | grep 40522
envoy 22139 nobody 39u IPv4 46618582 0t0 TCP X.X.13.5:40522->X.X.11.9:https (ESTABLISHED)
The kernel also confirms this:
kubernetes-monitor--8001$ sudo ss -toie | grep -A1 40522
ESTAB 0 0 X.X.13.5:40522 X.X.11.9:https uid:99
ino:46618582 sk:11954 <->
ts sack cubic wscale:11,11 rto:204 rtt:3.233/6.071 ato:40 mss:1448 cwnd:10
bytes_acked:4530352 bytes_received:689649713 segs_out:403373 segs_in:820948 send 35.8Mbps
lastsnd:41425218 lastrcv:41424988 lastack:41424988 pacing_rate 71.7Mbps retrans:0/13 rcv_rtt:661.963
rcv_space:442082
But the remote system does not have that connection (the command returns nothing):
kubernetes--8001 $ sudo ss -t | grep 40522
The connection from the local Envoy client was still alive but not functioning, as it was not receiving any data from the upstream server. I should mention that after the client establishes a connection to Envoy and Envoy opens a connection to an upstream server, that server pushes data back to the client, not the other way around. There are long periods of idleness because the client makes an HTTP API call and then waits for the server to send data whenever a change happens on etcd. So it is a typical long-lived connection where data is pushed by the server when an event occurs on the server side.
In this particular case, the server crashed with a kernel panic, followed by a reboot. Both Envoy and the server have TCP keepalive configured at the kernel level:
net.ipv4.tcp_keepalive_intvl = 1
net.ipv4.tcp_keepalive_probes = 2
net.ipv4.tcp_keepalive_time = 30
With these settings a dead peer would be detected after roughly keepalive_time + keepalive_probes * keepalive_intvl = 30 + 2 * 1 = 32 seconds of idleness. But Envoy does not instruct the kernel to enable keepalive on the connections it opens, and as a result we ended up in a situation where the client was not getting any data and Envoy was holding a zombie connection. Thus, adding support for SO_KEEPALIVE is a must for Envoy, and really for any proxy system.
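For illustration, enabling keepalive on a socket comes down to a few setsockopt calls. A minimal sketch in Python, assuming Linux (TCP_KEEPIDLE, TCP_KEEPINTVL, and TCP_KEEPCNT are Linux-specific and are the per-socket equivalents of the sysctls above):

```python
import socket

# Create a TCP socket and turn keepalive on for it.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

# Per-socket overrides of the net.ipv4.tcp_keepalive_* sysctls (Linux only):
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 30)  # idle seconds before the first probe
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 1)  # seconds between probes
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 2)    # unanswered probes before the kernel drops the connection
```

With values like these, a connection whose peer has vanished is torn down by the kernel after roughly 32 seconds instead of lingering forever.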
Moreover, if you want to handle long-lived connections better without increasing the time it takes to detect a broken connection, you may want to read about TCP_USER_TIMEOUT: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?id=dca43c75.
This option would let Envoy configure, per socket, how long transmitted data may remain unacknowledged before the kernel forcibly closes the connection. It is especially useful on long-lived connections with long idle periods, such as the one mentioned above. In those cases client and server timeouts[1] must remain high to allow the long periods of idleness, but at the same time it is important to detect that the connection has disappeared.
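As a sketch (again Python on Linux; TCP_USER_TIMEOUT takes milliseconds and was added in kernel 2.6.37 by the commit linked above):

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Abort the connection if transmitted data stays unacknowledged for 10 seconds.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, 10000)
```

When used together with keepalive, the user timeout also overrides the probe count in deciding when a keepalive-failing connection is declared dead, so the two mechanisms complement each other.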
[1] I don't think we have client and server timeouts in Envoy, only tcp-proxy-v2-tcpproxy-idle-timeout.
Description: TCP keepalives should be configurable per cluster.
This allows detection of middleboxes silently dropping long-lived but mostly idle connections, which could otherwise go unnoticed by one side. It is especially important for the Envoy -> *DS connection, where a silently dropped connection will probably go unnoticed by the Envoy instance (and it will receive no further discovery updates until it is restarted!)
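For reference, per-cluster keepalive configuration along these lines could look like the following sketch. The field names follow Envoy's upstream_connection_options.tcp_keepalive cluster option; treat the exact shape and the cluster name as assumptions to check against your Envoy version:

```yaml
clusters:
- name: xds_cluster            # hypothetical name for the *DS management cluster
  connect_timeout: 1s
  type: STRICT_DNS
  upstream_connection_options:
    tcp_keepalive:
      keepalive_time: 30       # idle seconds before the first probe
      keepalive_interval: 1    # seconds between probes
      keepalive_probes: 2      # failed probes before dropping the connection
```

These values mirror the sysctls from the scenario above, so a silently dead upstream would be detected in about 32 seconds.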