Azure / aks-engine

AKS Engine: legacy tool for Kubernetes on Azure (see status)
https://github.com/Azure/aks-engine
MIT License

Connection lifetime is changed once the connection has been established --conntrack #1912

Closed. digeler closed this issue 4 years ago

digeler commented 5 years ago

I have some doubts regarding the connection lifetime values shown by conntrack on AKS. When a TCP connection is established, we see a connection lifetime like this:

conntrack -L
tcp 6 86389 ESTABLISHED src=10.255.255.5 dst=10.240.0.37 sport=57598 dport=443 src=10.240.0.37 dst=10.255.255.5 sport=443 dport=57598 [ASSURED] mark=0 use=1

Then for other ESTABLISHED connections we see the lifetime drops to about 3 minutes:

tcp 6 181 ESTABLISHED src=10.240.0.42 dst=10.0.177.37 sport=36349 dport=27017 src=10.240.0.67 dst=10.240.0.42 sport=27017 dport=36349 [ASSURED] mark=0 use=1
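
(For reference: in conntrack -L output the fields are protocol name, protocol number, remaining entry lifetime in seconds, and TCP state, so the two entries above read roughly as:)

tcp  6  86389  ESTABLISHED ...   # ~24 hours remaining
tcp  6  181    ESTABLISHED ...   # ~3 minutes remaining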

Once a TCP connection has been established, its conntrack lifetime should not change like this.

Steps to reproduce

API model:

{ "apiVersion": "vlabs", "properties": { "orchestratorProfile": { "orchestratorType": "Kubernetes", "orchestratorRelease": "1.14" }, "masterProfile": { "count": 1, "dnsPrefix": "markwtest22", "vmSize": "Standard_D2_v3" }, "agentPoolProfiles": [ { "name": "agentpool1", "count": 2, "vmSize": "Standard_D2_v3" } ], "linuxProfile": { "adminUsername": "azureuser", "ssh": { "publicKeys": [ { "keyData": "" } ] } }, "servicePrincipalProfile": { "clientId": "", "secret": " } } }

Repro: we now have a reproducible test case for this, which needs a plain mongo image and a small ASP.NET Core application:

https://github.com/blushingpenguin/aks-network-repro

So, to reproduce:

git clone git@github.com:blushingpenguin/aks-network-repro.git

edit client.yaml and change the container registry to your own

cd TestClient

docker build -t myregistry.azurecr.io/testclient:v5 -f .\Dockerfile .

docker push myregistry.azurecr.io/testclient:v5

kubectl apply -f ../server.yaml

kubectl apply -f ../client.yaml

You want the server and client to run on different nodes -- this isn't ensured by the test, so it needs to be arranged manually (one way to do this is sketched after the conntrack output below).

make a request to http://mongo-client:5000/

find the IP of the client pod and of the mongo service, and on the client node run:

conntrack -L | grep client-ip | grep server-ip

conntrack v1.4.3 (conntrack-tools): 1059 flow entries have been shown.
tcp 6 297 ESTABLISHED src=10.3.0.59 dst=10.1.95.248 sport=37585 dport=27017 src=10.3.0.122 dst=10.3.0.59 sport=27017 dport=37585 [ASSURED] mark=0 use=1
tcp 6 86398 ESTABLISHED src=10.3.0.59 dst=10.1.95.248 sport=38089 dport=27017 src=10.3.0.122 dst=10.3.0.59 sport=27017 dport=38089 [ASSURED] mark=0 use=2
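
One way to arrange the "different nodes" requirement manually (a sketch only; the pod names match the repro's client.yaml/server.yaml, and the node name below is a placeholder):

kubectl get pods -o wide                 # check which nodes mongo-client and mongo-server landed on
# if they share a node, pin the client elsewhere by adding a nodeName to the pod spec in client.yaml, e.g.
#   spec:
#     nodeName: <a-different-node>
kubectl apply -f ../client.yaml          # re-apply after editing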

Expected behavior: the lifetime should not be changed for an established connection.
AKS Engine version: aks-engine-v0.35.1-darwin-amd64
Kubernetes version: 1.14
Additional context:

Is this by design? Why do we keep it to only 5 minutes?

Thank you.

blushingpenguin commented 5 years ago

I would add that the connection lifetime being used is the value of nf_conntrack_tcp_timeout_max_retrans (changing the value on the node hosting the pod changes the observed connection lifetime), so I think that something is causing netfilter to believe that TCP retransmissions are occurring. Collecting packets with tcpdump and analyzing with wireshark didn't show any retransmissions however. This traffic pattern is seemingly easily generated with the mongo c# driver, but attempts to reproduce the issue with telnet or the mongo shell failed (those just get the normal nf_conntrack_tcp_timeout_established value, which in AKS seems to be set to 86400).
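
For anyone wanting to check (or, as described above, temporarily change) these tunables on a node, something along these lines works:

sysctl net.netfilter.nf_conntrack_tcp_timeout_established     # 86400 on the nodes discussed here
sysctl net.netfilter.nf_conntrack_tcp_timeout_max_retrans     # the ~300s value being applied to the mongo connections
# raising max_retrans should change the observed lifetime if it really is the timeout in effect:
sudo sysctl -w net.netfilter.nf_conntrack_tcp_timeout_max_retrans=600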

(I didn't get any further than this -- I think I'd need to recompile the kernel to do so. There are some pr_debug statements in the conntrack code that say roughly what is going on, but sadly they aren't much help, because the conntrack tuple isn't printed out: the code calls an inline function to dump the tuple, which is a printk statement compiled out of the release kernel.)

digeler commented 5 years ago

Hi team, any updates on this?

jackfrancis commented 5 years ago

Is this a TCP stack configuration question or an iptables configuration question? I'm not sure what the actual concrete problem is in the TCP/IP behaviors here.

@khenidak FYI

blushingpenguin commented 5 years ago

The problem is that the connection lifetime in the IP conntrack table is being erroneously set to 300s instead of the correct value for an established TCP connection, 86400s. This page:

https://www.kernel.org/doc/Documentation/networking/nf_conntrack-sysctl.txt

lists the default timeout values for the various tcp connection states in the conntrack table.
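
For context, the two timeouts at issue (defaults as documented there; the AKS nodes in this thread have the established timeout set to 86400):

net.netfilter.nf_conntrack_tcp_timeout_established = 432000   # upstream default (5 days); 86400 (24h) on these AKS nodes
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300      # 5 minutes; the value erroneously being applied here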

When @digeler tried this reproduction on his aks-engine deployment, he only got mongo client connections with the incorrect lifetime. When I see this (on an AKS cluster provisioned through the Azure portal), I get connections that sometimes have the correct lifetime (86400s) and sometimes the incorrect lifetime (300s), and I also see established connections oscillating between the two timeouts when traffic flows over the connection.

The lower timeout value being applied is that of nf_conntrack_tcp_timeout_max_retrans (this can be seen by changing the value of the tunable in /proc, which changes the observed connection lifetime). That timeout is applied to a connection that conntrack believes is half open (i.e. one end isn't responding). The connection is not in fact half open, though -- it works perfectly well, and capturing packet traces with tcpdump and analyzing them with Wireshark doesn't show any issues. Thus this looks quite a bit like a bug in conntrack to me.
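
A sketch of the kind of capture used to check for retransmissions (the interface and port filter are assumptions based on the mongo traffic in this repro):

# on the client node, capture traffic on the mongo port
sudo tcpdump -i any -w /tmp/mongo.pcap 'tcp port 27017'
# copy the pcap off the node and open it in Wireshark; the display filter
#   tcp.analysis.retransmission
# highlights retransmitted segments (none showed up here)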

Let me know if I can clarify any further.

Thanks,

Mark

jackfrancis commented 5 years ago

That makes sense, thanks for the detailed overview. To rule out a bug in conntrack, do we have any higher layer (e.g., application layer) symptoms that we'd expect if indeed established connections are being interrupted prior to the expected 24 hour timeout (86400 seconds)?

On a related note, we have some targeted kernel configuration validation that we run for every cluster buildout to ensure that we aren't surprised by a kernel version change or some other unexpected vector:

https://github.com/Azure/aks-engine/blob/master/test/e2e/kubernetes/scripts/net-config-validate.sh

If we want to protect more configuration values there, let me know or submit a PR to add them (that script is run against all PRs).
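
For illustration only, a guard of the kind that script could grow might look like this (the values and style here are assumptions, not the script's current contents):

#!/bin/bash
# fail the validation if the established-connection conntrack timeout drifts from the expected value
expected=86400
actual=$(sysctl -n net.netfilter.nf_conntrack_tcp_timeout_established)
if [ "$actual" -ne "$expected" ]; then
  echo "unexpected nf_conntrack_tcp_timeout_established: $actual (expected $expected)"
  exit 1
fi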

blushingpenguin commented 5 years ago

Higher level -- yes, if you leave the connection to time out then the mongo client will throw an error. You can easily get that to happen in the reproduction by requesting a page from the test client, checking for a low-lifetime connection, waiting 5 minutes, and then requesting a page again -- you'll then get an error. (That behaviour is what led me to dig down this far in the first place.)
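
A sketch of that application-level sequence (the service name and port come from the repro described earlier; run the requests from somewhere that can resolve the mongo-client service):

curl http://mongo-client:5000/    # first request opens a mongo connection
conntrack -L | grep 27017         # on the client node, look for an entry with a ~300s lifetime
sleep 330                         # wait past the ~5 minute timeout
curl http://mongo-client:5000/    # this request should now surface the mongo client error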

I don't think this is a matter of changing kernel tunables -- none need to be changed from the AKS defaults for this problem to occur.

jackfrancis commented 5 years ago

Has anyone tried repro'ing with kubenet (simple bridge networking, no CNI)? Just add "networkPlugin": "kubenet" to the "kubernetesConfig" config object under "orchestratorProfile". E.g.:

https://github.com/Azure/aks-engine/tree/master/examples/networkplugin#kubenet
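
A minimal sketch of that change against the api model posted above (only the orchestratorProfile section shown; everything else stays the same):

"orchestratorProfile": {
  "orchestratorType": "Kubernetes",
  "orchestratorRelease": "1.14",
  "kubernetesConfig": {
    "networkPlugin": "kubenet"
  }
}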

If the repro happens in both Azure CNI and kubenet configurations, then we should be able to go lower down the stack for more investigation...

blushingpenguin commented 5 years ago

That's a reasonable suggestion, but it's already further down the stack than that -- the CNI plugin in this case has just configured a bunch of iptables rules to manage traffic between the pods (the networking is all ClusterIP). You can see this if you run the repro and then look at the iptables configuration on the nodes.
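
For illustration, those rules can be inspected on a node with something like the following (the KUBE-SERVICES chain name assumes kube-proxy's iptables mode):

sudo iptables -t nat -L KUBE-SERVICES -n | grep 27017    # service-level DNAT rules for the mongo port
sudo iptables-save -t nat | grep 27017                   # or dump the whole NAT table and grep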

jackfrancis commented 5 years ago

@cpuguy83 does this thread suggest some interesting mediation is happening at the container runtime networking layer? @blushingpenguin have we tried repro'ing on 18.04-LTS?

See the "distro" config vals here:

https://github.com/Azure/aks-engine/blob/master/examples/ubuntu-1804/kubernetes.json

blushingpenguin commented 5 years ago

I don't think so. @digeler, is that something you could try out? Otherwise I can try to set up a cluster using aks-engine next week.

cpuguy83 commented 5 years ago

@jackfrancis For k8s, the container runtime does not touch the network; it is all handled by the kubelet/CNI.

digeler commented 5 years ago

@jackfrancis here are the VM details (excerpt):

  "aksEngineVersion": "v0.32.3",
  "creationSource": "aksengine-k8s-agentpool1-29816831-vmss",
  "orchestrator": "Kubernetes:1.14.0-beta.1",
  "poolName": "agentpool1",
  "resourceNameSuffix": "29816831"
},
"instanceId": "0",
"sku": {
  "name": "Standard_D2_v3",
  "tier": "Standard"
},
"properties": {
  "latestModelApplied": true,
  "vmId": "f12e58a3-dac3-4856-8209-5c17db107627",
  "hardwareProfile": {},
  "storageProfile": {
    "imageReference": {
      "publisher": "microsoft-aks",
      "offer": "aks",
      "sku": "aks-ubuntu-1604-201903",
      "version": "2019.03.05"
    },

So do you want to see if the issue also occurs with this api model: https://github.com/Azure/aks-engine/blob/master/examples/ubuntu-1804/kubernetes.json

Please let me know.

jackfrancis commented 5 years ago

@digeler yes, and that can only be tested via aks-engine (there is no 18.04-LTS option in AKS yet)

digeler commented 5 years ago

OK, I will do the repro and report back.

digeler commented 5 years ago

Here is the repro; it seems like this is not happening on 18.04. First, the conntrack output from the client node:

tcp 6 86396 ESTABLISHED src=10.240.0.5 dst=10.240.255.5 sport=35060 dport=443 src=10.240.255.5 dst=10.240.0.5 sport=443 dport=35060 [ASSURED] mark=0 use=1
tcp 6 142 ESTABLISHED src=127.0.0.1 dst=127.0.0.1 sport=34824 dport=41099 src=127.0.0.1 dst=127.0.0.1 sport=41099 dport=34824 [ASSURED] mark=0 use=1
tcp 6 86395 ESTABLISHED src=10.244.1.4 dst=10.0.0.1 sport=37204 dport=443 src=10.240.255.5 dst=10.240.0.5 sport=443 dport=37204 [ASSURED] mark=0 use=1
tcp 6 86392 ESTABLISHED src=10.244.1.8 dst=10.0.78.100 sport=45451 dport=27017 src=10.244.0.6 dst=10.244.1.8 sport=27017 dport=45451 [ASSURED] mark=0 use=1
tcp 6 86379 ESTABLISHED src=10.244.1.5 dst=10.0.0.1 sport=40876 dport=443 src=10.240.255.5 dst=10.240.0.5 sport=443 dport=40876 [ASSURED] mark=0 use=1
tcp 6 86399 ESTABLISHED src=10.240.0.5 dst=10.240.255.5 sport=35214 dport=443 src=10.240.255.5 dst=10.240.0.5 sport=443 dport=35214 [ASSURED] mark=0 use=1
tcp 6 86242 ESTABLISHED src=10.244.1.8 dst=10.0.78.100 sport=36991 dport=27017 src=10.244.0.6 dst=10.244.1.8 sport=27017 dport=36991 [ASSURED] mark=0 use=1
tcp 6 86393 ESTABLISHED src=10.244.1.4 dst=10.240.255.5 sport=58208 dport=10255 src=10.240.255.5 dst=10.240.0.5 sport=10255 dport=58208 [ASSURED] mark=0 use=1
tcp 6 86396 ESTABLISHED src=10.240.255.5 dst=10.240.0.5 sport=52952 dport=10250 src=10.240.0.5 dst=10.240.255.5 sport=10250 dport=52952 [ASSURED] mark=0 use=1
tcp 6 86393 ESTABLISHED src=10.240.0.4 dst=10.240.0.5 sport=55476 dport=10255 src=10.240.0.5 dst=10.240.0.4 sport=10255 dport=55476 [ASSURED] mark=0 use=1
tcp 6 86390 ESTABLISHED src=10.240.0.5 dst=169.254.169.254 sport=52110 dport=80 src=169.254.169.254 dst=10.240.0.5 sport=80 dport=52110 [ASSURED] mark=0 use=1
tcp 6 86393 ESTABLISHED src=10.244.1.4 dst=10.240.0.5 sport=33898 dport=10255 src=10.240.0.5 dst=10.244.1.4 sport=10255 dport=33898 [ASSURED] mark=0 use=2
tcp 6 86393 ESTABLISHED src=10.244.1.4 dst=10.240.0.4 sport=56636 dport=10255 src=10.240.0.4 dst=10.240.0.5 sport=10255 dport=56636 [ASSURED] mark=0 use=2
tcp 6 86373 ESTABLISHED src=10.244.1.5 dst=10.0.121.99 sport=37242 dport=80 src=10.244.1.4 dst=10.244.1.5 sport=8082 dport=37242 [ASSURED] mark=0 use=1
tcp 6 86397 ESTABLISHED src=10.244.1.4 dst=10.0.0.1 sport=37268 dport=443 src=10.240.255.5 dst=10.240.0.5 sport=443 dport=37268 [ASSURED] mark=0 use=1

My pods:

mongo-client 1/1 Running 0 73m 10.244.1.8 k8s-agentpool1-26096251-vmss000001
mongo-server 1/1 Running 0 75m 10.244.0.6 k8s-agentpool1-26096251-vmss000000

My nodes:

k8s-agentpool1-26096251-vmss000000 Ready,SchedulingDisabled agent 155m v1.12.8 10.240.0.4 Ubuntu 18.04.3 LTS 5.0.0-1016-azure docker://3.0.6
k8s-agentpool1-26096251-vmss000001 Ready agent 155m v1.12.8 10.240.0.5 Ubuntu 18.04.3 LTS 5.0.0-1016-azure docker://3.0.6
k8s-master-26096251-0 Ready master 155m v1.12.8 10.240.255.5 Ubuntu 18.04.3 LTS 5.0.0-1016-azure docker://3.0.6
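
(For reference, listings in this form come from commands along the lines of:)

kubectl get pods -o wide
kubectl get nodes -o wide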

Does this mean the issue is fixed? @Mark to confirm; @Jack, please add your comments.

Thanks

jackfrancis commented 5 years ago

@digeler It could mean that the root cause of this behavior exists in the way the kubelet + CNI interact with the kernel network interfaces in our (by "our" I mean Azure + Canonical) 16.04-LTS implementation, but that our 18.04-LTS implementation does not induce these symptoms.

Some more negative repros from folks on 18.04-LTS would get us closer to making that (admittedly high level) conclusion.

Additionally, a finding like this could add urgency to the effort to make 18.04-LTS the default Linux OS for aks-engine (and eventually AKS).

@CecileRobertMichon FYI

digeler commented 5 years ago

I also checked 16.04 with kubenet, and the issue is there, so it looks like the issue is with 16.04. Here is the grep from conntrack:

tcp 6 86205 ESTABLISHED src=10.244.2.6 dst=10.0.223.224 sport=33041 dport=27017 src=10.244.1.7 dst=10.244.2.6 sport=27017 dport=33041 [ASSURED] mark=0 use=1
tcp 6 86399 ESTABLISHED src=10.244.2.6 dst=10.240.0.4 sport=34648 dport=22 src=10.240.0.4 dst=10.244.2.6 sport=22 dport=34648 [ASSURED] mark=0 use=1
tcp 6 299 ESTABLISHED src=10.244.2.6 dst=10.0.223.224 sport=34659 dport=27017 src=10.244.1.7 dst=10.244.2.6 sport=27017 dport=34659 [ASSURED] mark=0 use=1

conntrack v1.4.3 (conntrack-tools): 364 flow entries have been shown.
tcp 6 86108 ESTABLISHED src=10.244.2.6 dst=10.0.223.224 sport=33041 dport=27017 src=10.244.1.7 dst=10.244.2.6 sport=27017 dport=33041 [ASSURED] mark=0 use=1
tcp 6 291 ESTABLISHED src=10.244.2.6 dst=10.0.223.224 sport=34659 dport=27017 src=10.244.1.7 dst=10.244.2.6 sport=27017 dport=34659 [ASSURED] mark=0 use=1

NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
mongo-client 1/1 Running 0 7m7s 10.244.2.6 k8s-agentpool1-26096251-vmss000000
mongo-server 1/1 Running 0 5m58s 10.244.1.7 k8s-agentpool1-26096251-vmss000001

Full conntrack output for ESTABLISHED connections:

root@k8s-agentpool1-26096251-vmss000000:~# conntrack -L | grep ESTABLISHED
tcp 6 86395 ESTABLISHED src=10.244.2.3 dst=10.240.0.4 sport=60026 dport=10255 src=10.240.0.4 dst=10.244.2.3 sport=10255 dport=60026 [ASSURED] mark=0 use=1
tcp 6 86389 ESTABLISHED src=10.240.255.5 dst=10.244.2.2 sport=40888 dport=443 src=10.244.2.2 dst=10.240.255.5 sport=443 dport=40888 [ASSURED] mark=0 use=1
tcp 6 86397 ESTABLISHED src=10.244.2.3 dst=10.0.0.1 sport=36450 dport=443 src=10.240.255.5 dst=10.240.0.4 sport=443 dport=36450 [ASSURED] mark=0 use=1
tcp 6 86360 ESTABLISHED src=10.244.2.6 dst=10.0.223.224 sport=33041 dport=27017 src=10.244.1.7 dst=10.244.2.6 sport=27017 dport=33041 [ASSURED] mark=0 use=1
tcp 6 86395 ESTABLISHED src=10.244.2.2 dst=10.240.0.5 sport=59042 dport=10255 src=10.240.0.5 dst=10.240.0.4 sport=10255 dport=59042 [ASSURED] mark=0 use=1
tcp 6 86398 ESTABLISHED src=10.244.2.2 dst=10.0.0.1 sport=43894 dport=443 src=10.240.255.5 dst=10.240.0.4 sport=443 dport=43894 [ASSURED] mark=0 use=1
tcp 6 86398 ESTABLISHED src=10.244.2.3 dst=10.0.0.1 sport=36388 dport=443 src=10.240.255.5 dst=10.240.0.4 sport=443 dport=36388 [ASSURED] mark=0 use=1
tcp 6 300 ESTABLISHED src=10.240.255.5 dst=10.240.0.4 sport=51230 dport=10250 src=10.240.0.4 dst=10.240.255.5 sport=10250 dport=51230 [ASSURED] mark=0 use=1
tcp 6 86400 ESTABLISHED src=10.244.2.6 dst=10.240.0.4 sport=34648 dport=22 src=10.240.0.4 dst=10.244.2.6 sport=22 dport=34648 [ASSURED] mark=0 use=1
tcp 6 86395 ESTABLISHED src=10.244.2.3 dst=10.240.255.5 sport=49756 dport=10255 src=10.240.255.5 dst=10.240.0.4 sport=10255 dport=49756 [ASSURED] mark=0 use=1
tcp 6 293 ESTABLISHED src=10.244.2.6 dst=10.0.223.224 sport=34659 dport=27017 src=10.244.1.7 dst=10.244.2.6 sport=27017 dport=34659 [ASSURED] mark=0 use=1
tcp 6 86399 ESTABLISHED src=10.240.0.4 dst=10.240.255.5 sport=36190 dport=443 src=10.240.255.5 dst=10.240.0.4 sport=443 dport=36190 [ASSURED] mark=0 use=1
conntrack v1.4.3 (conntrack-tools): 362 flow entries have been shown.
tcp 6 86395 ESTABLISHED src=10.244.2.2 dst=10.240.255.5 sport=48626 dport=10255 src=10.240.255.5 dst=10.240.0.4 sport=10255 dport=48626 [ASSURED] mark=0 use=1
tcp 6 86396 ESTABLISHED src=10.240.255.5 dst=10.244.2.2 sport=40890 dport=443 src=10.244.2.2 dst=10.240.255.5 sport=443 dport=40890 [ASSURED] mark=0 use=1
tcp 6 86400 ESTABLISHED src=127.0.0.1 dst=127.0.0.1 sport=35060 dport=46737 src=127.0.0.1 dst=127.0.0.1 sport=46737 dport=35060 [ASSURED] mark=0 use=1
tcp 6 86387 ESTABLISHED src=10.240.0.4 dst=169.254.169.254 sport=33990 dport=80 src=169.254.169.254 dst=10.240.0.4 sport=80 dport=33990 [ASSURED] mark=0 use=1
tcp 6 86396 ESTABLISHED src=10.244.1.4 dst=10.244.2.3 sport=51146 dport=8082 src=10.244.2.3 dst=10.244.1.4 sport=8082 dport=51146 [ASSURED] mark=0 use=1
tcp 6 86395 ESTABLISHED src=10.244.2.3 dst=10.240.0.5 sport=50642 dport=10255 src=10.240.0.5 dst=10.240.0.4 sport=10255 dport=50642 [ASSURED] mark=0 use=1
tcp 6 86395 ESTABLISHED src=10.244.2.2 dst=10.240.0.4 sport=39414 dport=10255 src=10.240.0.4 dst=10.244.2.2 sport=10255 dport=39414 [ASSURED] mark=0 use=1
tcp 6 86396 ESTABLISHED src=10.240.0.4 dst=10.240.255.5 sport=35964 dport=443 src=10.240.255.5 dst=10.240.0.4 sport=443 dport=35964 [ASSURED] mark=0 use=1

blushingpenguin commented 5 years ago

looks likely to have been fixed by: https://github.com/torvalds/linux/commit/959b69ef57db00cb33e9c4777400ae7183ebddd3
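
If anyone wants to sanity-check whether a given node's kernel is new enough to include that change (this only shows the version string, not whether the distro kernel carries the patch):

uname -r                   # run on the node itself
kubectl get nodes -o wide  # or read the KERNEL-VERSION column from anywhere with cluster access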

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.