contiv / netplugin

Container networking for various use cases
Apache License 2.0

cluster IP is not working on Kubernetes multi-nodes setup #771

Open jiahaoliang opened 7 years ago

jiahaoliang commented 7 years ago

Environment:

docker version: 1.12.5
contiv version: https://github.com/contiv/install/releases/tag/1.0.0-beta.3
OS: CentOS 7 (3.10.0-327.28.3.el7.x86_64)
Kubernetes version (output of kubectl version):
Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.1", GitCommit:"82450d03cb057bab0950214ef122b67c83fb11df", GitTreeState:"clean", BuildDate:"2016-12-14T00:57:05Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.3", GitCommit:"029c3a408176b55c30846f0faedf56aae5992e9b", GitTreeState:"clean", BuildDate:"2017-02-15T06:34:56Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}

Contiv network setting:

netctl global inspect
Inspecting global
{
  "Config": {
    "key": "global",
    "arpMode": "proxy",
    "fwdMode": "bridge",
    "name": "global",
    "networkInfraType": "default",
    "pvtSubnet": "172.19.0.0/16",
    "vlans": "1-4094",
    "vxlans": "1-10000"
  },
  "Oper": {
    "clusterMode": "kubernetes",
    "numNetworks": 1,
    "vxlansInUse": "1"
  }
}

netctl net ls

Tenant   Network      Nw Type  Encap type  Packet tag  Subnet
default  default-net  data     vxlan       0           20.1.1.0/24

Installation method: I followed https://github.com/contiv/install/blob/master/README.md#kubernetes-14-installation to install Kubernetes and Contiv.

My topology is as below: [image: topo]

Issue Description

I tried the example @neelimamukiri gave: https://github.com/microservices-demo/microservices-demo/blob/master/deploy/kubernetes/complete-demo.yaml?raw=true. But the situation is the same: the pods are unable to communicate with each other via their cluster IPs (CIDR 10.254.0.0/16).

I used tcpdump to debug the issue. Here is what I found. I tried to connect from "cart" to "cart-db" in the example.

Docker "cart" is on node 192.168.50.48; its contiv endpoint is 20.1.1.1.
Docker "cart-db" is on node 192.168.50.46; its contiv endpoint is 20.1.1.13 and its cluster IP is 10.254.210.119. The db listens on port 27017.
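To keep the two address ranges apart when reading the captures in this thread, here is a minimal Python sketch that labels an address as a cluster IP or a pod endpoint. The CIDRs are the ones quoted in this issue (10.254.0.0/16 for services, 20.1.1.0/24 for default-net); the function name `classify` is made up for illustration.

```python
import ipaddress

# Values taken from this issue: the service (cluster IP) range and the
# contiv default-net subnet used for pod endpoints.
SERVICE_CIDR = ipaddress.ip_network("10.254.0.0/16")
POD_SUBNET = ipaddress.ip_network("20.1.1.0/24")


def classify(addr):
    """Label an IPv4 address as 'cluster-ip', 'pod', or 'other'."""
    ip = ipaddress.ip_address(addr)
    if ip in SERVICE_CIDR:
        return "cluster-ip"
    if ip in POD_SUBNET:
        return "pod"
    return "other"


if __name__ == "__main__":
    for addr in ("10.254.210.119", "20.1.1.13", "192.168.50.48"):
        print(addr, classify(addr))
```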


Experiment 1: connect to "cart-db" endpoint ip
Result: Accessible

  1. attach to "cart" docker: docker exec -it 7f127bb60c10 sh
  2. $ nc -vz 20.1.1.13 27017
     result: 20.1.1.13 (20.1.1.13:27017) open
  3. tcpdump result on node 192.168.50.48:
     command: tcpdump -i any "dst net 10.254.0.0/16 or src net 10.254.0.0/16 or dst net 20.1.1.0/24 or src net 20.1.1.0/24"
     result:
     21:11:00.228882 IP 20.1.1.1.44242 > 20.1.1.13.27017: Flags [S], seq 1226426215, win 28200, options [mss 1410,sackOK,TS val 104158552 ecr 0,nop,wscale 7], length 0
     21:11:00.230217 IP 20.1.1.13.27017 > 20.1.1.1.44242: Flags [S.], seq 592358418, ack 1226426216, win 27960, options [mss 1410,sackOK,TS val 89390576 ecr 104158552,nop,wscale 7], length 0
     21:11:00.230270 IP 20.1.1.1.44242 > 20.1.1.13.27017: Flags [.], ack 1, win 221, options [nop,nop,TS val 104158553 ecr 89390576], length 0
     21:11:00.230694 IP 20.1.1.1.44242 > 20.1.1.13.27017: Flags [F.], seq 1, ack 1, win 221, options [nop,nop,TS val 104158553 ecr 89390576], length 0
     21:11:00.233104 IP 20.1.1.13.27017 > 20.1.1.1.44242: Flags [.], ack 2, win 219, options [nop,nop,TS val 89390578 ecr 104158553], length 0
     21:11:00.233132 IP 20.1.1.13.27017 > 20.1.1.1.44242: Flags [F.], seq 1, ack 2, win 219, options [nop,nop,TS val 89390578 ecr 104158553], length 0
     21:11:00.233189 IP 20.1.1.1.44242 > 20.1.1.13.27017: Flags [.], ack 2, win 221, options [nop,nop,TS val 104158556 ecr 89390578], length 0
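For repeating this kind of reachability check from a script rather than by hand, the following is a rough Python equivalent of the `nc -vz` probe used in the experiments. It is only a sketch: the helper name `port_open` and the 3-second default timeout are assumptions, not something from the original report.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port completes within timeout.

    Roughly what `nc -vz host port` reports: a completed handshake counts as
    open; a refusal, timeout, or unreachable error counts as not open.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts and refused connections as well
        return False


if __name__ == "__main__":
    # Addresses from this issue: endpoint IP first, then cluster IP.
    for host in ("20.1.1.13", "10.254.210.119"):
        print(host, 27017, "open" if port_open(host, 27017) else "not reachable")
```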

Experiment 2: connect to "cart-db" cluster ip
Result: Inaccessible

  1. attach to "cart" docker: docker exec -it 7f127bb60c10 sh
  2. $ nc -vz 10.254.210.119 27017
     result: nc: 10.254.210.119 (10.254.210.119:27017): Operation timed out
  3. tcpdump result on node 192.168.50.48:
     command: tcpdump -i any "dst net 10.254.0.0/16 or src net 10.254.0.0/16 or dst net 20.1.1.0/24 or src net 20.1.1.0/24"
     result:
     21:11:28.350023 IP 20.1.1.1.37723 > 10.254.210.119.27017: Flags [S], seq 238910114, win 28200, options [mss 1410,sackOK,TS val 104186673 ecr 0,nop,wscale 7], length 0
     21:11:28.351683 IP 20.1.1.13.27017 > 20.1.1.1.37723: Flags [S.], seq 2712418109, ack 238910115, win 27960, options [mss 1410,sackOK,TS val 89418698 ecr 104186673,nop,wscale 7], length 0
     21:11:28.351775 IP 20.1.1.1.37723 > 20.1.1.13.27017: Flags [R], seq 238910115, win 0, length 0
     21:11:29.352799 IP 20.1.1.1.37723 > 10.254.210.119.27017: Flags [S], seq 238910114, win 28200, options [mss 1410,sackOK,TS val 104187676 ecr 0,nop,wscale 7], length 0
     21:11:29.353305 IP 20.1.1.13.27017 > 20.1.1.1.37723: Flags [S.], seq 2728072795, ack 238910115, win 27960, options [mss 1410,sackOK,TS val 89419699 ecr 104187676,nop,wscale 7], length 0
     21:11:29.353352 IP 20.1.1.1.37723 > 20.1.1.13.27017: Flags [R], seq 238910115, win 0, length 0
     21:11:31.356804 IP 20.1.1.1.37723 > 10.254.210.119.27017: Flags [S], seq 238910114, win 28200, options [mss 1410,sackOK,TS val 104189680 ecr 0,nop,wscale 7], length 0
     21:11:31.357332 IP 20.1.1.13.27017 > 20.1.1.1.37723: Flags [S.], seq 2759385003, ack 238910115, win 27960, options [mss 1410,sackOK,TS val 89421703 ecr 104189680,nop,wscale 7], length 0
     21:11:31.357391 IP 20.1.1.1.37723 > 20.1.1.13.27017: Flags [R], seq 238910115, win 0, length 0
     21:11:33.356767 ARP, Request who-has 10.254.210.119 tell 20.1.1.1, length 28
     21:11:33.357785 ARP, Reply 10.254.210.119 is-at 02:02:0a:fe:d2:77 (oui Unknown), length 28
     21:11:35.364796 IP 20.1.1.1.37723 > 10.254.210.119.27017: Flags [S], seq 238910114, win 28200, options [mss 1410,sackOK,TS val 104193688 ecr 0,nop,wscale 7], length 0
     21:11:35.365554 IP 20.1.1.13.27017 > 20.1.1.1.37723: Flags [S.], seq 2822012286, ack 238910115, win 27960, options [mss 1410,sackOK,TS val 89425712 ecr 104193688,nop,wscale 7], length 0
     21:11:35.365619 IP 20.1.1.1.37723 > 20.1.1.13.27017: Flags [R], seq 238910115, win 0, length 0
     21:11:43.372798 IP 20.1.1.1.37723 > 10.254.210.119.27017: Flags [S], seq 238910114, win 28200, options [mss 1410,sackOK,TS val 104201696 ecr 0,nop,wscale 7], length 0
     21:11:43.373322 IP 20.1.1.13.27017 > 20.1.1.1.37723: Flags [S.], seq 2947135548, ack 238910115, win 27960, options [mss 1410,sackOK,TS val 89433719 ecr 104201696,nop,wscale 7], length 0
     21:11:43.373369 IP 20.1.1.1.37723 > 20.1.1.13.27017: Flags [R], seq 238910115, win 0, length 0
     21:11:59.404782 IP 20.1.1.1.37723 > 10.254.210.119.27017: Flags [S], seq 238910114, win 28200, options [mss 1410,sackOK,TS val 104217728 ecr 0,nop,wscale 7], length 0
     21:11:59.406292 IP 20.1.1.13.27017 > 20.1.1.1.37723: Flags [S.], seq 3197644667, ack 238910115, win 27960, options [mss 1410,sackOK,TS val 89449752 ecr 104217728,nop,wscale 7], length 0
     21:11:59.406348 IP 20.1.1.1.37723 > 20.1.1.13.27017: Flags [R], seq 238910115, win 0, length 0
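The pattern in the Experiment 2 capture (a SYN sent to the cluster IP, answered by a SYN-ACK from the endpoint IP, followed by a RST) can be picked out of a capture mechanically. The Python sketch below is a heuristic written only against the tcpdump line format quoted in this issue; the function name and the "ip:port" output format are illustrative assumptions. It records every address a client sent a SYN to and flags any SYN-ACK that comes back from an address the client never targeted.

```python
import re
from collections import defaultdict

# Matches IPv4 TCP lines as printed in this issue, e.g.
# "21:11:28.350023 IP 20.1.1.1.37723 > 10.254.210.119.27017: Flags [S], seq ..."
LINE_RE = re.compile(
    r"IP (\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+): Flags \[([^\]]+)\]"
)


def find_untranslated_synacks(lines):
    """Return (client, synack_source) pairs where a SYN-ACK arrives from an
    address the client never sent a SYN to."""
    syn_targets = defaultdict(set)  # client "ip:port" -> set of SYN destinations
    mismatches = []
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # ARP lines and anything else are ignored
        src_ip, src_port, dst_ip, dst_port, flags = m.groups()
        src, dst = f"{src_ip}:{src_port}", f"{dst_ip}:{dst_port}"
        if flags == "S":
            syn_targets[src].add(dst)
        elif flags == "S.":
            targets = syn_targets.get(dst)
            if targets and src not in targets:
                mismatches.append((dst, src))
    return mismatches


if __name__ == "__main__":
    capture = [
        "21:11:28.350023 IP 20.1.1.1.37723 > 10.254.210.119.27017: Flags [S], seq 238910114, win 28200, length 0",
        "21:11:28.351683 IP 20.1.1.13.27017 > 20.1.1.1.37723: Flags [S.], seq 2712418109, ack 238910115, win 27960, length 0",
    ]
    print(find_untranslated_synacks(capture))
```

Because `tcpdump -i any` can show the same packet both before and after translation, the sketch keeps a set of SYN destinations per client instead of only the most recent one, so a capture that contains both views of a correctly translated flow is not flagged.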

Experiment 3: connect to "cart-db" cluster ip from another docker within the same host (node3: 192.168.50.46)
Result: Accessible

  1. look for a pod within the same host:
     kubectl get po --all-namespaces -o wide | grep node3
     sock-shop cart-db-2053818980-v7d5p 1/1 Running 0 21h 20.1.1.13 node3-vm5-46
     sock-shop orders-3248148685-n0gp3 1/1 Running 0 21h 20.1.1.8 node3-vm5-46
     [truncated]
  2. attach to "orders-3248148685-n0gp3" docker: kubectl exec orders-3248148685-n0gp3 -it -n sock-shop -- /bin/sh
  3. $ nc -vz 10.254.210.119 27017
     result: 10.254.210.119 (10.254.210.119:27017) open
  4. tcpdump result on node3 192.168.50.46:
     command: tcpdump -i any "dst net 10.254.0.0/16 or src net 10.254.0.0/16 or dst net 20.1.1.0/24 or src net 20.1.1.0/24"
     result:
     23:00:27.184847 IP 20.1.1.8.37793 > 10.254.210.119.27017: Flags [S], seq 4149674294, win 28200, options [mss 1410,sackOK,TS val 95957531 ecr 0,nop,wscale 7], length 0
     23:00:27.185663 IP 20.1.1.8.37793 > 20.1.1.13.27017: Flags [S], seq 4149674294, win 28200, options [mss 1410,sackOK,TS val 95957531 ecr 0,nop,wscale 7], length 0
     23:00:27.185783 IP 20.1.1.13.27017 > 20.1.1.8.37793: Flags [S.], seq 940395066, ack 4149674295, win 27960, options [mss 1410,sackOK,TS val 95957532 ecr 95957531,nop,wscale 7], length 0
     23:00:27.185885 IP 10.254.210.119.27017 > 20.1.1.8.37793: Flags [S.], seq 940395066, ack 4149674295, win 27960, options [mss 1410,sackOK,TS val 95957532 ecr 95957531,nop,wscale 7], length 0
     23:00:27.185917 IP 20.1.1.8.37793 > 10.254.210.119.27017: Flags [.], ack 1, win 221, options [nop,nop,TS val 95957532 ecr 95957532], length 0
     23:00:27.185922 IP 20.1.1.8.37793 > 20.1.1.13.27017: Flags [.], ack 1, win 221, options [nop,nop,TS val 95957532 ecr 95957532], length 0
     23:00:27.186440 IP 20.1.1.8.37793 > 10.254.210.119.27017: Flags [F.], seq 1, ack 1, win 221, options [nop,nop,TS val 95957533 ecr 95957532], length 0
     23:00:27.186445 IP 20.1.1.8.37793 > 20.1.1.13.27017: Flags [F.], seq 1, ack 1, win 221, options [nop,nop,TS val 95957533 ecr 95957532], length 0
     23:00:27.187039 IP 20.1.1.13.27017 > 20.1.1.8.37793: Flags [.], ack 2, win 219, options [nop,nop,TS val 95957534 ecr 95957533], length 0
     23:00:27.187044 IP 10.254.210.119.27017 > 20.1.1.8.37793: Flags [.], ack 2, win 219, options [nop,nop,TS val 95957534 ecr 95957533], length 0
     23:00:27.189346 IP 20.1.1.13.27017 > 20.1.1.8.37793: Flags [F.], seq 1, ack 2, win 219, options [nop,nop,TS val 95957536 ecr 95957533], length 0
     23:00:27.189353 IP 10.254.210.119.27017 > 20.1.1.8.37793: Flags [F.], seq 1, ack 2, win 219, options [nop,nop,TS val 95957536 ecr 95957533], length 0
     23:00:27.189385 IP 20.1.1.8.37793 > 10.254.210.119.27017: Flags [.], ack 2, win 221, options [nop,nop,TS val 95957536 ecr 95957536], length 0
     23:00:27.189389 IP 20.1.1.8.37793 > 20.1.1.13.27017: Flags [.], ack 2, win 221, options [nop,nop,TS val 95957536 ecr 95957536], length 0

My suspicion:

From Experiment 2, we can see that the DNAT is actually working: 10.254.210.119 is translated to 20.1.1.13. But when the reply comes back, the SNAT rule that should translate 20.1.1.13 back to 10.254.210.119 is missing. From the point of view of "cart", it never gets a reply from 10.254.210.119. The reply actually arrives from 20.1.1.13, so "cart" ignores it (the capture shows it answering each SYN-ACK with a RST).
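The suspicion above can be illustrated with a toy model in Python. This is not how contiv or Open vSwitch implements service translation; the class `ToyServiceNat`, its `reverse_translate` switch, and the helper `client_accepts` are all hypothetical names invented for this sketch. It only shows why a client drops a reply whose source was never rewritten back to the cluster IP.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    src: str  # "ip:port"
    dst: str  # "ip:port"


class ToyServiceNat:
    """Toy model: rewrite cluster IP -> endpoint on the way out and,
    optionally, endpoint -> cluster IP on the way back."""

    def __init__(self, services, reverse_translate=True):
        self.services = services  # {cluster "ip:port": endpoint "ip:port"}
        self.reverse_translate = reverse_translate
        self.flows = {}  # (client, endpoint) -> original cluster address

    def outbound(self, pkt):
        endpoint = self.services.get(pkt.dst)
        if endpoint is None:
            return pkt  # not a service address, leave untouched
        self.flows[(pkt.src, endpoint)] = pkt.dst
        return Packet(pkt.src, endpoint)

    def inbound(self, pkt):
        original = self.flows.get((pkt.dst, pkt.src))
        if original is None or not self.reverse_translate:
            return pkt  # reply keeps the endpoint address as its source
        return Packet(original, pkt.dst)


def client_accepts(syn, reply):
    """A TCP client only matches a reply that mirrors the SYN's address pair."""
    return reply.src == syn.dst and reply.dst == syn.src


if __name__ == "__main__":
    services = {"10.254.210.119:27017": "20.1.1.13:27017"}
    syn = Packet("20.1.1.1:37723", "10.254.210.119:27017")
    for reverse in (True, False):
        nat = ToyServiceNat(services, reverse_translate=reverse)
        forwarded = nat.outbound(syn)
        reply = nat.inbound(Packet(forwarded.dst, forwarded.src))
        print(reverse, reply, client_accepts(syn, reply))
```

With `reverse_translate=False` the reply reaches the client with source 20.1.1.13:27017, which matches no socket the client opened; that is the situation the Experiment 2 capture suggests.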

jojimt commented 7 years ago

Please provide the output from kubectl describe service

jiahaoliang commented 7 years ago

@jojimt there you go:

kubectl get svc -o wide --all-namespaces
NAMESPACE    NAME             CLUSTER-IP      EXTERNAL-IP  PORT(S)         AGE  SELECTOR
default      example-service  10.254.90.136                8080:31557/TCP  1h   run=load-balancer-example
default      kubernetes       10.254.0.1                   443/TCP         1d
kube-system  kube-dns         10.254.0.10                  53/UDP,53/TCP   1d   name=kube-dns
sock-shop    cart             10.254.107.3                 80/TCP          1d   name=cart
sock-shop    cart-db          10.254.210.119               27017/TCP       1d   name=cart-db
sock-shop    catalogue        10.254.214.105               80/TCP          1d   name=catalogue
sock-shop    catalogue-db     10.254.69.190                3306/TCP        1d   name=catalogue-db
sock-shop    front-end        10.254.114.228               80:30001/TCP    1d   name=front-end
sock-shop    orders           10.254.2.237                 80/TCP          1d   name=orders
sock-shop    orders-db        10.254.175.200               27017/TCP       1d   name=orders-db
sock-shop    payment          10.254.121.69                80/TCP          1d   name=payment
sock-shop    queue-master     10.254.69.43                 80/TCP          1d   name=queue-master
sock-shop    rabbitmq         10.254.239.201               5672/TCP        1d   name=rabbitmq
sock-shop    shipping         10.254.71.148                80/TCP          1d   name=shipping
sock-shop    user             10.254.96.55                 80/TCP          1d   name=user
sock-shop    user-db          10.254.233.159               27017/TCP       1d   name=user-db
zipkin       zipkin           10.254.21.187                9411:30002/TCP  1d   name=zipkin
zipkin       zipkin-mysql     10.254.204.186               3306/TCP        1d   name=zipkin-mysql

kubectl describe svc cart-db -n sock-shop
Name: cart-db
Namespace: sock-shop
Labels: name=cart-db
Selector: name=cart-db
Type: ClusterIP
IP: 10.254.210.119
Port: 27017/TCP
Endpoints: 20.1.1.13:27017
Session Affinity: None
No events.

jojimt commented 7 years ago

I am unable to repro this in a similar setup. Can you capture a tcpdump on 192.168.50.46 in the problem scenario?

jiahaoliang commented 7 years ago

@jojimt Tcpdump result on 192.168.50.46 in Experiment 2:

tcpdump -i any "dst net 10.254.0.0/16 or src net 10.254.0.0/16 or dst net 20.1.1.0/24 or src net 20.1.1.0/24"
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
01:53:05.941569 IP 20.1.1.1.39670 > 20.1.1.13.27017: Flags [S], seq 614333009, win 28200, options [mss 1410,sackOK,TS val 121084261 ecr 0,nop,wscale 7], length 0
01:53:05.941813 IP 20.1.1.13.27017 > 20.1.1.1.39670: Flags [S.], seq 3308329133, ack 614333010, win 27960, options [mss 1410,sackOK,TS val 106316288 ecr 121084261,nop,wscale 7], length 0
01:53:05.942815 IP 20.1.1.1.39670 > 20.1.1.13.27017: Flags [R], seq 614333010, win 0, length 0
01:53:06.942856 IP 20.1.1.1.39670 > 20.1.1.13.27017: Flags [S], seq 614333009, win 28200, options [mss 1410,sackOK,TS val 121085264 ecr 0,nop,wscale 7], length 0
01:53:06.942970 IP 20.1.1.13.27017 > 20.1.1.1.39670: Flags [S.], seq 3323972412, ack 614333010, win 27960, options [mss 1410,sackOK,TS val 106317289 ecr 121085264,nop,wscale 7], length 0
01:53:06.943286 IP 20.1.1.1.39670 > 20.1.1.13.27017: Flags [R], seq 614333010, win 0, length 0
01:53:08.946809 IP 20.1.1.1.39670 > 20.1.1.13.27017: Flags [S], seq 614333009, win 28200, options [mss 1410,sackOK,TS val 121087268 ecr 0,nop,wscale 7], length 0
01:53:08.946931 IP 20.1.1.13.27017 > 20.1.1.1.39670: Flags [S.], seq 3355284318, ack 614333010, win 27960, options [mss 1410,sackOK,TS val 106319293 ecr 121087268,nop,wscale 7], length 0
01:53:08.947289 IP 20.1.1.1.39670 > 20.1.1.13.27017: Flags [R], seq 614333010, win 0, length 0
01:53:12.960120 IP 20.1.1.1.39670 > 20.1.1.13.27017: Flags [S], seq 614333009, win 28200, options [mss 1410,sackOK,TS val 121091280 ecr 0,nop,wscale 7], length 0
01:53:12.960202 IP 20.1.1.13.27017 > 20.1.1.1.39670: Flags [S.], seq 3417991758, ack 614333010, win 27960, options [mss 1410,sackOK,TS val 106323307 ecr 121091280,nop,wscale 7], length 0
01:53:12.960555 IP 20.1.1.1.39670 > 20.1.1.13.27017: Flags [R], seq 614333010, win 0, length 0
01:53:20.974833 IP 20.1.1.1.39670 > 20.1.1.13.27017: Flags [S], seq 614333009, win 28200, options [mss 1410,sackOK,TS val 121099296 ecr 0,nop,wscale 7], length 0
01:53:20.974943 IP 20.1.1.13.27017 > 20.1.1.1.39670: Flags [S.], seq 3543221988, ack 614333010, win 27960, options [mss 1410,sackOK,TS val 106331321 ecr 121099296,nop,wscale 7], length 0
01:53:20.975297 IP 20.1.1.1.39670 > 20.1.1.13.27017: Flags [R], seq 614333010, win 0, length 0
01:53:37.007469 IP 20.1.1.1.39670 > 20.1.1.13.27017: Flags [S], seq 614333009, win 28200, options [mss 1410,sackOK,TS val 121115328 ecr 0,nop,wscale 7], length 0
01:53:37.007633 IP 20.1.1.13.27017 > 20.1.1.1.39670: Flags [S.], seq 3793732471, ack 614333010, win 27960, options [mss 1410,sackOK,TS val 106347354 ecr 121115328,nop,wscale 7], length 0
01:53:37.008301 IP 20.1.1.1.39670 > 20.1.1.13.27017: Flags [R], seq 614333010, win 0, length 0
01:53:42.021060 ARP, Request who-has 20.1.1.1 tell 20.1.1.13, length 28
01:53:42.022601 ARP, Reply 20.1.1.1 is-at 02:02:14:01:01:01 (oui Unknown), length 28
01:54:09.071360 IP 20.1.1.1.39670 > 20.1.1.13.27017: Flags [S], seq 614333009, win 28200, options [mss 1410,sackOK,TS val 121147392 ecr 0,nop,wscale 7], length 0
01:54:09.071498 IP 20.1.1.13.27017 > 20.1.1.1.39670: Flags [S.], seq 4294730468, ack 614333010, win 27960, options [mss 1410,sackOK,TS val 106379418 ecr 121147392,nop,wscale 7], length 0
01:54:09.072263 IP 20.1.1.1.39670 > 20.1.1.13.27017: Flags [R], seq 614333010, win 0, length 0

jojimt commented 7 years ago

Thanks @jiahaoliang. This is consistent with your analysis. Let me look into this and get back to you.

jojimt commented 7 years ago

Based on the debug logs, this appears to be caused by a re-install not cleaning up the previous bridge. https://github.com/contiv/install/issues/61 should address this.

lidajia commented 7 years ago

I have the same problem: I can't connect to a service IP from the service's own pod. I have an nginx service (10.10.10.180) and an nginx pod (20.1.1.3). From inside the nginx pod, connections to 10.10.10.180 time out, but connections to other services work.