jiahaoliang opened 7 years ago
Please provide the output from kubectl describe service
@jojimt there you go:
kubectl get svc -o wide --all-namespaces
NAMESPACE NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default example-service 10.254.90.136
kubectl describe svc cart-db -n sock-shop
Name: cart-db
Namespace: sock-shop
Labels: name=cart-db
Selector: name=cart-db
Type: ClusterIP
IP: 10.254.210.119
Port:
I am unable to repro this in a similar setup. Can you capture a tcpdump on 192.168.50.46 in the problem scenario?
@jojimt Tcpdump result on 192.168.50.46 in Experiment 2:

tcpdump -i any "dst net 10.254.0.0/16 or src net 10.254.0.0/16 or dst net 20.1.1.0/24 or src net 20.1.1.0/24"
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
01:53:05.941569 IP 20.1.1.1.39670 > 20.1.1.13.27017: Flags [S], seq 614333009, win 28200, options [mss 1410,sackOK,TS val 121084261 ecr 0,nop,wscale 7], length 0
01:53:05.941813 IP 20.1.1.13.27017 > 20.1.1.1.39670: Flags [S.], seq 3308329133, ack 614333010, win 27960, options [mss 1410,sackOK,TS val 106316288 ecr 121084261,nop,wscale 7], length 0
01:53:05.942815 IP 20.1.1.1.39670 > 20.1.1.13.27017: Flags [R], seq 614333010, win 0, length 0
01:53:06.942856 IP 20.1.1.1.39670 > 20.1.1.13.27017: Flags [S], seq 614333009, win 28200, options [mss 1410,sackOK,TS val 121085264 ecr 0,nop,wscale 7], length 0
01:53:06.942970 IP 20.1.1.13.27017 > 20.1.1.1.39670: Flags [S.], seq 3323972412, ack 614333010, win 27960, options [mss 1410,sackOK,TS val 106317289 ecr 121085264,nop,wscale 7], length 0
01:53:06.943286 IP 20.1.1.1.39670 > 20.1.1.13.27017: Flags [R], seq 614333010, win 0, length 0
01:53:08.946809 IP 20.1.1.1.39670 > 20.1.1.13.27017: Flags [S], seq 614333009, win 28200, options [mss 1410,sackOK,TS val 121087268 ecr 0,nop,wscale 7], length 0
01:53:08.946931 IP 20.1.1.13.27017 > 20.1.1.1.39670: Flags [S.], seq 3355284318, ack 614333010, win 27960, options [mss 1410,sackOK,TS val 106319293 ecr 121087268,nop,wscale 7], length 0
01:53:08.947289 IP 20.1.1.1.39670 > 20.1.1.13.27017: Flags [R], seq 614333010, win 0, length 0
01:53:12.960120 IP 20.1.1.1.39670 > 20.1.1.13.27017: Flags [S], seq 614333009, win 28200, options [mss 1410,sackOK,TS val 121091280 ecr 0,nop,wscale 7], length 0
01:53:12.960202 IP 20.1.1.13.27017 > 20.1.1.1.39670: Flags [S.], seq 3417991758, ack 614333010, win 27960, options [mss 1410,sackOK,TS val 106323307 ecr 121091280,nop,wscale 7], length 0
01:53:12.960555 IP 20.1.1.1.39670 > 20.1.1.13.27017: Flags [R], seq 614333010, win 0, length 0
01:53:20.974833 IP 20.1.1.1.39670 > 20.1.1.13.27017: Flags [S], seq 614333009, win 28200, options [mss 1410,sackOK,TS val 121099296 ecr 0,nop,wscale 7], length 0
01:53:20.974943 IP 20.1.1.13.27017 > 20.1.1.1.39670: Flags [S.], seq 3543221988, ack 614333010, win 27960, options [mss 1410,sackOK,TS val 106331321 ecr 121099296,nop,wscale 7], length 0
01:53:20.975297 IP 20.1.1.1.39670 > 20.1.1.13.27017: Flags [R], seq 614333010, win 0, length 0
01:53:37.007469 IP 20.1.1.1.39670 > 20.1.1.13.27017: Flags [S], seq 614333009, win 28200, options [mss 1410,sackOK,TS val 121115328 ecr 0,nop,wscale 7], length 0
01:53:37.007633 IP 20.1.1.13.27017 > 20.1.1.1.39670: Flags [S.], seq 3793732471, ack 614333010, win 27960, options [mss 1410,sackOK,TS val 106347354 ecr 121115328,nop,wscale 7], length 0
01:53:37.008301 IP 20.1.1.1.39670 > 20.1.1.13.27017: Flags [R], seq 614333010, win 0, length 0
01:53:42.021060 ARP, Request who-has 20.1.1.1 tell 20.1.1.13, length 28
01:53:42.022601 ARP, Reply 20.1.1.1 is-at 02:02:14:01:01:01 (oui Unknown), length 28
01:54:09.071360 IP 20.1.1.1.39670 > 20.1.1.13.27017: Flags [S], seq 614333009, win 28200, options [mss 1410,sackOK,TS val 121147392 ecr 0,nop,wscale 7], length 0
01:54:09.071498 IP 20.1.1.13.27017 > 20.1.1.1.39670: Flags [S.], seq 4294730468, ack 614333010, win 27960, options [mss 1410,sackOK,TS val 106379418 ecr 121147392,nop,wscale 7], length 0
01:54:09.072263 IP 20.1.1.1.39670 > 20.1.1.13.27017: Flags [R], seq 614333010, win 0, length 0
Thanks @jiahaoliang. This is consistent with your analysis. Let me look into this and get back to you.
Based on the debug logs, this appears to be an issue with a re-install not cleaning up the previous bridge. https://github.com/contiv/install/issues/61 should address this.
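For anyone hitting the same thing, a quick way to check for a bridge left behind by an earlier install is to list the OVS bridges on the node. This is only a sketch; the contivVxlanBridge name below is an assumption about Contiv's default naming, so verify it against your own output:

# List the OVS bridges present on the node; a stale bridge from a previous install would show up here
ovs-vsctl list-br
# If a leftover bridge is found, it can be removed before re-installing, for example:
ovs-vsctl del-br contivVxlanBridge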
I have the same problem: I can't connect to a service IP from one of that service's own pods. I have an nginx service at 10.10.10.180 and an nginx pod at 20.1.1.3. From inside the nginx pod, connections to 10.10.10.180 time out, but connections to other services work fine.
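Roughly, the failing check looks like this (a sketch; the pod name is a placeholder and I am assuming curl is available in the image):

# From a pod backing the nginx service, hit that service's own ClusterIP;
# this times out, while other services' ClusterIPs respond normally.
kubectl exec <nginx-pod-name> -- curl -m 5 -sS http://10.10.10.180/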
Environment:
docker version: 1.12.5
contiv version: https://github.com/contiv/install/releases/tag/1.0.0-beta.3
OS: CentOS 7 (3.10.0-327.28.3.el7.x86_64)
Kubernetes version: kubectl version
Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.1", GitCommit:"82450d03cb057bab0950214ef122b67c83fb11df", GitTreeState:"clean", BuildDate:"2016-12-14T00:57:05Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"} Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.3", GitCommit:"029c3a408176b55c30846f0faedf56aae5992e9b", GitTreeState:"clean", BuildDate:"2017-02-15T06:34:56Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Contiv network setting: netctl global inspect
Inspecting global
{
  "Config": {
    "key": "global",
    "arpMode": "proxy",
    "fwdMode": "bridge",
    "name": "global",
    "networkInfraType": "default",
    "pvtSubnet": "172.19.0.0/16",
    "vlans": "1-4094",
    "vxlans": "1-10000"
  },
  "Oper": {
    "clusterMode": "kubernetes",
    "numNetworks": 1,
    "vxlansInUse": "1"
  }
}
netctl net ls
Installation method: I followed https://github.com/contiv/install/blob/master/README.md#kubernetes-14-installation to install Kubernetes and Contiv.
My topology is as below:
Issue Description
I tried the example @neelimamukiri gave, https://github.com/microservices-demo/microservices-demo/blob/master/deploy/kubernetes/complete-demo.yaml?raw=true, but the situation is the same: the pods are unable to reach each other via their cluster IPs (the 10.254.0.0/16 CIDR).
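(For reference, the demo was presumably brought up with something along these lines; the exact command wasn't recorded:)

kubectl create -f "https://github.com/microservices-demo/microservices-demo/blob/master/deploy/kubernetes/complete-demo.yaml?raw=true"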
I used tcpdump to debug the issue. Here is what I found when trying to connect from "cart" to "cart-db" in the example. The "cart" container runs on node 192.168.50.48, with Contiv endpoint 20.1.1.1. The "cart-db" container runs on node 192.168.50.46, with Contiv endpoint 20.1.1.13 and cluster IP 10.254.210.119; the database listens on port 27017.
Experiment 1: connect to the "cart-db" endpoint IP. Result: accessible.
Experiment 2: connect to the "cart-db" cluster IP. Result: inaccessible.
Experiment 3: connect to the "cart-db" cluster IP from another container on the same host (node3: 192.168.50.46). Result: accessible.
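The three checks above amount to roughly the following, run from inside "cart" and from another container on 192.168.50.46 (a sketch; nc is only used for illustration, the exact client command inside the containers wasn't recorded):

# From inside "cart" on 192.168.50.48:
nc -zv 20.1.1.13 27017         # Experiment 1: endpoint IP -> connects
nc -zv 10.254.210.119 27017    # Experiment 2: cluster IP  -> times out
# From another container on 192.168.50.46 (same host as "cart-db"):
nc -zv 10.254.210.119 27017    # Experiment 3: cluster IP  -> connects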
kubectl get po --all-namespaces -o wide | grep node3
sock-shop   cart-db-2053818980-v7d5p   1/1   Running   0   21h   20.1.1.13   node3-vm5-46
sock-shop   orders-3248148685-n0gp3    1/1   Running   0   21h   20.1.1.8    node3-vm5-46
[truncated]

My suspicion:
From experiment 2, we can see that DNAT is actually working: 10.254.210.119 is translated to 20.1.1.13 on the way out. But when the reply comes back, the reverse SNAT that should rewrite the source 20.1.1.13 back to 10.254.210.119 is missing. From the point of view of "cart", it never gets a reply from 10.254.210.119; the reply actually arrives from 20.1.1.13, so the client rejects it (hence the RSTs in the trace).
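One way to confirm this on 192.168.50.48 would be to look at the kube-proxy NAT rules and the conntrack table while the connection attempt is running (a sketch; conntrack-tools needs to be installed, and the exact rule layout varies by kube-proxy version):

# Check that a DNAT rule for the service's cluster IP exists in the nat table
iptables -t nat -S | grep 10.254.210.119
# Inspect the conntrack entry for the attempt; a healthy entry pairs the original
# tuple (dst 10.254.210.119) with a reply tuple whose source is the DNAT'd pod IP (20.1.1.13)
conntrack -L -p tcp --dport 27017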