SeldonIO / seldon-server

Machine Learning Platform and Recommendation Engine built on Kubernetes
https://www.seldon.io/
Apache License 2.0

zookeeper cluster cannot communicate with each other #28

Closed sbookworm closed 7 years ago

sbookworm commented 7 years ago

Hi all, I ran into a problem while installing Seldon. After installing Kubernetes, I ran seldon-up.sh, and the script keeps checking the ZooKeeper status without finishing. I investigated and found that the ZooKeeper pods cannot communicate with each other by hostname. I also tried using the service IP of each ZooKeeper instance, but that failed too, because the service IP cannot be reached from inside the pods. Can you give me some advice? Thanks very much!

gsunner commented 7 years ago

Hi,

Can you let us know how you are running Kubernetes? Are you using minikube or another method?

sbookworm commented 7 years ago

Hi Gurminder, I don't think my Kubernetes setup is minikube. I am running Kubernetes with one master and one minion, and only the minion node runs Docker containers:

master node (flannel network 172.17.0.0/16): etcd, kube-apiserver, kube-controller-manager, kube-scheduler, and so on
minion node: kube-proxy, kubelet, flannel (flannel subnet 172.17.28.0/24, docker0: 172.17.28.1)

When I run seldon-up.sh, the services and pods are created successfully. There are three ZooKeeper pods: zookeeper-1, zookeeper-2, and zookeeper-3. Take zookeeper-1 as an example: its cluster IP is 10.254.168.191 and its endpoint address is 172.17.28.3.

On the master, I run:

    kubectl exec zookeeper-1 -- bash -c "echo srvr | nc 172.17.28.3 2181"

and get the message "This ZooKeeper instance is not currently serving requests".

On the minion node, I run:

    echo srvr | nc 10.254.168.191 2181

and also get the reply "This ZooKeeper instance is not currently serving requests". This means I can access the service via kube-proxy.

The problem is that when I run:

    kubectl exec zookeeper-2 -- bash -c "echo srvr | nc 10.254.168.191 2181"

there is no reply, which means that zookeeper-2 cannot reach the address 10.254.168.191. And when I run:

    kubectl exec zookeeper-2 -- bash -c "echo srvr | nc zookeeper-1 2181"

it reports that the hostname cannot be found.

So my question is: how can the ZooKeeper nodes communicate with each other? Are some iptables rules needed?

Another question: do the Seldon pods communicate with each other using kube-dns or using environment variables?

Can you point me to the guide you followed to install Kubernetes? Thanks very much.
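The manual `echo srvr | nc ...` checks above can be sketched as one small probe that separates the two failure modes in this report: a name that does not resolve versus an address that resolves but cannot be reached. This is only an illustrative sketch, not part of Seldon; the function name `probe_zookeeper` and its status labels are hypothetical, and it assumes a Python 3 interpreter is available somewhere on the pod network (for example in a debug container), which the thread does not confirm.

```python
import socket


def probe_zookeeper(host, port=2181, timeout=3.0):
    """Send ZooKeeper's `srvr` command and classify what happens.

    Returns (status, detail), where status is one of:
      "dns_error"     - the hostname could not be resolved
      "connect_error" - the address resolved but the TCP connect failed
      "no_reply"      - connected, but no data came back in time
      "reply"         - the server answered; detail holds its text
    These labels are hypothetical names chosen for this sketch.
    """
    try:
        # Resolve first so a DNS failure is reported on its own.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return ("dns_error", str(exc))

    sockaddr = infos[0][4]
    try:
        sock = socket.create_connection(sockaddr[:2], timeout=timeout)
    except OSError as exc:
        return ("connect_error", str(exc))

    chunks = []
    with sock:
        try:
            sock.sendall(b"srvr")
            # ZooKeeper closes the connection after answering,
            # so read until end of stream.
            while True:
                data = sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
        except OSError as exc:
            # Covers a read timeout or a connection reset.
            return ("no_reply", str(exc))

    reply = b"".join(chunks).decode("utf-8", errors="replace").strip()
    if not reply:
        return ("no_reply", "connection closed without data")
    return ("reply", reply)
```

Called from inside a pod as `probe_zookeeper("zookeeper-1")`, a `dns_error` would correspond to the hostname failure, while `probe_zookeeper("10.254.168.191")` returning `connect_error` or `no_reply` would correspond to the cluster IP symptom. A `reply` status, even one carrying "This ZooKeeper instance is not currently serving requests", shows that the network path itself works.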

gsunner commented 7 years ago

Hi,

This looks like a dns issue. The zookeeper servers are unable to find each other.

The latest version of Kubernetes should have the DNS service built in; see http://kubernetes.io/docs/admin/dns/

Which version of kubernetes are you using?

sbookworm commented 7 years ago

Hi Gurminder, thanks for your reply. I am using Kubernetes v1.2.0:

Client Version: version.Info{Major:"1", Minor:"2", GitVersion:"v1.2.0", GitCommit:"ec7364b6e3b155e78086018aa644057edbe196e5", GitTreeState:"clean"}
Server Version: version.Info{Major:"1", Minor:"2", GitVersion:"v1.2.0", GitCommit:"ec7364b6e3b155e78086018aa644057edbe196e5", GitTreeState:"clean"}

Indeed, there are two problems. The hostname not being resolved might be a DNS problem, but using the cluster IP address still does not work, so there might also be a network routing problem.

By the way, kube-dns is required by Seldon, right? Thanks very much again.

gsunner commented 7 years ago

Hi,

Kubernetes v1.2.0 does not have DNS built in; you would need to add it manually.

It would be best to use the latest version of kubernetes or at least v1.3.
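As a rough illustration of the version advice above, the `GitVersion` string reported by `kubectl version` can be compared against the suggested v1.3 minimum. The helper names `parse_git_version` and `meets_minimum` are hypothetical, and the (1, 3) threshold is simply the one suggested in this thread.

```python
import re


def parse_git_version(git_version):
    """Extract (major, minor) as integers from a string such as "v1.2.0"."""
    match = re.match(r"v?(\d+)\.(\d+)", git_version)
    if match is None:
        raise ValueError(f"unrecognised version string: {git_version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(git_version, minimum=(1, 3)):
    """Return True if git_version is at least the given (major, minor)."""
    # Compare integer tuples, not strings, so that 1.10 sorts after 1.3.
    return parse_git_version(git_version) >= minimum
```

With the version reported earlier in the thread, `meets_minimum("v1.2.0")` is false, so under this advice the cluster would need either an upgrade or a manually installed DNS add-on.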

sbookworm commented 7 years ago

@gsunner Hi Gurminder, thank you very much for your help. I have fixed the problem of the ZooKeeper pods being unable to reach each other via the service IP addresses. Adding some iptables rules resolved it; it was a configuration problem in my k8s setup. The only remaining problem is kube-dns. Thanks.