Closed: yu2003w closed this issue 6 years ago
Hi, can you provide some more details and logs?
Have you run "kubectl get all" to check that the zookeeper nodes are up and ready? Which pod is the above error from? Did you start seldon with seldon-up? What is the size of your kubernetes cluster?
Yes, the three zookeeper containers are running well. When the three zookeeper pods are scheduled to different nodes, I can find the logs in each container. Containers scheduled to the same machine can find each other.
I have 3 masters and 4 worker nodes.
zookeeper1-467704625-wvl6x    1/1   Running   0   8m   10.130.2.32   host-10-1-241-56
zookeeper2-1006738229-tm7sr   1/1   Running   0   8m   10.129.2.40   host-10-1-130-29
zookeeper3-1545771833-n9pmt   1/1   Running   0   8m   10.130.2.31   host-10-1-241-56
If you are running multi-node then you will need some form of persistent storage (see http://docs.seldon.io/install.html#storage). However, the error you show seems to be more of a DNS or network error. Can you exec into the failing pod and check whether you can connect to the zookeeper-3 host? Have you also checked that this error is fatal and has not been recovered from? Also, which pod is failing? Seldon-server?
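A connectivity check like the one suggested above could be sketched in Python, assuming a Python 3 interpreter is available inside the failing pod (it may not be). The host names and ports used in the usage note (zookeeper-2, zookeeper-3, 2181, 3888) are taken from the log in this issue; the helper names `can_connect` and `check_peers` are hypothetical and only for illustration.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as exc:
        # OSError covers DNS failures, timeouts, refused connections
        # and "No route to host".
        print(f"{host}:{port} unreachable: {exc}")
        return False


def check_peers(hosts, ports, timeout=3.0):
    """Try every host/port pair and return {(host, port): reachable}."""
    return {
        (host, port): can_connect(host, port, timeout)
        for host in hosts
        for port in ports
    }
```

Running `check_peers(["zookeeper-2", "zookeeper-3"], [2181, 3888])` from inside the pod would show whether the client port and the election port of each peer can be reached. A failure caused by name resolution and a failure caused by routing print different messages, which helps separate a DNS problem from a network problem.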
I failed to "curl -kv" services from some nodes. It seems to be an environment problem in my cluster.
This is an issue with my environment. It seems that the OVS of the PaaS conflicts with that of the IaaS. Thanks for the help.
Hi, when I tried to set up seldon on a k8s cluster, it seemed that the zookeeper cluster was not running as expected. I got the errors below:
2017-10-20 17:47:32,812 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer$QuorumServer@149] - Resolved hostname: zookeeper-2 to address: zookeeper-2/172.30.123.16
2017-10-20 17:47:35,819 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@400] - Cannot open channel to 3 at election address zookeeper-3/172.30.134.85:3888
java.net.NoRouteToHostException: No route to host
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:381)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:426)
    at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:843)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:822)
2017-10-20 17:47:35,822 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer$QuorumServer@149] - Resolved hostname: zookeeper-3 to address: zookeeper-3/172.30.134.85
2017-10-20 17:47:35,823 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
It seems that the address setting is not correct. How should I fix this?
Thx, Jared
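For longer logs of this kind, the peers that cannot be reached could be pulled out with a small script. This is only a sketch: the regular expression is written against the WARN line exactly as it appears in the log in this issue, and the function name `unreachable_peers` is hypothetical.

```python
import re

# Matches the "Cannot open channel" WARN line as it appears in the log above.
PATTERN = re.compile(
    r"Cannot open channel to (?P<sid>\d+) at election address "
    r"(?P<host>[^/\s]+)/(?P<ip>[\d.]+):(?P<port>\d+)"
)


def unreachable_peers(log_text):
    """Return a sorted list of unique (server_id, host, ip, port) tuples."""
    found = set()
    for m in PATTERN.finditer(log_text):
        found.add((int(m["sid"]), m["host"], m["ip"], int(m["port"])))
    return sorted(found)


# One WARN line copied from the log in this issue.
sample = (
    "2017-10-20 17:47:35,819 [myid:1] - WARN "
    "[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@400] - "
    "Cannot open channel to 3 at election address zookeeper-3/172.30.134.85:3888"
)
print(unreachable_peers(sample))
# prints [(3, 'zookeeper-3', '172.30.134.85', 3888)]
```

Here the name zookeeper-3 does resolve to an address, and the failure is "No route to host" on the election port, which points at the network path and not at name resolution.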