freedev / solrcloud-zookeeper-kubernetes

Run Solrcloud and Zookeeper in a Kubernetes environment
Apache License 2.0

solrcloud zookeeper setup issues on kubernetes cluster #8

Open gs-offcl opened 4 years ago

gs-offcl commented 4 years ago

I have encountered the following issues while trying to set up a SolrCloud and ZooKeeper cluster on a multi-node Kubernetes cluster.

These are the scenarios I tried:

Scenario 1 - As is with public docker images (solr, zookeeper) on cluster

Steps:

  1. Clone the repo
  2. Change the configs as per cluster (e.g. storage class, own namespace, etc.)
  3. ./start-aws-zookeeper-ensemble
  4. ./start-aws-solr-cluster (after about 15 seconds)

Issues:

zookeeper.log

java.net.UnknownHostException: zk-2.zkensemble
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:607)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:558)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:534)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:454)

Solr logs (no errors were thrown, and I could see that Solr is able to connect to the ZooKeeper cluster):

2020-04-28 09:49:17.022 INFO (main) [ ] o.a.s.c.SolrResourceLoader [null] Added 0 libs to classloader, from paths: []
2020-04-28 09:49:17.255 INFO (main) [ ] o.a.s.h.c.HttpShardHandlerFactory Host whitelist initialized: WhitelistHostChecker [whitelistHosts=null, whitelistHostCheckingEnabled=true]
2020-04-28 09:49:17.553 WARN (main) [ ] o.e.j.u.s.S.config No Client EndPointIdentificationAlgorithm configured for SslContextFactory@ac20bb4[provider=null,keyStore=null,trustStore=null]
2020-04-28 09:49:17.742 WARN (main) [ ] o.e.j.u.s.S.config No Client EndPointIdentificationAlgorithm configured for SslContextFactory@63c12e52[provider=null,keyStore=null,trustStore=null]
2020-04-28 09:49:17.763 INFO (main) [ ] o.a.s.c.ZkContainer Zookeeper client=zk-0.zkensemble:2181,zk-1.zkensemble:2181,zk-2.zkensemble:2181
2020-04-28 09:49:17.825 INFO (zkConnectionManagerCallback-9-thread-1) [ ] o.a.s.c.c.ConnectionManager zkClient has connected
2020-04-28 09:49:20.075 INFO (main) [ ] o.a.s.c.OverseerElectionContext I am going to be the leader solr-0.solrcluster:8983_solr
2020-04-28 09:49:20.108 INFO (main) [ ] o.a.s.c.Overseer Overseer (id=145194837495316480-solr-0.solrcluster:8983_solr-n_0000000000) starting
2020-04-28 09:49:20.272 INFO (zkConnectionManagerCallback-16-thread-1) [ ] o.a.s.c.c.ConnectionManager zkClient has connected
2020-04-28 09:49:20.300 INFO (main) [ ] o.a.s.c.s.i.ZkClientClusterStateProvider Cluster at zk-0.zkensemble:2181,zk-1.zkensemble:2181,zk-2.zkensemble:2181 ready
2020-04-28 09:49:20.434 INFO (main) [ ] o.a.s.c.ZkController Register node as live in ZooKeeper:/live_nodes/solr-0.solrcluster:8983_solr
2020-04-28 09:49:20.440 INFO (OverseerStateUpdate-145194837495316480-solr-0.solrcluster:8983_solr-n_0000000000) [ ] o.a.s.c.Overseer Starting to work on the main queue : solr-0.solrcluster:

Scenario 2 - Rebuilt the docker images (solr, zookeeper) using RHEL as the base OS and deployed them on the K8s cluster

  1. Clone the repo
  2. Change the configs as per cluster (e.g. storage class, own namespace, etc.)
  3. ./start-aws-zookeeper-ensemble
  4. ./start-aws-solr-cluster (after about 15 seconds)

ZooKeeper logs:

2020-04-28 10:18:05,913 [myid:] - INFO [main:QuorumPeerConfig@136] - Reading configuration from: /conf/zoo.cfg
2020-04-28 10:18:05,944 [myid:] - WARN [main:QuorumPeer$QuorumServer@191] - Failed to resolve address: zk-20.0.0.0
java.net.UnknownHostException: zk-20.0.0.0: Name or service not known
    at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
    at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
    at java.net.InetAddress.getAllByName(InetAddress.java:1193)
    at java.net.InetAddress.getAllByName(InetAddress.java:1127)
    at java.net.InetAddress.getByName(InetAddress.java:1077)
    at org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer.recreateSocketAddresses(QuorumPeer.java:181)
    at org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer.<init>(QuorumPeer.java:153)
    at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parseProperties(QuorumPeerConfig.java:240)
java.net.UnknownHostException: zk-10.0.0.0: Name or service not known
    at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
    at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
    at java.net.InetAddress.getAllByName(InetAddress.java:1193)
    at java.net.InetAddress.getAllByName(InetAddress.java:1127)
    at java.net.InetAddress.getByName(InetAddress.java:1077)

Container logs:

/bin/sh: hostname: command not found

The above error seems to come from the step that resolves the hostname and replaces it in the server string when the pod is created by the StatefulSet:

       if [ ! -f $ZOO_DATA_DIR/myid ] ; then $(echo $((${HOSTNAME##*-}+1)) > $ZOO_DATA_DIR/myid ) else touch /conf/test; fi && \
       $(echo $ZOO_SERVERS | sed \"s/$(hostname).zkensemble/0.0.0.0/g\" > /conf/zooservers.txt) && \
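A minimal sketch of why a missing `hostname` binary could produce names like `zk-20.0.0.0`: when `$(hostname)` expands to an empty string, the sed pattern collapses to `.zkensemble`, so every server entry is rewritten instead of only the local one. The `ZOO_SERVERS` value below is a hypothetical example modelled on the hostnames in the logs, not the repo's actual setting, and the empty variable only simulates the failed command substitution.

```shell
#!/bin/sh
# Hypothetical ZOO_SERVERS value (assumption), modelled on the hostnames in the logs.
ZOO_SERVERS="server.1=zk-0.zkensemble:2888:3888 server.2=zk-1.zkensemble:2888:3888 server.3=zk-2.zkensemble:2888:3888"

# Simulate `hostname: command not found`: the substitution yields an empty string,
# so the pattern becomes ".zkensemble" and matches every entry.
missing_host=""
broken=$(echo "$ZOO_SERVERS" | sed "s/${missing_host}.zkensemble/0.0.0.0/g")
echo "broken: $broken"

# With the pod name available (here zk-1), only the local entry is rewritten.
pod_host="zk-1"
fixed=$(echo "$ZOO_SERVERS" | sed "s/${pod_host}.zkensemble/0.0.0.0/g")
echo "fixed:  $fixed"
```

In the broken case the result contains `zk-10.0.0.0` and `zk-20.0.0.0`, the same unresolvable names that appear in the ZooKeeper log.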

Solr logs

Caused by: org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper zk-0.zkensemble:2181,zk-1.zkensemble:2181,zk-2.zkensemble:2181 within 30000 ms
    at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:201)
    at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:126)
    at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:121)
    at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:108)
    at org.apache.solr.servlet.SolrDispatchFilter.loadNodeConfig(SolrDispatchFilter.java:273)
    ... 50 more
Caused by: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper zk-0.zkensemble:2181,zk-1.zkensemble:2181,zk-2.zkensemble:2181 within 30000 ms
    at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:250)
    at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:193)
    ... 54 more

Looking forward to your help. Do let me know if you need any other details.

technorodent commented 4 years ago

Did you solve this issue? Did it block implementation?

gs-offcl commented 4 years ago

Yes, it's resolved.

technorodent commented 4 years ago

How was it resolved?

gs-offcl commented 3 years ago

The issue was that the hostname was not getting resolved in the startup script. I used POD_NAME via an env var instead, and it started working.

       if [ ! -f $ZOO_DATA_DIR/myid ] ; then $(echo $((${HOSTNAME##*-}+1)) > $ZOO_DATA_DIR/myid ) else touch /conf/test; fi && \
       $(echo $ZOO_SERVERS | sed \"s/$MY_POD_NAME.zkensemble/0.0.0.0/g\" > /conf/zooservers.txt) && \

....
env:
  - name: MY_POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
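As a rough sketch of the same idea, the startup script could avoid the `hostname` binary entirely by reading the pod name from the environment. The helper names `resolve_pod_name` and `zk_myid` are hypothetical and not part of the repo. The fallback to `$HOSTNAME` and `/etc/hostname` assumes the container runtime provides them, which may not hold for every base image.

```shell
#!/bin/sh
# Hypothetical helper: pick the pod name without calling the `hostname` binary.
resolve_pod_name() {
    if [ -n "${MY_POD_NAME:-}" ]; then
        echo "$MY_POD_NAME"        # injected through the fieldRef env var
    elif [ -n "${HOSTNAME:-}" ]; then
        echo "$HOSTNAME"           # assumption: set by the container runtime
    else
        cat /etc/hostname          # assumption: file exists in the image
    fi
}

# Hypothetical helper: derive the ZooKeeper myid from the StatefulSet ordinal,
# mirroring the $((${HOSTNAME##*-}+1)) expression in the startup script.
zk_myid() {
    name=$1
    echo $(( ${name##*-} + 1 ))
}

# Example value for running outside a pod; inside a pod the env var is already set.
MY_POD_NAME="${MY_POD_NAME:-zk-2}"
pod=$(resolve_pod_name)
echo "pod=$pod myid=$(zk_myid "$pod")"
```

Whichever variable name is chosen, the name declared under `env:` has to match the one referenced in the sed expression.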

freedev commented 3 years ago

@gs-offcl thanks for the info. Just a question, is there a typo in: "s/$MY_POD_NAME.zkensemble/0.0.0.0/g" ? Given your comment I suppose the correct line should be "s/$POD_NAME.zkensemble/0.0.0.0/g" without MY_. Right?