Sometimes the Pulsar ZooKeeper metadata initialization job gets stuck and doesn't recover.
Add a 60-second default timeout to the job.
Example error:
2022-06-13T11:33:42,814+0000 [main-SendThread(pulsar-luna-uswest1-staging-zookeeper-ca.pulsar.svc.cluster.local:2181)] WARN org.apache.zookeeper.ClientCnxn - An exception was thrown while closing send thread for session 0x0.
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777) ~[?:?]
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:344) ~[org.apache.zookeeper-zookeeper-3.8.0.jar:3.8.0]
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1282) [org.apache.zookeeper-zookeeper-3.8.0.jar:3.8.0]
2022-06-13T11:33:42,921+0000 [main] INFO org.apache.zookeeper.ZooKeeper - Session: 0x0 closed
2022-06-13T11:33:42,921+0000 [main-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x0
Exception in thread "main" org.apache.pulsar.metadata.api.MetadataStoreException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.pulsar.metadata.impl.ZKMetadataStore.<init>(ZKMetadataStore.java:108)
at org.apache.pulsar.metadata.impl.MetadataStoreFactoryImpl.newInstance(MetadataStoreFactoryImpl.java:56)
at org.apache.pulsar.metadata.impl.MetadataStoreFactoryImpl.createExtended(MetadataStoreFactoryImpl.java:36)
at org.apache.pulsar.metadata.api.extended.MetadataStoreExtended.create(MetadataStoreExtended.java:40)
at org.apache.pulsar.PulsarClusterMetadataSetup.initMetadataStore(PulsarClusterMetadataSetup.java:380)
at org.apache.pulsar.PulsarClusterMetadataSetup.main(PulsarClusterMetadataSetup.java:238)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
at org.apache.bookkeeper.zookeeper.ZooKeeperWatcherBase.waitForConnection(ZooKeeperWatcherBase.java:159)
at org.apache.pulsar.metadata.impl.PulsarZooKeeperClient$Builder.build(PulsarZooKeeperClient.java:259)
at org.apache.pulsar.metadata.impl.ZKMetadataStore.<init>(ZKMetadataStore.java:100)
... 5 more
In the Helm --debug --wait output, the job never completes:
ready.go:231: [debug] Job is not completed: pulsar/pulsar-luna-uswest1-staging-zookeeper-metadata
ready.go:231: [debug] Job is not completed: pulsar/pulsar-luna-uswest1-staging-zookeeper-metadata
ready.go:231: [debug] Job is not completed: pulsar/pulsar-luna-uswest1-staging-zookeeper-metadata
ready.go:231: [debug] Job is not completed: pulsar/pulsar-luna-uswest1-staging-zookeeper-metadata
ready.go:231: [debug] Job is not completed: pulsar/pulsar-luna-uswest1-staging-zookeeper-metadata
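
The timeout behaviour described at the top can be illustrated with a minimal sketch. This is not the chart's actual implementation. The helper name `run_with_timeout` and the exit code 124 (borrowed from the GNU `timeout` convention) are assumptions made for illustration. The idea is only that a hung initialization command is killed after 60 seconds and reported as a failure, rather than leaving the job incomplete forever.

```python
import subprocess
import sys

# Default taken from this change's description; everything else is illustrative.
DEFAULT_TIMEOUT_SECONDS = 60


def run_with_timeout(cmd, timeout_seconds=DEFAULT_TIMEOUT_SECONDS):
    """Run cmd and return its exit code, or 124 if it exceeds the timeout."""
    try:
        completed = subprocess.run(cmd, timeout=timeout_seconds)
        return completed.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising TimeoutExpired,
        # so nothing is left hanging; 124 mirrors GNU timeout's exit status.
        print(f"command timed out after {timeout_seconds}s", file=sys.stderr)
        return 124
```

A wrapper script could pass the metadata initialization command as `cmd` and exit with the returned code, so that a stuck run surfaces as a failed attempt instead of an endless "Job is not completed" loop.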