datastax / pulsar-helm-chart

Apache Pulsar Helm chart
Apache License 2.0

Add timeout for running "bin/pulsar initialize-cluster-metadata" command #234

Status: Closed (lhotari closed this issue 2 years ago)

lhotari commented 2 years ago

Example error from the "bin/pulsar initialize-cluster-metadata" job when ZooKeeper cannot be reached:

2022-06-13T11:33:42,814+0000 [main-SendThread(pulsar-luna-uswest1-staging-zookeeper-ca.pulsar.svc.cluster.local:2181)] WARN  org.apache.zookeeper.ClientCnxn - An exception was thrown while closing send thread for session 0x0.
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777) ~[?:?]
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:344) ~[org.apache.zookeeper-zookeeper-3.8.0.jar:3.8.0]
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1282) [org.apache.zookeeper-zookeeper-3.8.0.jar:3.8.0]
2022-06-13T11:33:42,921+0000 [main] INFO  org.apache.zookeeper.ZooKeeper - Session: 0x0 closed
2022-06-13T11:33:42,921+0000 [main-EventThread] INFO  org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x0
Exception in thread "main" org.apache.pulsar.metadata.api.MetadataStoreException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.pulsar.metadata.impl.ZKMetadataStore.<init>(ZKMetadataStore.java:108)
    at org.apache.pulsar.metadata.impl.MetadataStoreFactoryImpl.newInstance(MetadataStoreFactoryImpl.java:56)
    at org.apache.pulsar.metadata.impl.MetadataStoreFactoryImpl.createExtended(MetadataStoreFactoryImpl.java:36)
    at org.apache.pulsar.metadata.api.extended.MetadataStoreExtended.create(MetadataStoreExtended.java:40)
    at org.apache.pulsar.PulsarClusterMetadataSetup.initMetadataStore(PulsarClusterMetadataSetup.java:380)
    at org.apache.pulsar.PulsarClusterMetadataSetup.main(PulsarClusterMetadataSetup.java:238)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
    at org.apache.bookkeeper.zookeeper.ZooKeeperWatcherBase.waitForConnection(ZooKeeperWatcherBase.java:159)
    at org.apache.pulsar.metadata.impl.PulsarZooKeeperClient$Builder.build(PulsarZooKeeperClient.java:259)
    at org.apache.pulsar.metadata.impl.ZKMetadataStore.<init>(ZKMetadataStore.java:100)
    ... 5 more

Meanwhile, the Helm output (run with --debug --wait) keeps reporting that the job has not completed:

ready.go:231: [debug] Job is not completed: pulsar/pulsar-luna-uswest1-staging-zookeeper-metadata
ready.go:231: [debug] Job is not completed: pulsar/pulsar-luna-uswest1-staging-zookeeper-metadata
ready.go:231: [debug] Job is not completed: pulsar/pulsar-luna-uswest1-staging-zookeeper-metadata
ready.go:231: [debug] Job is not completed: pulsar/pulsar-luna-uswest1-staging-zookeeper-metadata
ready.go:231: [debug] Job is not completed: pulsar/pulsar-luna-uswest1-staging-zookeeper-metadata

lhotari commented 2 years ago

The PR https://github.com/apache/pulsar/pull/16039 was created upstream to address the problem of the job getting stuck.
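
For illustration only, here is a minimal sketch of what "adding a timeout" around the command could look like from the caller's side. It is not the change made in the upstream PR or in this chart. The helper name run_with_timeout and the exit code 124 (borrowed from the GNU timeout convention) are assumptions chosen for this example.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd and return its exit code, or 124 if it runs too long.

    Hypothetical helper for illustration: the name and the 124 exit
    code are assumptions, not part of Pulsar or the Helm chart.
    """
    try:
        completed = subprocess.run(cmd, timeout=timeout_seconds)
        return completed.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising
        # TimeoutExpired, so nothing is left hanging here.
        print(f"command timed out after {timeout_seconds}s", file=sys.stderr)
        return 124


# Usage sketch (arguments to the real command are omitted on purpose):
#   code = run_with_timeout(["bin/pulsar", "initialize-cluster-metadata", ...], 300)
#   sys.exit(code)
```

With a wrapper like this, a stuck invocation ends with a non-zero exit code instead of blocking indefinitely, so the job can fail and be retried rather than leaving Helm waiting on "Job is not completed".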