Search before asking
[X] I searched in the issues and found nothing similar.
Version
Running Pulsar 3.0.0 in Kubernetes
Minimal reproduce step
Install a basic Pulsar cluster, then scale up ZooKeeper.
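For illustration, the step above might look like the following with the Apache Pulsar Helm chart. The report does not say how the cluster was installed or how ZooKeeper was scaled, so the chart, the release name, the namespace, the StatefulSet name, and the replica count are all assumptions. By default the script only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/usr/bin/env bash
# Hypothetical reproduction sketch. Every name below is an assumption,
# not something stated in the report.
set -euo pipefail

RELEASE="${RELEASE:-pulsar}"          # assumed Helm release name
NAMESPACE="${NAMESPACE:-pulsar}"      # assumed Kubernetes namespace
ZK_STS="${ZK_STS:-pulsar-zookeeper}"  # assumed ZooKeeper StatefulSet name
ZK_REPLICAS="${ZK_REPLICAS:-5}"       # assumed target replica count
DRY_RUN="${DRY_RUN:-1}"               # 1 = print commands, 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

repro() {
  # Install a basic cluster with default chart values.
  run helm repo add apache https://pulsar.apache.org/charts
  run helm install "$RELEASE" apache/pulsar --namespace "$NAMESPACE" --create-namespace
  # Scale up ZooKeeper once the cluster is running.
  run kubectl scale statefulset "$ZK_STS" --namespace "$NAMESPACE" --replicas="$ZK_REPLICAS"
  # Watch for the autorecovery pod restarting during the scale-up.
  run kubectl get pods --namespace "$NAMESPACE" --watch
}

repro
```

If ZooKeeper was scaled another way (for example by changing the chart values and running `helm upgrade`), replace the `kubectl scale` line accordingly.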
What did you expect to see?
Possibly some reconnects in BookKeeper and AutoRecovery, but no crash.
What did you see instead?
The autorecovery pod crashed and restarted while the ZooKeeper pods were scaling and BookKeeper was reconnecting.
2023-07-28T16:16:19,281+0000 [main] ERROR org.apache.bookkeeper.client.BookieWatcherImpl - Failed to get bookie list :
org.apache.bookkeeper.client.BKException$ZKException: Error while using ZooKeeper
    at org.apache.bookkeeper.discover.ZKRegistrationClient.lambda$getChildren$4(ZKRegistrationClient.java:352) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
    at org.apache.bookkeeper.zookeeper.ZooKeeperClient$25$1.processResult(ZooKeeperClient.java:1177) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:668) ~[org.apache.zookeeper-zookeeper-3.8.1.jar:3.8.1]
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:553) ~[org.apache.zookeeper-zookeeper-3.8.1.jar:3.8.1]
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /ledgers/available
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:118) ~[org.apache.zookeeper-zookeeper-3.8.1.jar:3.8.1]
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:54) ~[org.apache.zookeeper-zookeeper-3.8.1.jar:3.8.1]
    at org.apache.bookkeeper.discover.ZKRegistrationClient.lambda$getChildren$4(ZKRegistrationClient.java:351) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
    ... 3 more
2023-07-28T16:16:19,294+0000 [main] ERROR org.apache.bookkeeper.replication.AutoRecoveryMain - Failed to build AutoRecovery Server
java.io.IOException: Failed to create bookkeeper client
    at org.apache.bookkeeper.replication.Auditor.createBookKeeperClient(Auditor.java:105) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
    at org.apache.bookkeeper.replication.AutoRecoveryMain.<init>(AutoRecoveryMain.java:94) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
    at org.apache.bookkeeper.server.service.AutoRecoveryService.<init>(AutoRecoveryService.java:40) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
    at org.apache.bookkeeper.replication.AutoRecoveryMain.buildAutoRecoveryServer(AutoRecoveryMain.java:371) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
    at org.apache.bookkeeper.replication.AutoRecoveryMain.doMain(AutoRecoveryMain.java:339) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
    at org.apache.bookkeeper.replication.AutoRecoveryMain.main(AutoRecoveryMain.java:316) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
Caused by: org.apache.bookkeeper.client.BKException$ZKException: Error while using ZooKeeper
    at org.apache.bookkeeper.discover.ZKRegistrationClient.lambda$getChildren$4(ZKRegistrationClient.java:352) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
    at org.apache.bookkeeper.zookeeper.ZooKeeperClient$25$1.processResult(ZooKeeperClient.java:1177) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:668) ~[org.apache.zookeeper-zookeeper-3.8.1.jar:3.8.1]
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:553) ~[org.apache.zookeeper-zookeeper-3.8.1.jar:3.8.1]
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /ledgers/available
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:118) ~[org.apache.zookeeper-zookeeper-3.8.1.jar:3.8.1]
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:54) ~[org.apache.zookeeper-zookeeper-3.8.1.jar:3.8.1]
    at org.apache.bookkeeper.discover.ZKRegistrationClient.lambda$getChildren$4(ZKRegistrationClient.java:351) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
    at org.apache.bookkeeper.zookeeper.ZooKeeperClient$25$1.processResult(ZooKeeperClient.java:1177) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:668) ~[org.apache.zookeeper-zookeeper-3.8.1.jar:3.8.1]
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:553) ~[org.apache.zookeeper-zookeeper-3.8.1.jar:3.8.1]
Anything else?
No response
Are you willing to submit a PR?