apache / pulsar

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org/
Apache License 2.0

[Bug] Autorecovery pod crashes repeatedly while scaling up zookeeper #20900

Open pgier opened 1 year ago

pgier commented 1 year ago

Version

Running Pulsar 3.0.0 in Kubernetes

Minimal reproduce step

Install a basic Pulsar cluster, then scale up ZooKeeper.

What did you expect to see?

At most a few reconnects in BookKeeper and autorecovery, with the pods staying up.

What did you see instead?

The autorecovery pod crashed and restarted while the ZooKeeper pods were scaling up and BookKeeper was reconnecting.

2023-07-28T16:16:19,281+0000 [main] ERROR org.apache.bookkeeper.client.BookieWatcherImpl - Failed to get bookie list :
org.apache.bookkeeper.client.BKException$ZKException: Error while using ZooKeeper
    at org.apache.bookkeeper.discover.ZKRegistrationClient.lambda$getChildren$4(ZKRegistrationClient.java:352) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
    at org.apache.bookkeeper.zookeeper.ZooKeeperClient$25$1.processResult(ZooKeeperClient.java:1177) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:668) ~[org.apache.zookeeper-zookeeper-3.8.1.jar:3.8.1]
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:553) ~[org.apache.zookeeper-zookeeper-3.8.1.jar:3.8.1]
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /ledgers/available
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:118) ~[org.apache.zookeeper-zookeeper-3.8.1.jar:3.8.1]
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:54) ~[org.apache.zookeeper-zookeeper-3.8.1.jar:3.8.1]
    at org.apache.bookkeeper.discover.ZKRegistrationClient.lambda$getChildren$4(ZKRegistrationClient.java:351) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
    ... 3 more
2023-07-28T16:16:19,294+0000 [main] ERROR org.apache.bookkeeper.replication.AutoRecoveryMain - Failed to build AutoRecovery Server
java.io.IOException: Failed to create bookkeeper client
    at org.apache.bookkeeper.replication.Auditor.createBookKeeperClient(Auditor.java:105) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
    at org.apache.bookkeeper.replication.AutoRecoveryMain.<init>(AutoRecoveryMain.java:94) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
    at org.apache.bookkeeper.server.service.AutoRecoveryService.<init>(AutoRecoveryService.java:40) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
    at org.apache.bookkeeper.replication.AutoRecoveryMain.buildAutoRecoveryServer(AutoRecoveryMain.java:371) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
    at org.apache.bookkeeper.replication.AutoRecoveryMain.doMain(AutoRecoveryMain.java:339) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
    at org.apache.bookkeeper.replication.AutoRecoveryMain.main(AutoRecoveryMain.java:316) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
Caused by: org.apache.bookkeeper.client.BKException$ZKException: Error while using ZooKeeper
    at org.apache.bookkeeper.discover.ZKRegistrationClient.lambda$getChildren$4(ZKRegistrationClient.java:352) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
    at org.apache.bookkeeper.zookeeper.ZooKeeperClient$25$1.processResult(ZooKeeperClient.java:1177) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:668) ~[org.apache.zookeeper-zookeeper-3.8.1.jar:3.8.1]
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:553) ~[org.apache.zookeeper-zookeeper-3.8.1.jar:3.8.1]
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /ledgers/available
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:118) ~[org.apache.zookeeper-zookeeper-3.8.1.jar:3.8.1]
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:54) ~[org.apache.zookeeper-zookeeper-3.8.1.jar:3.8.1]
    at org.apache.bookkeeper.discover.ZKRegistrationClient.lambda$getChildren$4(ZKRegistrationClient.java:351) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
    at org.apache.bookkeeper.zookeeper.ZooKeeperClient$25$1.processResult(ZooKeeperClient.java:1177) ~[org.apache.bookkeeper-bookkeeper-server-4.16.1.jar:4.16.1]
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:668) ~[org.apache.zookeeper-zookeeper-3.8.1.jar:3.8.1]
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:553) ~[org.apache.zookeeper-zookeeper-3.8.1.jar:3.8.1]
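
For context on the expected behavior: in the trace above, the `IOException` ("Failed to create bookkeeper client") reaches `AutoRecoveryMain.main`, so one failed read of `/ledgers/available` appears to end the process. Below is a minimal sketch of the alternative, a bounded retry with exponential backoff around a startup step. It is not BookKeeper's actual code: `StartupRetry`, `retryWithBackoff`, and all parameter values are hypothetical names chosen for illustration, and the failing step is simulated.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class StartupRetry {

    /**
     * Runs a startup step, retrying on IOException with exponential backoff.
     * Hypothetical helper for illustration; not part of BookKeeper or Pulsar.
     * Exceptions other than IOException propagate immediately.
     */
    public static <T> T retryWithBackoff(Callable<T> step, int maxAttempts,
                                         long initialDelayMs, long maxDelayMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return step.call();
            } catch (IOException e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // out of attempts: rethrow the last failure below
                }
                Thread.sleep(delayMs);
                delayMs = Math.min(delayMs * 2, maxDelayMs); // double the wait, up to the cap
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulated startup step: fails twice, as if ZooKeeper were still
        // settling, then succeeds on the third call.
        int[] calls = {0};
        String result = retryWithBackoff(() -> {
            if (++calls[0] < 3) {
                throw new IOException("Failed to create bookkeeper client");
            }
            return "client-ready";
        }, 5, 10, 100);
        System.out.println(result + " after " + calls[0] + " attempts");
        // prints "client-ready after 3 attempts"
    }
}
```

The attempt limit is deliberate: if `/ledgers/available` is genuinely missing, the process should still fail so the problem stays visible instead of being retried forever.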

Anything else?

No response

github-actions[bot] commented 1 year ago

The issue had no activity for 30 days; marking it with the Stale label.