apache / pulsar

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org/
Apache License 2.0
13.94k stars 3.53k forks source link

[Bug][broker] BrokerId npe when broker restart #22975

Closed TakaHiR07 closed 6 days ago

TakaHiR07 commented 1 week ago

Search before asking

Read release policy

Version

broker: 3.0.4 client: 2.9.5

Minimal reproduce step

client continue to produce and consume

broker restart

What did you expect to see?

no npe

What did you see instead?

2024-06-25T13:27:33,051+0800 [main] INFO  org.apache.pulsar.broker.service.BrokerService - Started Pulsar Broker service on /0.0.0.0:6651, TLS: (none), listener: (none)
2024-06-25T13:27:33,129+0800 [main] INFO  org.apache.pulsar.broker.service.BrokerService - Started Pulsar Broker service on /0.0.0.0:6751, TLS: OPENSSL, listener: (none)
2024-06-25T13:27:33,153+0800 [main] INFO  org.apache.pulsar.broker.service.BrokerService - Scheduling a thread to check backlog quota after [60] seconds in background
2024-06-25T13:27:33,155+0800 [main] INFO  org.apache.pulsar.broker.service.persistent.DispatchRateLimiter - configured broker message-dispatch rate DispatchRateImpl(dispatchThrottlingRateInMsg=0, dispatchThrottlingRateInByte=0, relativeToPublishRate=false, ratePeriodInSecond=1)
2024-06-25T13:27:33,155+0800 [main] INFO  org.apache.pulsar.broker.service.persistent.DispatchRateLimiter - setting message-dispatch-rate DispatchRateImpl(dispatchThrottlingRateInMsg=0, dispatchThrottlingRateInByte=0, relativeToPublishRate=false, ratePeriodInSecond=1)
2024-06-25T13:27:33,175+0800 [main] INFO  org.eclipse.jetty.util.log - Logging initialized @15239ms to org.eclipse.jetty.util.log.Slf4jLog
2024-06-25T13:27:33,175+0800 [pulsar-io-23-3] INFO  org.apache.pulsar.broker.service.ServerCnx - [/xxx:51004] connected with role=admin using authMethod=token, clientVersion=2.9.5, clientProtocolVersion=19, proxyVersion=null
2024-06-25T13:27:33,175+0800 [pulsar-io-23-6] INFO  org.apache.pulsar.broker.service.ServerCnx - [/xxx:51010] connected with role=admin using authMethod=token, clientVersion=2.9.5, clientProtocolVersion=19, proxyVersion=null
2024-06-25T13:27:33,175+0800 [pulsar-io-23-5] INFO  org.apache.pulsar.broker.service.ServerCnx - [/xxx:51008] connected with role=admin using authMethod=token, clientVersion=2.9.4, clientProtocolVersion=19, proxyVersion=null
2024-06-25T13:27:33,175+0800 [pulsar-io-23-1] INFO  org.apache.pulsar.broker.service.ServerCnx - [/xxx:51000] connected with role=admin using authMethod=token, clientVersion=2.9.5, clientProtocolVersion=19, proxyVersion=null
2024-06-25T13:27:33,175+0800 [pulsar-io-23-4] INFO  org.apache.pulsar.broker.service.ServerCnx - [/xxx:51006] connected with role=admin using authMethod=token, clientVersion=2.9.4, clientProtocolVersion=19, proxyVersion=null
2024-06-25T13:27:33,175+0800 [pulsar-io-23-2] INFO  org.apache.pulsar.broker.service.ServerCnx - [/xxx:51002] connected with role=admin using authMethod=token, clientVersion=2.9.5, clientProtocolVersion=19, proxyVersion=null
2024-06-25T13:27:33,191+0800 [pulsar-io-23-7] INFO  org.apache.pulsar.broker.service.ServerCnx - [/xxx:51012] connected with role=admin using authMethod=token, clientVersion=2.9.4, clientProtocolVersion=19, proxyVersion=null
2024-06-25T13:27:33,202+0800 [pulsar-io-23-4] INFO  org.apache.pulsar.broker.service.ServerCnx - [[id: 0xf8f4511f, L:/xxx:6651 - R:/xxx:51006]] Subscribing on topic xxx-partition-115 / debug_gateway. consumerId: 981
2024-06-25T13:27:33,202+0800 [pulsar-io-23-7] INFO  org.apache.pulsar.broker.service.ServerCnx - [[id: 0xb3980641, L:/xxx:6651 - R:/xxx:51012]] Subscribing on topic xxx-partition-115 / debug_gateway. consumerId: 1148
2024-06-25T13:27:33,202+0800 [pulsar-io-23-5] INFO  org.apache.pulsar.broker.service.ServerCnx - [[id: 0xd26bffc1, L:/xxx:6651 - R:/xxx:51008]] Subscribing on topic xxx-partition-129 / debug_gateway. consumerId: 1113
2024-06-25T13:27:33,242+0800 [pulsar-io-23-9] INFO  org.apache.pulsar.broker.service.ServerCnx - [/xxx:51014] connected with role=admin using authMethod=token, clientVersion=2.9.4, clientProtocolVersion=19, proxyVersion=null
2024-06-25T13:27:33,243+0800 [pulsar-io-23-9] INFO  org.apache.pulsar.broker.service.ServerCnx - [[id: 0x2e7cd8b2, L:/xxx:6651 - R:/xxx:51014]] Subscribing on topic xxx-partition-108 / debug_gateway. consumerId: 897
2024-06-25T13:27:33,378+0800 [metadata-store-29-1] ERROR org.apache.pulsar.broker.service.ServerCnx - [/xxx:51000] Failed to create topic xxx-partition-56, producerId=56

java.util.concurrent.CompletionException: java.lang.NullPointerException: brokerId is not initialized before start has been called
        at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:368) ~[?:?]
        at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:377) ~[?:?]
        at java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1097) ~[?:?]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) ~[?:?]
        at org.apache.pulsar.broker.service.BrokerService.lambda$loadOrCreatePersistentTopic$59(BrokerService.java:1607) ~[org.apache.pulsar-pulsar-broker-3.0.4.1.jar:3.0.4.1]
        at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:990) ~[?:?]
        at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:974) ~[?:?]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
        at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147) ~[?:?]
        at org.apache.pulsar.common.naming.NamespaceBundleFactory.lambda$doLoadBundles$2(NamespaceBundleFactory.java:107) ~[org.apache.pulsar-pulsar-broker-3.0.4.1.jar:3.0.4.1]
        at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:718) ~[?:?]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
        at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147) ~[?:?]
        at org.apache.pulsar.metadata.impl.ZKMetadataStore.handleGetResult(ZKMetadataStore.java:275) ~[org.apache.pulsar-pulsar-metadata-3.0.4.1.jar:3.0.4.1]
        at org.apache.pulsar.metadata.impl.ZKMetadataStore.lambda$batchOperation$5(ZKMetadataStore.java:219) ~[org.apache.pulsar-pulsar-metadata-3.0.4.1.jar:3.0.4.1]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.105.Final.jar:4.1.105.Final]
        at java.lang.Thread.run(Thread.java:833) ~[?:?]
Caused by: java.lang.NullPointerException: brokerId is not initialized before start has been called
        at java.util.Objects.requireNonNull(Objects.java:233) ~[?:?]
        at org.apache.pulsar.broker.PulsarService.getBrokerId(PulsarService.java:1804) ~[org.apache.pulsar-pulsar-broker-3.0.4.1.jar:3.0.4.1]
        at org.apache.pulsar.broker.service.BrokerService.lambda$checkTopicNsOwnership$100(BrokerService.java:2231) ~[org.apache.pulsar-pulsar-broker-3.0.4.1.jar:3.0.4.1]
        at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1150) ~[?:?]
        ... 15 more

Anything else?

No response

Are you willing to submit a PR?

lhotari commented 1 week ago

This is a regression caused by #21894 which was released in versions 3.2.0, 3.0.3 and 3.1.3. However, the root cause existed already before that change. The fix in PR #22977 will address the root cause. The root cause is that the broker starts serving requests before it is ready.