apache / pulsar

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org/
Apache License 2.0
14.25k stars 3.58k forks source link

[Bug] Delayed Message cursor lost after topic unload #22716

Open SennoYuki opened 6 months ago

SennoYuki commented 6 months ago

Search before asking

Read release policy

Version

OS: linux 3.10.0 Java: jdk17 pulsar: 3.2.0

Minimal reproduce step

  1. produce delayed message to a topic
  2. restart topic owner broker

What did you expect to see?

the message consume keeps normal

What did you see instead?

The delayed message delivery immediately

Anything else?

No response

Are you willing to submit a PR?

SennoYuki commented 6 months ago

And I had already used config delayedDeliveryTrackerFactoryClassName=org.apache.pulsar.broker.delayed.BucketDelayedDeliveryTrackerFactory

dao-jun commented 6 months ago

I need more context:

  1. What's your subscription type? Exclusive,Failover,Shared or Key_Shared?
  2. Did you enable delayedDeliveryEnabled ?
SennoYuki commented 6 months ago

I need more context:

  1. What's your subscription type? Exclusive,Failover,Shared or Key_Shared?
  2. Did you enable delayedDeliveryEnabled ?
  1. Shared
  2. enable
SennoYuki commented 6 months ago

And I discover my subscription cursor may not be durable. But I confirm I used durable subscription from consumer config and pulsar-admin topic stats 20240517-110607

SennoYuki commented 6 months ago

I need more context:

  1. What's your subscription type? Exclusive,Failover,Shared or Key_Shared?
  2. Did you enable delayedDeliveryEnabled ?

If possible, maybe we can check this problem with IM software like wechat for efficiency.

dao-jun commented 6 months ago

I don't reproduce the issue, and I'll keep trying

For determine if a subscription is durable, try run ./pulsar-admin topics stats persistent://public/default/my-topic1, and see the isDurable field is true or false

SennoYuki commented 6 months ago

I don't reproduce the issue, and I'll keep trying

For determine if a subscription is durable, try run ./pulsar-admin topics stats persistent://public/default/my-topic1, and see the isDurable field is true or false It's true "isDurable" : true, "isReplicated" : false, "allowOutOfOrderDelivery" : false,

dao-jun commented 6 months ago

If isDurable is true, it means your cursor is durable

SennoYuki commented 6 months ago

If isDurable is true, it means your cursor is durable

So I can't fix the situation. I can't restart the broker to tuning config because of duplicated consume happening after restarting

dao-jun commented 6 months ago

@coderzc I cannot reproduce the issue on the master branch, could you please take a look?

SennoYuki commented 5 months ago

@dao-jun @coderzc Any update? Today I reproduce the problem with topic unload operation. Can you check the unload code?

dao-jun commented 5 months ago

@SennoYuki Can you please provide broker log and heap dump?

SennoYuki commented 5 months ago

@dao-jun unload log

5月 21 02:13:04 sg-prod-server-pulsar-3 pulsar[14996]: 2024-05-21T02:13:04,493+0000 [pulsar-web-42-3] INFO  org.apache.pulsar.broker.admin.impl.PersistentTopicsBase - [null] Unloading topic persistent://public/default/push_al_queue  
5月 21 02:13:04 sg-prod-server-pulsar-3 pulsar[14996]: 2024-05-21T02:13:04,511+0000 [metadata-store-9-1] INFO  org.eclipse.jetty.server.RequestLog - 127.0.0.1 - - [21/May/2024:02:13:04 +0000] "PUT /admin/v2/persistent/public/default/push_al_queue/unload HTTP/1.1" 307 0 "-" "Pulsar-Java-v3.2.3" 21
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,538+0000 [pulsar-web-42-3] INFO  org.apache.pulsar.broker.admin.impl.PersistentTopicsBase - [null] Unloading topic persistent://public/default/push_al_queue
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,540+0000 [metadata-store-9-1] INFO  org.apache.pulsar.broker.service.Producer - Disconnecting producer: Producer{topic=PersistentTopic{topic=persistent://public/default/push_al_queue}, client=/172.17.32.36:56858, producerName=server-pulsar-10-5, producerId=1}, assignedBrokerLookupData: Optional.empty
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,540+0000 [metadata-store-9-1] INFO  org.apache.pulsar.broker.service.Producer - Disconnecting producer: Producer{topic=PersistentTopic{topic=persistent://public/default/push_al_queue}, client=/172.17.32.22:59572, producerName=server-pulsar-10-4, producerId=1}, assignedBrokerLookupData: Optional.empty
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,541+0000 [metadata-store-9-1] INFO  org.apache.pulsar.broker.service.Consumer - Disconnecting consumer: Consumer{subscription=PersistentSubscription{topic=persistent://public/default/push_al_queue, name=kafka_to_bk}, consumerId=1, consumerName=eyqqe, address=/172.17.32.22:53160}
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,541+0000 [metadata-store-9-1] INFO  org.apache.pulsar.broker.service.persistent.PersistentDispatcherMultipleConsumers - Removed consumer Consumer{subscription=PersistentSubscription{topic=persistent://public/default/push_al_queue, name=kafka_to_bk}, consumerId=1, consumerName=eyqqe, address=/172.17.32.22:53160} with pending 0 acks
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,541+0000 [metadata-store-9-1] INFO  org.apache.pulsar.broker.service.Consumer - Disconnecting consumer: Consumer{subscription=PersistentSubscription{topic=persistent://public/default/push_al_queue, name=kafka_to_bk}, consumerId=1, consumerName=oypam, address=/172.17.32.36:56860}
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,541+0000 [metadata-store-9-1] INFO  org.apache.pulsar.broker.service.persistent.PersistentDispatcherMultipleConsumers - Removed consumer Consumer{subscription=PersistentSubscription{topic=persistent://public/default/push_al_queue, name=kafka_to_bk}, consumerId=1, consumerName=oypam, address=/172.17.32.36:56860} with pending 0 acks
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,542+0000 [metadata-store-9-1] INFO  org.apache.pulsar.broker.service.persistent.PersistentDispatcherMultipleConsumers - [persistent://public/default/push_al_queue / kafka_to_bk] All consumers removed. Subscription is disconnected
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,542+0000 [metadata-store-9-1] INFO  org.apache.pulsar.broker.service.persistent.PersistentSubscription - [persistent://public/default/push_al_queue][kafka_to_bk] Successfully closed subscription [ManagedCursorImpl{ledger=public/default/persistent/push_al_queue, name=kafka_to_bk, ackPos=2384:81800, readPos=3188:98332}]
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,542+0000 [metadata-store-9-1] INFO  org.apache.pulsar.broker.service.persistent.PersistentSubscription - [persistent://public/default/push_al_queue][kafka_to_bk] Successfully closed the subscription
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,542+0000 [metadata-store-9-1] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/default/persistent/push_al_queue] Closing managed ledger
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,601+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/push_al_queue][kafka_to_bk] Updated md-position=2384:81800 into cursor-ledger 3149
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,601+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/push_al_queue] [kafka_to_bk] Closing metadata ledger 3149
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,604+0000 [main-EventThread] INFO  org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/push_al_queue][kafka_to_bk] Closed cursor-ledger 3149
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,604+0000 [main-EventThread] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://public/default/push_al_queue] Topic closed
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,604+0000 [main-EventThread] INFO  org.apache.pulsar.broker.admin.impl.PersistentTopicsBase - [null] Successfully unloaded topic persistent://public/default/push_al_queue
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,604+0000 [main-EventThread] INFO  org.eclipse.jetty.server.RequestLog - 172.17.4.143 - - [21/May/2024:02:13:04 +0000] "PUT /admin/v2/persistent/public/default/push_al_queue/unload?authoritative=true HTTP/1.1" 204 0 "-" "Pulsar-Java-v3.2.3" 69
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,648+0000 [pulsar-io-3-6] INFO  org.apache.pulsar.broker.service.ServerCnx - [[id: 0x213c9851, L:/172.17.4.22:6650 - R:/172.17.32.22:53160]] Subscribing on topic persistent://public/default/push_al_queue / kafka_to_bk. consumerId: 1
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,655+0000 [pulsar-2-3] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - Opening managed ledger public/default/persistent/push_al_queue
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,659+0000 [pulsar-io-3-7] INFO  org.apache.pulsar.broker.service.ServerCnx - [[id: 0xc1449f7e, L:/172.17.4.22:6650 - R:/172.17.32.36:56860]] Subscribing on topic persistent://public/default/push_al_queue / kafka_to_bk. consumerId: 1
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,660+0000 [BookKeeperClientWorker-OrderedExecutor-3-0] WARN  org.apache.bookkeeper.client.BookieWatcherImpl - New ensemble: [172.17.4.143:3181, 172.17.0.48:3181] is not adhering to Placement Policy. quarantinedBookies: []
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,666+0000 [main-EventThread] INFO  org.apache.bookkeeper.client.LedgerCreateOp - Ensemble: [172.17.4.143:3181, 172.17.0.48:3181] for ledger: 3192
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,666+0000 [BookKeeperClientWorker-OrderedExecutor-3-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/default/persistent/push_al_queue] Created ledger 3192 after closed null
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,674+0000 [bookkeeper-ml-scheduler-OrderedScheduler-3-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/default/persistent/push_al_queue] Loading cursor kafka_to_bk
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,674+0000 [bookkeeper-ml-scheduler-OrderedScheduler-3-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/push_al_queue] Recovering from bookkeeper ledger cursor: kafka_to_bk
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,679+0000 [bookkeeper-ml-scheduler-OrderedScheduler-3-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/push_al_queue] Cursor kafka_to_bk meta-data recover from ledger 3149
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,680+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/push_al_queue] Opened ledger 3149 for cursor kafka_to_bk. rc=0
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,699+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/push_al_queue] Cursor kafka_to_bk recovered to position 2384:81800
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,699+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [public/default/persistent/push_al_queue] Recovery for cursor kafka_to_bk completed. pos=2384:81800 -- todo=0
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,699+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerFactoryImpl - [public/default/persistent/push_al_queue] Successfully initialize managed ledger
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,700+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://public/default/push_al_queue] Disabled replicated subscriptions controller
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,700+0000 [broker-topic-workers-OrderedExecutor-1-0] INFO  org.apache.pulsar.broker.service.BrokerService - Created topic persistent://public/default/push_al_queue - dedup is disabled
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,701+0000 [broker-topic-workers-OrderedExecutor-1-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/push_al_queue-kafka_to_bk] Rewind from 2384:81801 to 2384:81801
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,701+0000 [broker-topic-workers-OrderedExecutor-1-0] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://public/default/push_al_queue] Disabled replicated subscriptions controller
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,701+0000 [broker-topic-workers-OrderedExecutor-1-0] INFO  org.apache.pulsar.broker.service.persistent.PersistentSubscription - backlog for persistent://public/default/push_al_queue - 41793949
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,701+0000 [broker-topic-workers-OrderedExecutor-1-0] INFO  org.apache.pulsar.broker.service.ServerCnx - [/172.17.32.36:56860] Created subscription on topic persistent://public/default/push_al_queue / kafka_to_bk
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,701+0000 [broker-topic-workers-OrderedExecutor-1-0] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://public/default/push_al_queue] Disabled replicated subscriptions controller
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,701+0000 [broker-topic-workers-OrderedExecutor-1-0] INFO  org.apache.pulsar.broker.service.persistent.PersistentSubscription - backlog for persistent://public/default/push_al_queue - 41793949
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,701+0000 [broker-topic-workers-OrderedExecutor-1-0] INFO  org.apache.pulsar.broker.service.ServerCnx - [/172.17.32.22:53160] Created subscription on topic persistent://public/default/push_al_queue / kafka_to_bk
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,705+0000 [broker-topic-workers-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.service.ServerCnx - [/172.17.32.22:59572] Created new producer: Producer{topic=PersistentTopic{topic=persistent://public/default/push_al_queue}, client=/172.17.32.22:59572, producerName=server-pulsar-10-4, producerId=1}
5月 21 02:13:04 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:04,705+0000 [broker-topic-workers-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.service.ServerCnx - [/172.17.32.36:56858] Created new producer: Producer{topic=PersistentTopic{topic=persistent://public/default/push_al_queue}, client=/172.17.32.36:56858, producerName=server-pulsar-10-5, producerId=1}
5月 21 02:13:05 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:05,231+0000 [broker-topic-workers-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Recover delayed message index bucket snapshot finish, buckets: 170, numberDelayedMessages: 7520012
5月 21 02:13:06 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:06,656+0000 [pulsar-io-3-7] WARN  org.apache.bookkeeper.client.BookieWatcherImpl - New ensemble: [172.17.0.48:3181, 172.17.4.143:3181] is not adhering to Placement Policy. quarantinedBookies: []
5月 21 02:13:06 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:06,662+0000 [main-EventThread] INFO  org.apache.bookkeeper.client.LedgerCreateOp - Ensemble: [172.17.0.48:3181, 172.17.4.143:3181] for ledger: 3193
5月 21 02:13:06 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:06,694+0000 [bookkeeper-ml-scheduler-OrderedScheduler-3-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/push_al_queue] Updated cursor kafka_to_bk with ledger id 3193 md-position=2384:81800 rd-position=2384:81802
5月 21 02:13:06 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:06,699+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/push_al_queue][kafka_to_bk] Successfully closed & deleted ledger 3149 in cursor
5月 21 02:13:07 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:07,476+0000 [pulsar-io-3-6] INFO  org.apache.pulsar.broker.service.ServerCnx - [/172.17.32.135:55076] connected with clientVersion=Pulsar Go v0.12.1, clientProtocolVersion=18, proxyVersion=null
5月 21 02:13:17 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:17,946+0000 [pulsar-io-3-1] INFO  org.apache.pulsar.broker.service.ServerCnx - Closed connection from /172.17.32.75:51856
5月 21 02:13:23 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:23,873+0000 [prometheus-stats-43-4] INFO  org.eclipse.jetty.server.RequestLog - 172.17.225.89 - - [21/May/2024:02:13:23 +0000] "GET /metrics/ HTTP/1.1" 200 7807 "-" "Prometheus/2.41.0" 8
5月 21 02:13:34 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:34,126+0000 [pulsar-io-3-2] INFO  org.apache.pulsar.broker.service.ServerCnx - Closed connection from /172.17.33.84:39704
5月 21 02:13:39 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:39,074+0000 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_3053_3053, segmentEntryId: 58
5月 21 02:13:39 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:39,902+0000 [pulsar-io-3-5] INFO  org.apache.pulsar.broker.service.ServerCnx - Closed connection from /172.17.32.116:44098
5月 21 02:13:44 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:44,236+0000 [pulsar-io-3-7] INFO  org.apache.pulsar.broker.service.ServerCnx - Closed connection from /172.17.32.89:44170
5月 21 02:13:49 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:49,279+0000 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_3044_3044, segmentEntryId: 58
5月 21 02:13:53 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:53,873+0000 [prometheus-stats-43-3] INFO  org.eclipse.jetty.server.RequestLog - 172.17.225.89 - - [21/May/2024:02:13:53 +0000] "GET /metrics/ HTTP/1.1" 200 7942 "-" "Prometheus/2.41.0" 9
5月 21 02:13:58 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:58,444+0000 [pulsar-load-manager-1-1] INFO  org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl - Writing local data to metadata store because maximum change 3796.7018423721024% exceeded threshold 10%; time since last report written is 115.0 seconds
5月 21 02:13:58 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:13:58,452+0000 [metadata-store-9-1] INFO  org.apache.pulsar.metadata.coordination.impl.ResourceLockImpl - Acquired resource lock on /loadbalance/brokers/sg-prod-server-pulsar-1.stos.org:8080
5月 21 02:14:01 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:01,921+0000 [prometheus-stats-43-4] INFO  org.eclipse.jetty.server.RequestLog - 127.0.0.1 - - [21/May/2024:02:14:01 +0000] "GET /metrics/ HTTP/1.1" 200 102780 "-" "curl/7.29.0" 4
5月 21 02:14:01 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:01,992+0000 [prometheus-stats-43-1] INFO  org.eclipse.jetty.server.RequestLog - 127.0.0.1 - - [21/May/2024:02:14:01 +0000] "GET /metrics/ HTTP/1.1" 200 102779 "-" "curl/7.29.0" 4
5月 21 02:14:03 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:03,444+0000 [pulsar-load-manager-1-1] INFO  org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl - Writing local data to metadata store because maximum change 25.37202090024948% exceeded threshold 10%; time since last report written is 5.0 seconds
5月 21 02:14:03 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:03,452+0000 [metadata-store-9-1] INFO  org.apache.pulsar.metadata.coordination.impl.ResourceLockImpl - Acquired resource lock on /loadbalance/brokers/sg-prod-server-pulsar-1.stos.org:8080
5月 21 02:14:08 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:08,496+0000 [pulsar-io-3-6] INFO  org.apache.pulsar.broker.service.ServerCnx - Closed connection from /172.17.32.135:55076
5月 21 02:14:08 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:08,511+0000 [BookKeeperClientWorker-OrderedExecutor-1-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_3062_3062, segmentEntryId: 32
5月 21 02:14:17 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:17,201+0000 [pulsar-io-3-1] INFO  org.apache.pulsar.broker.service.ServerCnx - [/172.17.32.75:57072] connected with clientVersion=Pulsar Go v0.12.1, clientProtocolVersion=18, proxyVersion=null
5月 21 02:14:23 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:23,873+0000 [prometheus-stats-43-2] INFO  org.eclipse.jetty.server.RequestLog - 172.17.225.89 - - [21/May/2024:02:14:23 +0000] "GET /metrics/ HTTP/1.1" 200 8165 "-" "Prometheus/2.41.0" 8
5月 21 02:14:37 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:37,064+0000 [pulsar-io-3-4] INFO  org.apache.pulsar.broker.service.ServerCnx - Closed connection from /172.17.32.112:54732
5月 21 02:14:45 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:45,345+0000 [BookKeeperClientWorker-OrderedExecutor-3-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_3065_3065, segmentEntryId: 34
5月 21 02:14:49 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:49,336+0000 [pulsar-io-3-4] INFO  org.apache.pulsar.broker.service.ServerCnx - [/172.17.32.122:60590] connected with clientVersion=Pulsar Go v0.12.1, clientProtocolVersion=18, proxyVersion=null
5月 21 02:14:53 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:53,871+0000 [prometheus-stats-43-1] INFO  org.eclipse.jetty.server.RequestLog - 172.17.225.89 - - [21/May/2024:02:14:53 +0000] "GET /metrics/ HTTP/1.1" 200 8160 "-" "Prometheus/2.41.0" 6
5月 21 02:14:54 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:54,044+0000 [BookKeeperClientWorker-OrderedExecutor-1-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_3078_3078, segmentEntryId: 45
5月 21 02:14:55 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:55,541+0000 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_3084_3084, segmentEntryId: 49
5月 21 02:14:58 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:58,444+0000 [pulsar-load-manager-1-1] INFO  org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl - Writing local data to metadata store because maximum change 11.665924417968807% exceeded threshold 10%; time since last report written is 55.0 seconds
5月 21 02:14:58 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:58,452+0000 [metadata-store-9-1] INFO  org.apache.pulsar.metadata.coordination.impl.ResourceLockImpl - Acquired resource lock on /loadbalance/brokers/sg-prod-server-pulsar-1.stos.org:8080
5月 21 02:14:59 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:59,049+0000 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_2855_2855, segmentEntryId: 137
5月 21 02:14:59 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:59,207+0000 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_2891_2891, segmentEntryId: 142
5月 21 02:14:59 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:59,222+0000 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_2853_2853, segmentEntryId: 128
5月 21 02:14:59 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:59,257+0000 [BookKeeperClientWorker-OrderedExecutor-1-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_2858_2858, segmentEntryId: 148
5月 21 02:14:59 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:14:59,273+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_2888_2888, segmentEntryId: 141
5月 21 02:15:00 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:00,680+0000 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_3165_3165, segmentEntryId: 10
5月 21 02:15:00 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:00,830+0000 [BookKeeperClientWorker-OrderedExecutor-3-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_3162_3162, segmentEntryId: 11
5月 21 02:15:01 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:01,047+0000 [pulsar-io-3-2] INFO  org.apache.pulsar.broker.service.ServerCnx - Closed connection from /172.17.32.62:35650
5月 21 02:15:01 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:01,052+0000 [prometheus-stats-43-2] INFO  org.eclipse.jetty.server.RequestLog - 127.0.0.1 - - [21/May/2024:02:15:01 +0000] "GET /metrics/ HTTP/1.1" 200 103142 "-" "curl/7.29.0" 11
5月 21 02:15:01 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:01,091+0000 [prometheus-stats-43-3] INFO  org.eclipse.jetty.server.RequestLog - 127.0.0.1 - - [21/May/2024:02:15:01 +0000] "GET /metrics/ HTTP/1.1" 200 102687 "-" "curl/7.29.0" 4
5月 21 02:15:06 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:06,134+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_2888_2888, segmentEntryId: 142
5月 21 02:15:06 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:06,283+0000 [BookKeeperClientWorker-OrderedExecutor-1-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_2895_2895, segmentEntryId: 136
5月 21 02:15:06 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:06,296+0000 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_2885_2885, segmentEntryId: 126
5月 21 02:15:06 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:06,312+0000 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_2853_2853, segmentEntryId: 129
5月 21 02:15:06 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:06,335+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_2860_2860, segmentEntryId: 149
5月 21 02:15:06 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:06,361+0000 [BookKeeperClientWorker-OrderedExecutor-3-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_2898_2898, segmentEntryId: 136
5月 21 02:15:06 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:06,384+0000 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_2855_2855, segmentEntryId: 138
5月 21 02:15:06 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:06,400+0000 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_2891_2891, segmentEntryId: 143
5月 21 02:15:06 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:06,415+0000 [BookKeeperClientWorker-OrderedExecutor-3-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_2862_2862, segmentEntryId: 150
5月 21 02:15:06 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:06,428+0000 [BookKeeperClientWorker-OrderedExecutor-1-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_2857_2857, segmentEntryId: 141
5月 21 02:15:06 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:06,440+0000 [BookKeeperClientWorker-OrderedExecutor-1-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_2858_2858, segmentEntryId: 149
5月 21 02:15:06 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:06,635+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_3081_3081, segmentEntryId: 57
5月 21 02:15:17 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:17,539+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_3183_3183, segmentEntryId: 2
5月 21 02:15:22 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:22,204+0000 [BookKeeperClientWorker-OrderedExecutor-1-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_2975_2975, segmentEntryId: 72
5月 21 02:15:23 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:23,873+0000 [prometheus-stats-43-4] INFO  org.eclipse.jetty.server.RequestLog - 172.17.225.89 - - [21/May/2024:02:15:23 +0000] "GET /metrics/ HTTP/1.1" 200 8144 "-" "Prometheus/2.41.0" 9
5月 21 02:15:23 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:23,977+0000 [BookKeeperClientWorker-OrderedExecutor-3-0] INFO  org.apache.pulsar.broker.delayed.bucket.BucketDelayedDeliveryTracker - [persistent://public/default/push_al_queue / kafka_to_bk] Load next bucket snapshot segment finish, bucketKey: #pulsar.internal.delayed.bucket_3186_3186, segmentEntryId: 2
5月 21 02:15:24 sg-prod-server-pulsar-1 pulsar[19820]: 2024-05-21T02:15:24,556+0000 [pulsar-io-3-2] INFO  org.apache.pulsar.broker.service.ServerCnx - [/172.17.32.50:56158] connected with clientVersion=Pulsar Go v0.12.1, clientProtocolVersion=18, proxyVersion=null

And metrics change image image

Sorry, maybe I cannot provide heap dump because of data safety

SennoYuki commented 5 months ago

https://github.com/apache/pulsar/issues/20638 https://github.com/apache/pulsar/issues/20099

Other people encountered similar problems. After cursor recovery, it seems that the cursor is simply reset to the position of the last ack submission. This will cause a large number of messages to be consumed repeatedly in the delay queue.

dao-jun commented 5 months ago

see also https://github.com/apache/pulsar/issues/20099#issuecomment-2123050102