apache / pulsar

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org/
Apache License 2.0

[enhancement] Avoid consuming duplicated messages in geo-replication with ReplicateSubscriptionState enabled #19551

Open lujiwen opened 1 year ago

lujiwen commented 1 year ago

Search before asking

Motivation

The geo-replication feature helps us a lot by replicating messages from one cluster to another in a different geographic location. However, duplicated messages can occur when the subscription state snapshot is not synchronized correctly between the source and target clusters. We are trying to solve this problem.

Solution

I found that a snapshot is only taken when the consumer acknowledges a message and advances the subscription cursor, and the cache holds a snapshot whose messageId is older than the latest one. However, the most recent cursor positions may not be included in any snapshot, so they are never replicated to the remote cluster. To solve this problem, we want to set up a scheduled task that runs once a second and syncs the latest subscription cursor position, even when the consumer has failed and will not advance the cursor any further, as long as the broker is still alive.
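
To make the proposal concrete, here is a minimal sketch of such a periodic sync, not the actual broker code. Everything in it is an assumption for illustration: the class `CursorSyncTask`, the `cursorPosition` supplier, and the `remoteSync` callback are hypothetical names, and a cursor position is simplified to a single `long` instead of a real Pulsar position.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

// Hypothetical sketch: periodically pushes the latest cursor position to the
// remote cluster, independent of consumer acknowledgements.
class CursorSyncTask implements AutoCloseable {
    private final LongSupplier cursorPosition; // reads the local cursor position (assumed hook)
    private final LongConsumer remoteSync;     // sends a position to the remote cluster (assumed hook)
    private long lastSynced = -1;              // last position that was sent
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "cursor-sync");
                t.setDaemon(true); // do not keep the JVM alive for this task
                return t;
            });

    CursorSyncTask(LongSupplier cursorPosition, LongConsumer remoteSync) {
        this.cursorPosition = cursorPosition;
        this.remoteSync = remoteSync;
    }

    // Starts the periodic task; the issue proposes a period of one second.
    void start(long periodMillis) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                syncOnce();
            } catch (RuntimeException e) {
                // Swallow the error so the executor keeps scheduling later runs.
            }
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    // Sends the current position if it is newer than the last one sent.
    // Returns true when a sync was actually performed.
    synchronized boolean syncOnce() {
        long current = cursorPosition.getAsLong();
        if (current <= lastSynced) {
            return false; // nothing new to replicate
        }
        remoteSync.accept(current);
        lastSynced = current;
        return true;
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

With this shape, `new CursorSyncTask(readPosition, sendToRemote).start(1000)` would run the sync once a second. Because the task reads the cursor directly, the last acknowledged position still reaches the remote cluster after the consumer has stopped, provided the broker is alive.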

Alternatives

No response

Anything else?

No response

Are you willing to submit a PR?

github-actions[bot] commented 1 year ago

The issue had no activity for 30 days, mark with Stale label.