massakam closed this 1 year ago
Closed as stale. Please create a new issue if it's still relevant to the maintained versions.
I have a two-cluster setup with bi-directional geo-replication enabled, and I am facing the same issue described above. Is there any update on this?
Recently, the number of messages in the replication backlog for a particular topic has become very large.
This topic is replicated on two clusters, and all producers and consumers are connected to only one cluster. The strange thing is that the replication backlog is larger in the cluster where no producer or consumer is connected. The following are the stats of the topic in that cluster.
Notable is the `"connected": false` part. Since this topic is not active (no producer or consumer) in this cluster, it seems that the replicator has been closed by topic GC.

I think the cause of this issue is that the replicator throttles reading entries while the producer for geo-replication is closed. If the publish rate of messages is high, reading entries by the replicator will not keep up with message publishing and the replication backlog will increase. https://github.com/apache/pulsar/blob/v2.3.2/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentReplicator.java#L155-L162
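To illustrate why throttling makes the backlog grow, here is a minimal, simplified model (not Pulsar's actual `PersistentReplicator`). It assumes the throttling works as follows: the read size is normally `min(availablePermits, readBatchSize)` but drops to a single entry while the producer is not writable. It also assumes one read per "cycle" and a fixed number of newly published entries per cycle. The class and method names are hypothetical.

```java
/**
 * Hypothetical, simplified model of replicator read throttling.
 * Not Pulsar code: names, the per-cycle timing and the numbers are assumptions.
 */
class ReplicatorThrottleSketch {

    /** Number of entries requested from the cursor in one read. */
    static int messagesToRead(int availablePermits, int readBatchSize, boolean producerWritable) {
        int toRead = Math.min(availablePermits, readBatchSize);
        if (!producerWritable) {
            // Producer closed (e.g. by topic GC): read only one entry at a time.
            toRead = 1;
        }
        return toRead;
    }

    /**
     * Replication backlog after `rounds` cycles, assuming `publishedPerRound` new
     * entries are appended per cycle and exactly one read is issued per cycle.
     */
    static long backlogAfter(int rounds, int publishedPerRound, int availablePermits,
                             int readBatchSize, boolean producerWritable) {
        long backlog = 0;
        for (int i = 0; i < rounds; i++) {
            backlog += publishedPerRound;
            int read = messagesToRead(availablePermits, readBatchSize, producerWritable);
            backlog -= Math.min(backlog, read);
        }
        return backlog;
    }

    public static void main(String[] args) {
        // Producer connected: reads of up to 100 entries keep up with 50 new entries per cycle.
        System.out.println(backlogAfter(1000, 50, 1000, 100, true));  // 0
        // Producer closed: one entry per read falls behind by 49 entries every cycle.
        System.out.println(backlogAfter(1000, 50, 1000, 100, false)); // 49000
    }
}
```

Under these assumptions the backlog grows linearly with the publish rate as soon as the producer is closed, which matches the symptom above.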
It is reasonable to throttle reading of messages published to the local cluster while the producer for geo-replication is closed. However, there is no need to throttle reading messages replicated from other clusters. The replicator discards these messages and does not send them using the producer. https://github.com/apache/pulsar/blob/v2.3.2/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentReplicator.java#L226-L232
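The distinction between local and replicated entries can be sketched with another simplified model. It assumes each entry carries the name of the cluster it was replicated from (`null` for locally published messages). Entries from other clusters are acknowledged and dropped whether or not the producer is open, while local entries can only be sent when the producer is writable. `Entry`, `replicatedFrom`, `BatchResult` and `processBatch` are hypothetical names, not Pulsar APIs.

```java
import java.util.List;

/**
 * Hypothetical model of how a replicator handles one batch of entries it has read.
 * Not Pulsar code: all names here are illustrative assumptions.
 */
class ReplicatedEntryFilterSketch {

    /** A topic entry; replicatedFrom is null for messages published in this cluster. */
    static final class Entry {
        final long id;
        final String replicatedFrom;

        Entry(long id, String replicatedFrom) {
            this.id = id;
            this.replicatedFrom = replicatedFrom;
        }

        boolean isReplicated() {
            return replicatedFrom != null;
        }
    }

    /** Outcome of processing one batch. */
    static final class BatchResult {
        int discarded; // came from another cluster: acknowledged without sending
        int sent;      // local: handed to the geo-replication producer
        int pending;   // local, but the producer is closed: must be re-read later
    }

    static BatchResult processBatch(List<Entry> batch, boolean producerWritable) {
        BatchResult r = new BatchResult();
        for (Entry e : batch) {
            if (e.isReplicated()) {
                // Never sent back to the remote cluster, so the producer state does not matter.
                r.discarded++;
            } else if (producerWritable) {
                r.sent++;
            } else {
                r.pending++;
            }
        }
        return r;
    }

    public static void main(String[] args) {
        List<Entry> batch = List.of(
                new Entry(1, "cluster-a"), new Entry(2, "cluster-a"), new Entry(3, null));
        BatchResult r = processBatch(batch, false);
        System.out.println(r.discarded + " discarded, " + r.sent + " sent, " + r.pending + " pending");
        // 2 discarded, 0 sent, 1 pending
    }
}
```

In this model only the local entry depends on the closed producer. Reading the replicated entries one at a time therefore only slows down their removal from the backlog.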