apache / incubator-pegasus

Apache Pegasus - A horizontally scalable, strongly consistent and high-performance key-value store
https://pegasus.apache.org/
Apache License 2.0

The backup-cluster may do an incomplete learn with duplication #2107

Open ninsmiracle opened 2 months ago

ninsmiracle commented 2 months ago

Bug Report

At present, duplication (dup) is implemented such that when the backup cluster executes the dup RPC handler, the multiple requests carried in a single dup mutation are written to RocksDB in several separate writes, and each write also persists the decree of that dup mutation. If the backup cluster takes a checkpoint at this point, the checkpoint may record the decree even though not all of the decree's data has been written to RocksDB. If a learner on the backup cluster starts learning from this checkpoint, it will request plog starting from decree + 1 once learning finishes, so the dup requests of that decree that were not yet applied are never learned, and their data is lost.

int pegasus_write_service::duplicate(int64_t decree,
                                     const dsn::apps::duplicate_request &requests,
                                     dsn::apps::duplicate_response &resp)
{
    // If the `for` loop has not yet completed and a checkpoint is taken here,
    // the checkpoint may not include all the data, because these requests
    // share the same decree. In other words, this creates an inconsistency.
    for (const auto &request : requests.entries) {
        // ...
    }
}
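To make the failure window concrete, here is a minimal sketch of the current per-entry behavior. This is NOT the real Pegasus code: `dup_entry` and the `LAST_FLUSHED_DECREE` key are hypothetical stand-ins for the real request entries and decree metadata.

#include <cstdint>
#include <string>
#include <vector>

#include <rocksdb/db.h>
#include <rocksdb/write_batch.h>

// Hypothetical stand-in for one entry of dsn::apps::duplicate_request.
struct dup_entry
{
    std::string key;
    std::string value;
};

rocksdb::Status apply_duplicate_per_entry(rocksdb::DB *db,
                                          int64_t decree,
                                          const std::vector<dup_entry> &entries)
{
    for (const auto &e : entries) {
        rocksdb::WriteBatch batch;
        batch.Put(e.key, e.value);
        // The shared decree is persisted together with *each* entry.
        batch.Put("LAST_FLUSHED_DECREE", std::to_string(decree));
        rocksdb::Status s = db->Write(rocksdb::WriteOptions(), &batch);
        if (!s.ok()) {
            return s;
        }
        // A checkpoint taken at this point already contains `decree`, but
        // not the remaining entries of the same dup batch; a learner that
        // restores it resumes from decree + 1 and never replays them.
    }
    return rocksdb::Status::OK();
}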
ninsmiracle commented 2 months ago

But we can work around this problem by setting duplicate_log_batch_bytes = 0, so I'm not sure whether I should fix this 'bug'. If I should fix it, the dup handler would need to write the multiple requests and the single decree as one write batch. @acelyc111 @empiredan
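For reference, a minimal sketch of that idea, reusing the hypothetical `dup_entry` struct and `LAST_FLUSHED_DECREE` key from the sketch above: every entry of the dup batch and the decree marker are committed in one atomic rocksdb::WriteBatch, so a checkpoint either contains the whole decree or none of it.

rocksdb::Status apply_duplicate_atomically(rocksdb::DB *db,
                                           int64_t decree,
                                           const std::vector<dup_entry> &entries)
{
    rocksdb::WriteBatch batch;
    for (const auto &e : entries) {
        batch.Put(e.key, e.value);
    }
    // Single atomic commit: the data and the decree become visible together,
    // so no checkpoint can record the decree without all of its entries.
    batch.Put("LAST_FLUSHED_DECREE", std::to_string(decree));
    return db->Write(rocksdb::WriteOptions(), &batch);
}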