apache / incubator-pegasus

Apache Pegasus - A horizontally scalable, strongly consistent and high-performance key-value store
https://pegasus.apache.org/
Apache License 2.0

The backup-cluster may do an incomplete learn with duplication #2107

Open ninsmiracle opened 2 months ago

ninsmiracle commented 2 months ago

Bug Report

At present, duplication (dup) is implemented such that when the backup cluster executes the dup RPC handler, the multiple requests carried in a single dup mutation are written to RocksDB in several separate writes, and each write also persists the decree of that dup mutation. If the backup cluster takes a checkpoint at this point, the checkpoint may record the decree even though not all of the decree's data has been written to RocksDB. If a learner on the backup cluster starts learning from this checkpoint, it will request plog starting from decree + 1 once learning finishes, so the dup requests of that decree that were not yet applied are never learned, and their data is lost.

int pegasus_write_service::duplicate(int64_t decree,
                                     const dsn::apps::duplicate_request &requests,
                                     dsn::apps::duplicate_response &resp)
{
    // If the `for` loop has not yet completed and a checkpoint is taken here,
    // the checkpoint may not include all the data, because these requests
    // share the same decree. In other words, this creates an inconsistency.
    for (const auto &request : requests.entries) {
        // ...
    }
}
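To make the failure window concrete, here is a minimal sketch of the current per-entry behavior. This is NOT the real Pegasus code: `dup_entry` and the `LAST_FLUSHED_DECREE` key are hypothetical stand-ins for the real request entries and decree metadata.

#include <cstdint>
#include <string>
#include <vector>

#include <rocksdb/db.h>
#include <rocksdb/write_batch.h>

// Hypothetical stand-in for one entry of dsn::apps::duplicate_request.
struct dup_entry
{
    std::string key;
    std::string value;
};

rocksdb::Status apply_duplicate_per_entry(rocksdb::DB *db,
                                          int64_t decree,
                                          const std::vector<dup_entry> &entries)
{
    for (const auto &e : entries) {
        rocksdb::WriteBatch batch;
        batch.Put(e.key, e.value);
        // The shared decree is persisted together with *each* entry.
        batch.Put("LAST_FLUSHED_DECREE", std::to_string(decree));
        rocksdb::Status s = db->Write(rocksdb::WriteOptions(), &batch);
        if (!s.ok()) {
            return s;
        }
        // A checkpoint taken at this point already contains `decree`, but
        // not the remaining entries of the same dup batch; a learner that
        // restores it resumes from decree + 1 and never replays them.
    }
    return rocksdb::Status::OK();
}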
ninsmiracle commented 2 months ago

But we can work around this problem by setting duplicate_log_batch_bytes = 0, so I'm not sure whether I should fix this 'bug'. If I should fix it, the dup handler would need to write the multiple requests and the single decree as one write batch. @acelyc111 @empiredan
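For reference, a minimal sketch of that idea, reusing the hypothetical `dup_entry` struct and `LAST_FLUSHED_DECREE` key from the sketch above: every entry of the dup batch and the decree marker are committed in one atomic rocksdb::WriteBatch, so a checkpoint either contains the whole decree or none of it.

rocksdb::Status apply_duplicate_atomically(rocksdb::DB *db,
                                           int64_t decree,
                                           const std::vector<dup_entry> &entries)
{
    rocksdb::WriteBatch batch;
    for (const auto &e : entries) {
        batch.Put(e.key, e.value);
    }
    // Single atomic commit: the data and the decree become visible together,
    // so no checkpoint can record the decree without all of its entries.
    batch.Put("LAST_FLUSHED_DECREE", std::to_string(decree));
    return db->Write(rocksdb::WriteOptions(), &batch);
}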