apache / couchdb

Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability
https://couchdb.apache.org/
Apache License 2.0

Fast-forward replication through transitive checkpoint analysis #3675

Open kocolosk opened 3 years ago

kocolosk commented 3 years ago

Summary

I'd like to be able to choose the starting sequence for a replication between a given source and target using more information than just the replication history between those two databases. Specifically, I'd like to be able to use other replication checkpoint histories to discover transitive relationships that could be used to accelerate the first replication between CouchDB databases that share a common peer.

Desired Behaviour

It might be simplest to provide an example. Consider a system where you have a pair of cloud sites (call them us-east and us-west) and a series of edge locations (e.g. store1).

In the current version of CouchDB, the us-west -> store1 replication will start from 0 because those peers have no replication history between them. Going forward, it would be useful for us to recognize that us-west -> us-east has a history, and us-east -> store1 has a history, so we can fast-forward us-west -> store1 by analyzing the pair of those checkpoint histories to discover the maximum sequence on us-west guaranteed to have been observed on store1 (by way of us-east).
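To make the analysis concrete, here is a minimal sketch of the first-order case. It is not CouchDB code. It assumes each checkpoint can be reduced to a hypothetical `(source_seq, target_seq)` pair and that sequences are plain comparable integers, which real clustered sequences are not. The function name `fast_forward_seq` and the sample numbers are invented for illustration.

```python
from typing import List, Tuple

# Assumption: a checkpoint (source_seq, target_seq) means every update on the
# source up to source_seq was on the target once the target reached target_seq.
Checkpoint = Tuple[int, int]


def fast_forward_seq(a_to_b: List[Checkpoint], b_to_c: List[Checkpoint]) -> int:
    """Highest sequence on A guaranteed to be on C by way of B, or 0 if none."""
    if not b_to_c:
        return 0
    # Highest B sequence that has been checkpointed as replicated to C.
    b_replicated = max(src for src, _ in b_to_c)
    # An A -> B checkpoint is usable if B had absorbed it (at target_seq)
    # no later than the B sequence already replicated to C.
    usable = [src for src, tgt in a_to_b if tgt <= b_replicated]
    return max(usable, default=0)


# Hypothetical histories: us-west -> us-east and us-east -> store1.
west_to_east = [(100, 140), (200, 260), (300, 390)]
east_to_store1 = [(150, 150), (300, 310)]

start_seq = fast_forward_seq(west_to_east, east_to_store1)  # 200
```

In this example the us-west -> store1 replication could start from 200 rather than 0: the checkpoint at us-west sequence 300 landed on us-east at 390, which is beyond the 300 that us-east has pushed to store1, so it cannot be used yet.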

Possible Solution

I believe we actually already employ this transitive analysis for fast-forwarding internal replications between shard copies in a cluster, so we may be able to refactor some of that code to apply it more generally.

I'm not sure if we track the target sequence in the current external replication checkpoint schema. That's essential for this analysis to work.

There's nothing fundamental that limits the analysis to first-order transitive relationships. One could build out an entire graph. I'm not sure the extra complexity that would bring is worth it in a first pass.
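For a sense of what the graph version might involve, here is a rough sketch under the same simplifying assumptions: hypothetical `(source_seq, target_seq)` checkpoints and integer sequences. The names `covered` and `fast_forward_seq_graph` are made up for this example, and the search is a naive depth-first walk, not a tuned design.

```python
from typing import Dict, FrozenSet, List, Tuple

Checkpoint = Tuple[int, int]  # hypothetical (source_seq, target_seq) pair
Edges = Dict[Tuple[str, str], List[Checkpoint]]  # (source, target) -> history


def covered(node: str, seq: int, target: str, edges: Edges,
            seen: FrozenSet[str]) -> bool:
    """True if every update on `node` up to `seq` is known to be on `target`."""
    if node == target:
        return True
    for (src, dst), history in edges.items():
        if src != node or dst in seen:
            continue  # not an outgoing edge, or it would revisit a peer
        for src_seq, tgt_seq in history:
            # The checkpoint must include `seq`; then follow it to the next hop.
            if src_seq >= seq and covered(dst, tgt_seq, target, edges, seen | {dst}):
                return True
    return False


def fast_forward_seq_graph(source: str, target: str, edges: Edges) -> int:
    """Highest sequence on `source` known to be on `target` via any path."""
    best = 0
    for (src, dst), history in edges.items():
        if src != source:
            continue
        for src_seq, tgt_seq in history:
            if src_seq > best and covered(dst, tgt_seq, target, edges,
                                          frozenset({source, dst})):
                best = src_seq
    return best


edges: Edges = {
    ("us-west", "us-east"): [(100, 140), (200, 260), (300, 390)],
    ("us-east", "us-west"): [(390, 300)],
    ("us-east", "store1"): [(150, 150), (300, 310)],
}
start_seq = fast_forward_seq_graph("us-west", "store1", edges)  # 200
```

The `seen` set is what keeps bidirectional replications, such as us-west <-> us-east, from looping forever. Even this toy version needs cycle handling and a walk over every history per hop, which gives a feel for the extra complexity.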

Additional context

Proposing this enhancement after chatting with a user who is planning this kind of deployment and would benefit from the enhancement.

nickva commented 3 years ago

I think it might be doable if we record a few more bits in the checkpoint documents and change their shape a bit.

[1] example checkpoint:

{
    "_id": "_local/d99e532c1129e9cacbf7ed085deca509",
    "_rev": "0-17",
    "history": [
        {
            "doc_write_failures": 0,
            "docs_read": 249,
            "docs_written": 249,
            "end_last_seq": "249-g1AAAACTeJzLYWBgYMpgTmHgz8tPSTV0MDQy1zMAQsMckEQiQ1L9____szKYE5tygQLsZiaGqWlpxpjKcRqRxwIkGRqA1H-oSeVgk5JMkkxNkg0xdWUBAJ5nJWc",
            "end_time": "Wed, 21 Jul 2021 17:10:06 GMT",
            "missing_checked": 253,
            "missing_found": 249,
            "recorded_seq": "249-g1AAAACTeJzLYWBgYMpgTmHgz8tPSTV0MDQy1zMAQsMckEQiQ1L9____szKYE5tygQLsZiaGqWlpxpjKcRqRxwIkGRqA1H-oSeVgk5JMkkxNkg0xdWUBAJ5nJWc",
            "session_id": "dc645ae85a7c3fe6c3ac5da8e73077ce",
            "start_last_seq": "228-g1AAAACTeJzLYWBgYMpgTmHgz8tPSTV0MDQy1zMAQsMckEQiQ1L9____szKYE0tzgQLsZiaGqWlpxpjKcRqRxwIkGRqA1H-oSflgk5JMkkxNkg0xdWUBAJgFJVI",
            "start_time": "Wed, 21 Jul 2021 17:01:21 GMT"
        },
        ...
    ],
    "replication_id_version": 4,
    "session_id": "dc645ae85a7c3fe6c3ac5da8e73077ce",
    "source_last_seq": "249-g1AAAACTeJzLYWBgYMpgTmHgz8tPSTV0MDQy1zMAQsMckEQiQ1L9____szKYE5tygQLsZiaGqWlpxpjKcRqRxwIkGRqA1H-oSeVgk5JMkkxNkg0xdWUBAJ5nJWc"
}
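As a quick way to see which sequence values a history entry carries, the sketch below lists the `*_seq` fields of one entry and pulls out the integer prefix of each. The entry is a trimmed copy of the one above, with the opaque part of each sequence shortened for readability. The helper name `seq_fields` is made up, and the integer prefix should not be assumed to be a reliable ordering key for clustered databases.

```python
import json

# Trimmed copy of one history entry from the checkpoint above; the opaque
# suffix of each sequence string is shortened here and is not a real value.
entry = json.loads("""
{
    "docs_read": 249,
    "docs_written": 249,
    "end_last_seq": "249-g1AAAACT",
    "recorded_seq": "249-g1AAAACT",
    "session_id": "dc645ae85a7c3fe6c3ac5da8e73077ce",
    "start_last_seq": "228-g1AAAACT"
}
""")


def seq_fields(history_entry: dict) -> dict:
    """Map each *_seq field to the integer prefix of its sequence string."""
    return {
        key: int(value.split("-", 1)[0])
        for key, value in history_entry.items()
        if key.endswith("_seq") and isinstance(value, str)
    }


fields = seq_fields(entry)
# None of the field names in this entry mentions the target.
has_target_field = any("target" in name for name in fields)
```

For this entry `fields` holds `end_last_seq`, `recorded_seq` and `start_last_seq`, and `has_target_field` is false, so a target sequence would have to be one of the extra bits recorded.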

[2] Unique, per db-instance UUID on main

http $DB/mydb1

{
    ...
    "instance_start_time": "0",
    "sizes": {
        "external": 34,
        "views": 0
    },
    "update_seq": "00000008d5c93d5a00000000",
    "uuid": "ce0279e40045b4f7cd6cd4f60ffd3b3c"
}