apache / couchdb

Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability
https://couchdb.apache.org/
Apache License 2.0

Support server side / auto conflict resolution #1506

Open wohali opened 5 years ago

wohali commented 5 years ago

Migrated from the CouchDB Summit 2017 topics.

No comments or description provided. This has come up many times before and isn't trivial for many reasons. Comments from @janl or @davisp as to why this hasn't been done yet might be illuminating for others who might try to implement this naively.

thigg commented 4 years ago

Would love to read some investigation into what the difficulties with this are.

wohali commented 4 years ago

@thigg for a start, imagine 2 databases replicating with each other that have different CRDT definitions. It breaks replication/eventual consistency and can lead to infinite loops/infinite document revisions.
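To make the feedback loop concrete, here is a toy simulation in Python. It is a simplified model, not CouchDB's real revision tree or replicator. A document is a tree of integer-valued revisions. A "resolution" is a single new revision whose parents are all current leaves. Revision ids are derived from content, and replication just unions the two trees. The names `simulate`, `resolve` and `rev_id` are hypothetical.

```python
import hashlib
from typing import Callable, Optional

# Toy model (an assumption, not CouchDB's real rev tree): rev_id -> (parents, value).
# A resolution is one new revision whose parents are all current leaves.
Tree = dict[str, tuple[frozenset[str], int]]
Resolver = Callable[[list[int]], int]


def rev_id(parents: frozenset[str], value: int) -> str:
    # Content-derived id: peers that write the same merge get the same revision.
    return hashlib.sha256(f"{sorted(parents)}:{value}".encode()).hexdigest()[:8]


def leaves(tree: Tree) -> set[str]:
    # Leaves are revisions that are nobody's parent; more than one = conflict.
    parents = {p for ps, _ in tree.values() for p in ps}
    return set(tree) - parents


def resolve(tree: Tree, resolver: Optional[Resolver]) -> int:
    # Write one merge revision if the document is in conflict; return 1 if written.
    current = leaves(tree)
    if resolver is None or len(current) < 2:
        return 0
    value = resolver(sorted(tree[r][1] for r in current))
    parents = frozenset(current)
    tree[rev_id(parents, value)] = (parents, value)
    return 1


def simulate(res_a: Optional[Resolver], res_b: Optional[Resolver],
             rounds: int = 10) -> tuple[bool, int]:
    a: Tree = {"1-a": (frozenset(), 3)}  # edit made on peer A
    b: Tree = {"1-b": (frozenset(), 5)}  # concurrent edit made on peer B
    written = 0
    for _ in range(rounds):
        a.update(b)  # toy bidirectional replication: union of both trees
        b.update(a)
        written += resolve(a, res_a) + resolve(b, res_b)
    a.update(b)  # one final replication before checking convergence
    b.update(a)
    converged = leaves(a) == leaves(b) and len(leaves(a)) == 1
    return converged, written


print(simulate(max, max))   # (True, 2)
print(simulate(max, min))   # (False, 20)
print(simulate(max, None))  # (True, 1)
```

In this model, two peers using the same deterministic rule write the identical merge revision and converge. Peers using different rules (`max` vs `min`) create two new conflicting revisions on every exchange, without end. Passing `None` so that only one peer ever resolves also converges here.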

davisp commented 4 years ago

There are two main difficulties. The first is what @wohali describes: it would be conceivably easy for an incorrect resolution strategy to inject entropy into the system, creating feedback loops.

The second is that conflict resolution really depends on user data. To my knowledge, there's no generally applicable "resolution algorithm" that can handle arbitrary JSON data correctly. So we're either looking at changing all of our APIs to some restricted set of operations that can be resolved, which is likely not going to be universal for all CouchDB use cases, or we're looking at some sort of "custom resolution function", which leads back to the first point about getting into weird feedback loops in circular replications.
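A small illustration of the second point, assuming a hypothetical "generic" merge that keeps every field and takes the larger value when both sides set the same one. The rule is deterministic and symmetric, yet applied to two valid concurrent edits of an order document it produces an order that is both shipped and refunded. The function name and the document fields are made up for the example.

```python
from typing import Any

Doc = dict[str, Any]


def naive_field_merge(a: Doc, b: Doc) -> Doc:
    # Hypothetical data-agnostic merge: keep every field and, when both sides
    # set the same field, pick the larger value (assumes comparable values).
    merged = dict(a)
    for key, value in b.items():
        merged[key] = max(merged[key], value) if key in merged else value
    return merged


# Two concurrent, individually valid edits of the same order document.
cancelled = {"status": "cancelled", "refund": 25}
shipped = {"status": "shipped", "tracking": "1Z999"}

print(naive_field_merge(cancelled, shipped))
# {'status': 'shipped', 'refund': 25, 'tracking': '1Z999'}
```

Nothing in the JSON tells the merge that `refund` only makes sense for a cancelled order; that knowledge lives in the application.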

thigg commented 4 years ago

As far as I understand, you pointed out two problems:

  1. If conflict resolution is not strictly deterministic, we might get feedback loops. It would be better to prevent them by design than to encourage users to break things.
  2. Conflict resolution is almost always business logic, so general solutions are of very limited use, or impossible.
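One way to read "strictly deterministic" is that every node must compute the same result from the same set of conflicting revisions, whatever order they arrived in. Below is a hedged Python sketch of that check with two hypothetical resolvers. One keeps the last revision it saw; the other picks by `_rev` alone, similar in spirit to how CouchDB itself chooses a winning revision deterministically (not claimed to be its exact rule). Order independence is necessary but not sufficient: it says nothing about whether the result is correct.

```python
from itertools import permutations
from typing import Any, Callable

Doc = dict[str, Any]
Resolver = Callable[[list[Doc]], Doc]


def last_seen_wins(conflicts: list[Doc]) -> Doc:
    # Not deterministic across nodes: depends on the order in which this node
    # happened to receive the conflicting revisions.
    return conflicts[-1]


def highest_rev_wins(conflicts: list[Doc]) -> Doc:
    # Deterministic: depends only on the revisions themselves. Highest
    # generation wins; ties are broken by comparing the full _rev strings.
    def key(doc: Doc) -> tuple[int, str]:
        generation = int(doc["_rev"].split("-", 1)[0])
        return generation, doc["_rev"]

    return max(conflicts, key=key)


def is_order_independent(resolve: Resolver, conflicts: list[Doc]) -> bool:
    # Every node must reach the same result however the revisions arrived.
    results = {repr(resolve(list(order))) for order in permutations(conflicts)}
    return len(results) == 1


revs = [
    {"_id": "x", "_rev": "2-aaa", "n": 1},
    {"_id": "x", "_rev": "2-bbb", "n": 2},
    {"_id": "x", "_rev": "3-ccc", "n": 3},
]
print(is_order_independent(last_seen_wins, revs))    # False
print(is_order_independent(highest_rev_wins, revs))  # True
```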

One solution that comes to mind is to provide a set of resolution functions that are known to be safe, each used together with a field matcher. Something like this:

a.b -> min
c.d -> ListMerge
a..[aa*b] -> max

That would mean:

That would provide a safe, deterministic way to resolve conflicts, which would allow at least some of them to be resolved automatically. The main question is whether it would actually be useful enough to be worth the effort.
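A minimal sketch of how the first two rules could be interpreted, assuming `min` means the smallest value across the conflicting revisions and `ListMerge` means a sorted union of list elements. The syntax of the third rule (`a..[aa*b]`) isn't explained in the thread, so it is left out. `RULES`, `list_merge` and `resolve` are hypothetical names. Fields not covered by a rule must already agree across revisions; otherwise the resolver returns `None` and leaves the conflict alone, so not every conflict gets resolved.

```python
from copy import deepcopy
from typing import Any, Callable, Optional

Doc = dict[str, Any]
Merge = Callable[[list[Any]], Any]


def list_merge(values: list[Any]) -> list[Any]:
    # Assumed meaning of "ListMerge": union of all elements, sorted so the
    # result does not depend on the order of the conflicting revisions.
    return sorted({item for value in values for item in value})


# Hypothetical rule table for the first two rules: dotted path -> merge function.
RULES: dict[str, Merge] = {"a.b": min, "c.d": list_merge}


def get_path(doc: Any, path: str) -> Any:
    for key in path.split("."):
        doc = doc[key]
    return doc


def set_path(doc: Any, path: str, value: Any) -> None:
    *parents, last = path.split(".")
    for key in parents:
        doc = doc[key]
    doc[last] = value


def del_path(doc: Any, path: str) -> None:
    *parents, last = path.split(".")
    for key in parents:
        doc = doc[key]
    del doc[last]


def resolve(conflicts: list[Doc], rules: dict[str, Merge] = RULES) -> Optional[Doc]:
    """Merged body (without _id/_rev), or None if the rules cannot resolve it.

    Assumes every ruled path exists in every conflicting revision.
    """
    bodies = [{k: v for k, v in deepcopy(d).items() if not k.startswith("_")}
              for d in conflicts]
    # Fields not covered by a rule must already agree; otherwise give up.
    rest = deepcopy(bodies)
    for body in rest:
        for path in rules:
            del_path(body, path)
    if any(body != rest[0] for body in rest[1:]):
        return None
    merged = deepcopy(bodies[0])
    for path, merge in rules.items():
        set_path(merged, path, merge([get_path(b, path) for b in bodies]))
    return merged


r1 = {"_id": "x", "_rev": "2-aaa", "a": {"b": 4}, "c": {"d": ["x"]}, "title": "t"}
r2 = {"_id": "x", "_rev": "2-bbb", "a": {"b": 2}, "c": {"d": ["y", "x"]}, "title": "t"}
print(resolve([r1, r2]))  # {'a': {'b': 2}, 'c': {'d': ['x', 'y']}, 'title': 't'}
```

Because each rule is order-independent, swapping the input order gives the same body; writing it back (with `_id`/`_rev`) is left to the caller.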

Another idea that came to mind is whether it would be possible to prevent conflict resolution from running on multiple nodes. Maybe there could be a single conflict-resolution master. That would also solve the problems. (This is very close to what I am currently implementing. I guess something like this is already done quite often in applications with an element of centralized design.)

However, neither approach guarantees that all conflicts are resolved; maybe that's a usability problem as well?

davisp commented 4 years ago

@thigg That's basically the gist of it, yeah. There may be some possibility of a DSL for conflict resolution, but as you point out, it's quite uncertain how generally applicable that would be for any given use case.

As for requiring a central coordinator for resolutions, that won't work as a CouchDB feature, since there's no way to define the set of replicating nodes. The replication protocol is centered around the concept of an open set of peers that may or may not all be replicating pairwise, hub-and-spoke, or what have you.

Yet another fun part of conflict resolution that you touch on is what happens if we're unable to resolve some combination of updates. That leaves us accepting conflicts regardless, which seems counterintuitive. The only other alternative would be to reject the update, but then two databases end up divergent after replication.

thigg commented 4 years ago

So that sounds pretty much like a feature that will be dropped? A DSL is most probably not worth the effort/problems/complexity, and there's no other good solution?

Maybe the remaining path for this feature would be to add a section to the docs describing design patterns for handling conflict resolution?

thigg commented 4 years ago

On the other hand, there could be a tool independent of CouchDB that simply listens to the changes feed and applies the DSL mentioned above. Maybe that would allow faster prototyping, etc.

People could just pull that in as a dependency, configure it, and have simple, stable conflict resolution running.
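A rough sketch of such an external tool using only the Python standard library. It long-polls the `_changes` feed with `include_docs=true&conflicts=true` and fetches each conflicting revision with `?rev=`. If a resolver returns a merged body, it writes that body on top of the winning revision and deletes the losing revisions in one `_bulk_docs` request, following the general pattern the CouchDB documentation describes for resolving conflicts by hand. The resolver signature and helper names are assumptions. Authentication, error handling, retries, and attachments are omitted. Given the feedback-loop concerns above, it would be safest to run a single instance with a deterministic resolver.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Optional

Doc = dict[str, Any]
# Hypothetical resolver: merged body, or None to leave the conflict in place.
Resolver = Callable[[list[Doc]], Optional[Doc]]


def bulk_docs_payload(winner: Doc, losers: list[Doc], merged: Doc) -> dict[str, Any]:
    # Put the merged body on top of the current winning revision and add a
    # deletion for each losing revision, leaving a single live leaf.
    body = {k: v for k, v in merged.items() if k not in ("_id", "_rev", "_conflicts")}
    update = {**body, "_id": winner["_id"], "_rev": winner["_rev"]}
    tombstones = [{"_id": d["_id"], "_rev": d["_rev"], "_deleted": True} for d in losers]
    return {"docs": [update, *tombstones]}


def get_json(url: str) -> Any:
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def post_json(url: str, body: Any) -> Any:
    req = urllib.request.Request(url, data=json.dumps(body).encode(), method="POST",
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def watch(db_url: str, resolve: Resolver, since: str = "now") -> None:
    # Long-poll the changes feed, asking for each doc plus its _conflicts list.
    # Runs until interrupted.
    while True:
        query = urllib.parse.urlencode({"feed": "longpoll", "since": since,
                                        "include_docs": "true", "conflicts": "true"})
        changes = get_json(f"{db_url}/_changes?{query}")
        for row in changes["results"]:
            winner = row.get("doc") or {}
            conflict_revs = winner.get("_conflicts", [])
            if not conflict_revs:
                continue
            doc_url = f"{db_url}/{urllib.parse.quote(winner['_id'], safe='')}"
            losers = [get_json(f"{doc_url}?rev={urllib.parse.quote(rev)}")
                      for rev in conflict_revs]
            merged = resolve([winner, *losers])
            if merged is not None:  # None: leave the conflict for the application
                post_json(f"{db_url}/_bulk_docs", bulk_docs_payload(winner, losers, merged))
        since = changes["last_seq"]


# Example (my_resolver is hypothetical):
# watch("http://localhost:5984/mydb", my_resolver)
```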