automerge / automerge-classic

A JSON-like data structure (a CRDT) that can be modified concurrently by different users, and merged again automatically.
http://automerge.org/
MIT License
14.75k stars 467 forks source link

Lost changes when conflict in nested map #485

Closed gudn closed 2 years ago

gudn commented 2 years ago

I've researched Automerge and got a not-funny bug:

let doc = Automerge.init()
doc = Automerge.change(doc, '', doc => {
  doc.data = {
    a: 1,
    b: 2,
  }
})

// Emulate save/load from another replica
let doc2 = Automerge.change(Automerge.clone(doc), '', doc => {
  doc.data = {
    a: 1,
    b: 4,
  }
})

let doc1 = Automerge.change(doc, '', doc => {
  doc.data.a = 3
})

console.log(doc1, doc2) // { data: { a: 3, b: 2 } } { data: { a: 1, b: 4 } }
doc1 = Automerge.merge(doc1, doc2)
doc2 = Automerge.merge(doc2, doc1)
console.log(doc1, doc2) // { data: { a: 1, b: 4 } } { data: { a: 1, b: 4 } }
console.log(Automerge.getConflicts(doc1, 'data')) // undefined
console.log(Automerge.getConflicts(doc2, 'data')) // undefined
console.log(Automerge.getConflicts(doc1, 'data.a')) // undefined
console.log(Automerge.getConflicts(doc2, 'data.a')) // undefined
console.log(Automerge.getConflicts(doc1, 'data.b')) // undefined
console.log(Automerge.getConflicts(doc2, 'data.b')) // undefined

Changes from the first replica were lost. I've tried to shake orders, cloning, and others variants of use: in all cases the effect is the same.

Version of Automerge: 1.0.1-preview.7

ept commented 2 years ago

Hi @gudn, this is the expected behaviour, although I will agree that it's perhaps surprising behaviour.

When you set doc.data = {a:1, b: 4}, this has the effect of replacing the entire data object with a new object. The concurrent update of doc.data.a = 3 goes to the old object. In the merged state, doc.data points to the new object, and so the update to the old object is no longer visible. If you perform a fine-grained update doc.data.b = 4 instead of replacing the entire data object, you'll get the merging behaviour you expect.

We have considered doing things differently: for example, you could imagine changing the logic so that doc.data = {a:1, b: 4} merges in the update into the existing data object, rather than replacing it. However, taking that approach also leads to weird behaviour in certain edge-cases, and so I don't think it's preferable to the current behaviour.

As a general rule, with Automerge you need to perform updates at the individual property level if possible, i.e. doc.data.b = 4 instead of replacing the whole doc.data object.

gudn commented 2 years ago

So... I want to use this "surprising" behavior for some consistency in the document. For example, I want to have an object with a = b + 1 and b: int (any linked fields, for example, main key in database). So, I want to give guarantees about links between two fields.

If I write doc.key.a = doc.key.b + 1, another actor may modify b concurrently and my document was broken. So I'll replace the entire object: doc.data = {a: doc.data.b + 1, ...data.key}. But changes from another replica'll be lost.

I can't trust to code in another actors, so I don't know another way to achieve consistency (except selecting random winner from Automerge.Table which is not good choice).

scotttrinh commented 2 years ago

In your example, within the scope of the change callback, doc.data.b is not shared concurrently accessed mutable state. It's the state of the value in your current document, so it's perfectly safe (if you trust your copy of doc at least) to depend on any fields within your document within the change callback. If someone else also updated doc.data.b in a separate document Automerge will ensure that when merged, both actors will see the same state, and any conflicts in value there can be addressed.

As an aside, if you're trying to coordinate an incrementing counter, Automerge has a Counter CRDT that helps coordinate merges in a specific way for that specific case.

gudn commented 2 years ago

More complex example: I have a map of persons (name, wife/husband/null, age) and list of married pairs (name/name) (for fast iteration, as example). I imagine one replica wants to marry A and B when another replica wants to marry A and C.

More generally, can I achieve solving of constraint between some fields for one of replica (good state for replica A or good state for replica B) without any data lost?

And second question: is replacing value always winner than changing nested field?

scotttrinh commented 2 years ago

More complex example: I have a map of persons (name, wife/husband/null, age) and list of married pairs (name/name) (for fast iteration, as example). I imagine one replica wants to marry A and B when another replica wants to marry A and C.

More generally, can I achieve solving of constraint between some fields for one of replica (good state for replica A or good state for replica B)?

This depends on how you want to moderate conflicting changes between "replicas" here. Automerge guarantees that everyone will converge on the same state, but doesn't answer questions like this for you. One option is that your application might present these business-logic conflicts to the user to resolve. Another option is to have an authoratative replica that consistently picks a winner based on some business-logic or heuristics. I think in your example all Automerge will guarantee you is that if both "replicas" see all of the changes, they will have a consistent state between the replicas, but it can't ensure there are no application-level constraint violations like the list of married pairs cannot have duplicate members, etc.

I think for your case, you could hold a reference to your current state when applying changes, apply the changes, and validate any business rules. If there are constraint violations of your business rules, you can chose how to present the conflict to the user, or do something like detect the intent of the change and making a new change with the data integrity restored (similar to a git merge conflict, if that makes sense).

And second question: is replacing value always winner than changing nested field?

Yeah, if you assign a value to a property, it will replace that value, so if you assign it "higher" in an object's hierarchy (i.e. closer to the root), it will replace all of the subtree with that new value. That's the side effect you were seeing and that @ept was explaining. To avoid that (it's possible you want that behavior in some cases!) make small atomic changes to each property rather than using the more typical approach of spreading objects, etc. If you're familiar with immer it's a similar concept.

gudn commented 2 years ago

Thanks, this rule transforms my UB to specified behavior. I guess this part of semantic should be documented somewhere.

ept commented 2 years ago

Agree with what @scotttrinh said. Just to add one thing: when modelling your app's data as a CRDT, it's a good idea to not have any redundant information. In your example, the fact that a certain pair of people is married appears in several places: one partner has a reference to their spouse, there's a separate reference in the opposite direction, and then you also have the array of married pairs. When the information appears in multiple places, this opens up the risk that when several users update the information, these multiple representations of the same information become inconsistent with each other. That is not an Automerge-specific problem, but it applies with any mergeable data structures.

To minimise this problem, I would suggest keeping the information inside the Automerge doc as normalised (in the databases sense) as possible, and if you need denormalised information or indexes for faster access, you maintain those as separate data structures outside of the Automerge doc.