automerge / automerge-classic

A JSON-like data structure (a CRDT) that can be modified concurrently by different users, and merged again automatically.
http://automerge.org/
MIT License
14.75k stars 466 forks source link

UX: Difficult to realize why changes aren't visible #400

Open pvh opened 3 years ago

pvh commented 3 years ago

We have a class of UX problems where out-of-order or incomplete changes are "applied" but without making any visible change to the document because their dependencies have not arrived. This is surprising to users.

Automerge is designed to accept changes in any order they arrive and queue those changes until their dependencies are all present. A user can call applyChanges() repeatedly as changes come in from any source

We've seen this reported as a bug more than once now, particularly with Automerge.from. If we look at this issue, we see a user reasonably expecting that calling getChanges() on the output of Automerge.from and then apply the result to another document doesn't work. Why? It's because Automerge.from() makes an invisible change after Automerge.init() which the user did not realize was happening and the subsequent Automerge.getChanges() does not work as expected.

I'm not sure exactly what the solution is. Long term, it's probably something along the line of discouraging the use of manually calling getChanges() to synchronize two documents, but that's a bit unsatisfying. I think shorter-term, I might consider changing the default behaviour of applyChanges() to throw an error if the changes can't be applied and adding a second receiveChanges() (or something) which more explicitly manages out-of-order changes but... this is a bit unconsidered.

(As an aside, this is somewhat related to but different from a problem that occurs when a user tries to merge documents created independently on two nodes with the same structure and is confused why changes from only one system are visible.)

ept commented 3 years ago

Good observation. We did add the pendingChanges field to the patch precisely to give a bit of feedback that there was an enqueued, unapplied change. However, the field is very easy to miss if you don't know what you're looking for. Perhaps better documentation would help.

Having applyChanges throw an exception by default seems reasonable to me. Rather than having a separate function for applying out-of-order changes, we could also add an option argument to applyChanges.

pvh commented 3 years ago

I think that's a reasonable design too, yes. We can have the error message queue people to the solution. One nice thing about a separate implementation is that we could sort of concentrate the enqueued changes with their own API (getUnappliedChanges()) and the default code path wouldn't have to know anything about them.

alexjg commented 3 years ago

There's another usecase I am encountering that would benefit from throwing when there are unapplied changes. I'm building a system which merges changes from many peers whilst ensuring the document conforms to a schema. I do this by storing the hash graph of changes and then instantiating the document by walking through the graph, evaluating a change, and then checking if the document is still valid. If the document is not valid then I discard the offending change and all dependent changes.

The hash graph I am storing is actually a graph of sets of changes - the changes are just bytes from the perspective of my code. Peers publish changes and state when they publish them what the dependencies of the change are. This is vulnerable to a failure mode where a peer publishes a schema invalidating change with incorrect dependencies, applying the change may do nothing because the dependencies are not present but when the actual dependencies are applied then the document becomes invalid and I discard the wrong set of changes.

To avoid this I end up having to examine the automerge changes and check that the dependencies are what the peer has said they are. This is not ideal because I would rather be agnostic as to the actual contents of the change set. If automerge could throw when there are unapplied changes then I would not need to examine changes.

rongoro commented 3 years ago

As a counter example, in our case we actually take advantage of the fact that automerge will accept changes in any order. So it would be great to maintain that functionality even if it's not the default.

We may change the behavior in the future but this is what we do for now.

Our application requires a trusted central source of truth. The central "server" acts similar to a client in that it gets sent automerge changes and then applies them to its local copy of the automerge document. The goal was to try and do this completely serverless. In our case using firestore and cloud functions. Clients write changes to firestore which then get validated and applied by cloud functions.

Those cloud functions are run on each change and they can in theory and practice be run out of order (this can be especially true with lots of fast changes and distributed transactions). The easiest thing is to hand changes off to the primary automerge document whenever possible.

Allowing the data structure to manage the data ordering instead of requiring the infrastructure to guarantee in-order application of changes is a nice property.

jeffa5 commented 3 years ago

I've had a go at removing the queue from the Rust backend here. Pretty simple and still allows wrappers to handle the queuing logic.

One question I have is around applying changes from sync messsages, should we be sent changes with missing dependencies or should we always have the dependencies. Then depending on this we may want to filter the changes to those that can be applied in receive_sync_message as a special case.