This weekend I was implementing previousCommit in Atomic Server and Atomic Data Browser. This means that Commits need to have an explicit previousCommit in order to be accepted by a server. This prevents issues where a client is trying to create a resource, and accidentally overwrites an existing one. It also helps with discovery of Commits, and having a more explicit audit log. But as I was implementing it, I noticed that when typing really quickly (and creating a lot of Commits), I can get errors. This makes sense, because it's very hard for the client and the server to be fully in sync all the time.

Let's assume two Commits A and B that try to modify some Resource. A was created first, but B arrives first on the server, for some reason. Both will refer to some previous Commit P.

When applying Commit B, all is good. But when applying Commit A, the previousCommit should now be B. Commit A does not know about B, so it still refers to the older Commit P.

In the current implementation, the client gets an Error: the previousCommit does not match the current lastCommit of the Resource.
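To make the scenario concrete, here is a minimal Python sketch of the check described above. It is not the actual Atomic Server code: the `Server` and `Commit` classes, the field names, and the example URL are all hypothetical, and a Commit is reduced to an id, a subject, a previousCommit, and a map of changed properties.

```python
from dataclasses import dataclass, field
from typing import Optional


class PreviousCommitMismatch(Exception):
    """Raised when a Commit's previousCommit is not the Resource's lastCommit."""


@dataclass
class Commit:
    id: str                         # stand-in for the Commit's identifier
    subject: str                    # the Resource being modified
    previous_commit: Optional[str]  # None means "this creates the Resource"
    changes: dict = field(default_factory=dict)


class Server:
    """Hypothetical in-memory server that enforces previousCommit."""

    def __init__(self):
        self.resources = {}    # subject -> property map
        self.last_commit = {}  # subject -> id of the last applied Commit

    def apply(self, commit: Commit) -> None:
        current = self.last_commit.get(commit.subject)
        if commit.previous_commit != current:
            raise PreviousCommitMismatch(
                f"previousCommit {commit.previous_commit!r} does not match "
                f"lastCommit {current!r} of {commit.subject}"
            )
        self.resources.setdefault(commit.subject, {}).update(commit.changes)
        self.last_commit[commit.subject] = commit.id


server = Server()
doc = "https://example.com/doc"  # made-up subject
server.apply(Commit("P", doc, None, {"text": "hello"}))
server.apply(Commit("B", doc, "P", {"text": "hello world"}))  # B arrives first

try:
    # A was created before B, so it still points at P.
    server.apply(Commit("A", doc, "P", {"text": "hello there"}))
    rejected = False
except PreviousCommitMismatch as err:
    rejected = True
    print(err)
```

Commit A is rejected and the Resource keeps the state that B produced, which is exactly the error I run into when typing quickly.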

How can we deal with this?

Auto merge (server fixes the issue)

The server creates a new Commit C, signs it (see #72), and applies it. Commit C contains the changes of Commit A, and also refers back to A.

But merging these is not trivial. What do you do when B added a word and A also added a word? Do you combine them? What if A removed a word, and B added a word?
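One way to see where the difficulty starts is a three-way merge at the level of whole properties. The sketch below is only an illustration under that assumption (the function name `merge_onto` and the example values are made up): it compares the state after P, the state after B, and the state A wanted, and reports a conflict whenever A and B changed the same property to different values.

```python
_MISSING = object()  # marks "property not present", distinct from any real value


def merge_onto(base: dict, theirs: dict, ours: dict):
    """Property-level three-way merge.

    base:   state after Commit P
    theirs: state after Commit B (already applied on the server)
    ours:   state that Commit A wanted to produce
    Returns (merged_state, conflicting_properties).
    """
    merged = dict(theirs)
    conflicts = []
    for key in set(base) | set(ours):
        base_val = base.get(key, _MISSING)
        our_val = ours.get(key, _MISSING)
        their_val = theirs.get(key, _MISSING)
        if our_val == base_val:
            continue  # A did not touch this property, keep B's version
        if their_val == base_val or their_val == our_val:
            # Only A changed it, or both made the same change: take A's value.
            if our_val is _MISSING:
                merged.pop(key, None)
            else:
                merged[key] = our_val
        else:
            conflicts.append(key)  # both changed it, in different ways
    return merged, sorted(conflicts)


base = {"title": "Notes", "text": "hello"}
after_b = {"title": "Notes", "text": "hello world"}

# A only renamed the title: no conflict, both changes survive.
clean, clean_conflicts = merge_onto(base, after_b, {"title": "My notes", "text": "hello"})

# A also edited the text: the same property changed on both sides.
kept, text_conflicts = merge_onto(base, after_b, {"title": "Notes", "text": "hello there"})
print(clean, clean_conflicts)
print(kept, text_conflicts)
```

Changes to different properties merge cleanly, but the second case is the hard one: resolving it means looking inside the string, which is the word-level question above.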

Auto retry (client fixes the issue)

The client learns that its Commit A failed, and receives the new latest Commit B. It applies this Commit and tries again: it creates a new Commit D, which has the same content as A, but with its previousCommit set to B.

This can be a destructive Commit, as the changes from Commit B have been applied on the server, but were fully ignored by the client.

Alternatively, the client could take into account the changes that Commit B made, possibly to the same property. But that would make things a lot more complicated: it would involve dealing with string insertions and positions. That is out of scope for now, so we can live with a little bit of data loss when merge conflicts happen. We still maintain previous versions.
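A rough Python sketch of the retry loop, to show how small it is and where the data loss comes from. Everything here is hypothetical: `FakeServer` is an in-memory stand-in for a single Resource, the error is assumed to carry the server's lastCommit, and the `-retryN` suffix stands in for re-signing, which in practice would produce a genuinely new Commit (D).

```python
class PreviousCommitMismatch(Exception):
    def __init__(self, last_commit):
        super().__init__(f"previousCommit does not match lastCommit {last_commit!r}")
        self.last_commit = last_commit  # assumption: the error tells us the new head


class FakeServer:
    """In-memory stand-in for a server holding one Resource."""

    def __init__(self):
        self.state = {}
        self.last_commit = None
        self.history = []  # ids of applied Commits, oldest first

    def apply(self, commit_id, previous_commit, changes):
        if previous_commit != self.last_commit:
            raise PreviousCommitMismatch(self.last_commit)
        self.state.update(changes)
        self.last_commit = commit_id
        self.history.append(commit_id)


def commit_with_retry(server, commit_id, previous_commit, changes, max_attempts=3):
    """Send a Commit; on a mismatch, re-create it on top of the server's lastCommit."""
    current_id = commit_id
    for attempt in range(max_attempts):
        try:
            server.apply(current_id, previous_commit, changes)
            return current_id
        except PreviousCommitMismatch as err:
            # Same content, new previousCommit. A real client would re-sign here.
            previous_commit = err.last_commit
            current_id = f"{commit_id}-retry{attempt + 1}"
    raise RuntimeError("gave up after repeated previousCommit mismatches")


server = FakeServer()
server.apply("P", None, {"text": "hello"})
server.apply("B", "P", {"text": "hello world"})  # B arrived first

applied_id = commit_with_retry(server, "A", "P", {"text": "hello there"})
print(applied_id, server.state, server.history)
```

After the retry, the text written by B is gone from the current state, but B is still in the history, so the previous version can be recovered.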

State hashes, instead of (or on top of) Commit hashes

Similar to how git works, we could look at the hash of the Resource's content, instead of its latest Commit. If these are the same, we can assume that two systems (even though their history may differ) have the same current state.
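A small sketch of what such a state hash could look like, in Python. The serialization used here (JSON with sorted keys and no extra whitespace, hashed with SHA-256) is an assumption for the sake of the example, not a proposal for the actual format; the only requirement is that equal content always serializes to the same bytes.

```python
import hashlib
import json


def state_hash(resource: dict) -> str:
    """Hash a Resource's current content, independent of how it got there."""
    canonical = json.dumps(
        resource,
        sort_keys=True,          # property order must not influence the hash
        separators=(",", ":"),   # no insignificant whitespace
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two systems that reached the same content through different histories.
on_server = {"title": "Notes", "text": "hello world"}
on_client = {"text": "hello world", "title": "Notes"}
diverged = {"title": "Notes", "text": "hello there"}

same = state_hash(on_server) == state_hash(on_client)
different = state_hash(on_server) != state_hash(diverged)
print(same, different)
```

A Commit could then state the hash of the state it was based on, and the server would accept it whenever that hash matches its own current state, whatever the last Commit was.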

Allow branching and merging

The server could accept Commit A and B. In this case, we get two branches. This introduces a bunch of questions:

If someone requests the current state, which branch is picked? Do we have some 'main' branch? And if so, how do we know which one that is?
How do we know which branches are there for a resource? Should we introduce a /branches Endpoint?
Who performs merges? Should clients do this?

Full CRDT (branching + automatically merge)

Atomic Commits are currently about subjects that are hosted at some specific domain, by some server. In that case, we have a centralized, coordinated system. But we could also consider decentralizing this. If we did, the system should be commutative: all systems would have to arrive at the same state after reading the Commits, regardless of the order in which they read them.
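To illustrate what commutative means here, below is a Python sketch of one of the simplest approaches: a last-writer-wins rule per property, ordered by a (timestamp, commit id) pair. This is just an example technique, not something Atomic Commits do today, and the `created_at` and `set` field names are made up. It still loses data on conflicting edits; it only guarantees that every replica ends up with the same result.

```python
from itertools import permutations


def apply_lww(state: dict, commit: dict) -> dict:
    """Apply a Commit to a state that stores (stamp, value) per property.

    The newest stamp wins; the commit id breaks ties between equal timestamps.
    """
    stamp = (commit["created_at"], commit["id"])
    new_state = dict(state)
    for prop, value in commit["set"].items():
        if prop not in new_state or new_state[prop][0] < stamp:
            new_state[prop] = (stamp, value)
    return new_state


def replay(order) -> dict:
    """Apply Commits in the given order and return the plain property values."""
    state = {}
    for commit in order:
        state = apply_lww(state, commit)
    return {prop: value for prop, (_, value) in state.items()}


commits = [
    {"id": "A", "created_at": 1, "set": {"text": "hello there"}},
    {"id": "B", "created_at": 2, "set": {"text": "hello world", "title": "Notes"}},
    {"id": "C", "created_at": 2, "set": {"title": "My notes"}},
]

# Every arrival order should produce the same final state.
outcomes = {tuple(sorted(replay(order).items())) for order in permutations(commits)}
print(len(outcomes), replay(commits))
```

With a rule like this the server no longer needs to reject Commit A for arriving late, but handling concurrent edits inside one string would need a proper text CRDT on top.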

Links to check out, inspiration:

atomicdata-dev / atomic-data-docs

Advanced Version Control System: branching, merging, CRDT #100