CBaquero / delta-enabled-crdts

Reference implementations of state-based CRDTs that offer deltas for all mutations.
Apache License 2.0
318 stars 36 forks source link

LWWReg (as currently implemented) is not commutative. #7

Open jemc opened 7 years ago

jemc commented 7 years ago

Greetings.

I've been working on implementing these delta-state convergent replicated data types in another language (Pony). If you're interested in seeing my implementation so far, you can find it here.

While implementing, I noticed that your lwwreg::join operation is not commutative. That is, when multiple join operations are applied with the same timestamp (U), the final value for any particular replica of the register depends on the order in which the operations were applied. Specifically, for a set of operations whose timestamp is the same, and that timestamp is the highest timestamp that the replicas will see, the first of such join operations will "win".

Here's a trivial worked example, where more than one value arrives with a logical timestamp of "6":

replica A receives: (5, "foo"), (6, "bar"), (6, "baz"); final: (6, "bar")
replica B receives: (6, "bar"), (5, "foo"), (6, "baz"); final: (6, "bar")
replica C receives: (6, "baz"), (6, "bar"), (5, "foo"); final: (6, "baz")

In the example, two of the replicas end with a final value of "bar", and one replica ends with a final value of "baz", because of the order in which the delta-states was applied. Even with infinite retransmission of those delta-states, the result in each replica will not change from the shown final value.


The workaround I used to make the join operation fully commutative and salvage this data structure as a CRDT is to:

This strategy ensures that all replicas will eventually converge to the same final value and timestamp.

In essence, the "Last Writer Wins Register" (LWWReg) must become either a "Last Writer Greater Value Wins Register" (LWGVWReg) or a "Last Writer Lesser Value Wins Register".

If both the timestamp and the value are equivalent (neither side is greater or lesser in either case), then there is no conflict to resolve.


You can see my implementation of LWWReg here, including a bias type parameter named B which accepts either BiasGreater or BiasLesser as the type argument, so that the bias strategy is selected at compile time and is part of the type signature.

Just because it might help make sense of the code, I'll note that one additional change I've made from your model is that my delta-mutator methods accept a delta-state-so-far as a parameter, so that the pattern of accumulating several delta-states before sending it to the other replicas is facilitated with fewer object allocations.


In closing, thank you for your work on this highly important paper, and on this reference implementation. I hope this issue ticket can be of some help in the overall goal of increasing adoption of these concepts.

vitorenesduarte commented 7 years ago

Hi,

From A comprehensive study of Convergent and Commutative Replicated Data Types:

Timestamps are assumed unique, totally ordered, and consistent with causal order

I think this is assumed here. There's also a reference implementation in Erlang that assumes the same.

It's great to see an implementation in Pony.

CBaquero commented 7 years ago

Hi Joe, and thanks for pointing this out and suggesting a nice solution.

LWW based CRDTs are even hard to consider CRDTs since the direction of evolution is directed by an external parameter, here a timestamp. So, as Vitor said, we assume that the fine granularity of the externally supplied timestamps ensures them to be unique.

But, it is good to plan for possible ties in the timestamps. And in those cases the trick is to use something else (as you did with the value) to introduce a tie breaking bias. I didn't check the Pony code yet, but the solution description you made looks perfectly correct to me.

I think that in another datatype we also had to introduce a bias to fix LWW ties "RWLWWSet: Last-writer-wins set with remove wins bias (SoundCloud inspired)".

Thanks again for working in the Pony side, we might revise later our LWWReg to ensure convergence in case of ties, but I want to think more carefully on the impact of forcing payload to be comparable.

Best,

Carlos

jemc commented 7 years ago

Thanks for your quick responses.

I wasn't thinking of the timestamp unicity as being a premise of the model, because I was still thinking in terms of the SoundCloud-inspired LWWSet, which as you mention does not assume timestamp unicity, and uses either an add-wins or remove-wins bias to resolve the conflict.

From my perspective requiring comparability of the value is not too onerous, as even if the value is not inherently comparable, a commutative conflict-resolution function could be supplied which provides a total ordering of the possible values. Even if that total ordering is arbitrary from the perspective of the data, it still provides eventual consistency for the case of timestamp collision, and the semantics of the CRDT as far as the application is concerned would be "the result is arbitrary in the case of timestamp collision", though that arbitrary result is still eventually consistent. However, I could see how it might be an implementation practicality concern to figure out how the caller would supply these conflict-resolution functions in C++, for the cases of types that are not inherently comparable.

Anyway, I appreciate the discussion here. You can feel free to close this issue ticket if you don't see it as useful to keep open as an "action item" for work. I will proceed with my approach for an LWWReg with bias, and you can take whatever approach you find is best for your project. I might just suggest that you may want to add a bit of explicit documentation to the README about timestamp unicity for LWWReg, to avoid potential problems and/or confusion when users are not aware of this requirement.