RBMHTechnology / eventuate

Global-scale event sourcing and event collaboration with causal consistency (This project is in maintenance mode. Only critical bugs will be fixed, but there is no more feature development.).
http://rbmhtechnology.github.io/eventuate/
Apache License 2.0
708 stars 99 forks source link

Implement an OR Shopping Cart CRDT #114

Closed ianclegg closed 8 years ago

ianclegg commented 9 years ago

Implement an OR-Map as defined in Spec 21 in the paper: A comprehensive study of Convergent and Commutative Replicated Data Types.

An OR-Map is an extension of an OR-Set, where each item in the set is a key value mapping. However, unlike a set the key value mappings must be distinct. The OR-Map in Spec 21 defines the method to handle concurrent adding and removal of key value mappings, but it must also be possible to merge values when they are modified concurrently. In the case of Spec 21, an integer is chosen as the value type and two concurrent add's are merged by summing the integer of each add - so the value type is a monotonic counter. Since the values of an OR-Map must CRDTs themselves, it is necessary to implement a simple register to hold values first. As it happens, we already have an Multi Value Register (MVRegister) and hopefully a Last Write Wins Register (LWWRegister).

Some refactoring may be necessary to enable CmRDTs to be nested within each other, such that two concurrently modified registers can be merged. An additional discussion on CRDTs is presented by Marc Shapiro in his presentation at Microsoft Research

krasserm commented 9 years ago

From my understanding of section 5.1 and the associated specification 21, I don't think we need to nest CRDTs to implement an observed-remove shopping cart. The implementation could be almost identical to that of OR-Set except that obtaining the current value would additionally sum the quantities of a given key from (internally) distinct entries (returning a Map[A, Int]) instead of just projecting entries with the same value to the same entry in a resulting Set[A].

Instead of using Versioned[A] (as in ORSet), representing a pair (value, vector-timestamp), we should use another data structure, representing a triplet (value, count, vector-timestamp) for the OR shopping cart.

WDYT?

krasserm commented 9 years ago

Updated previous comment.

ianclegg commented 8 years ago

I see you read section 5.1 carefully. Of course your right, but I think a proper Map - where the value is not limited to a monotonic counter would be much more useful in the real world. The OR-Map I described, where the values are themselves CRDTs is functionally a superset of the OR Shopping Cart. There are implementations of this already, I just googled up a CRDT map which nests many other CRDTs - this probably isn't a great example because it does not adhere to the scala/map interface - but you get the idea https://github.com/riaken/riaken-core/blob/master/crdt.go

If you consider the OR-Map as a CRDT container, then it enables the composition of the maps on Slide 34 under CRDT approximations

Marc Shapiro Eventual-Consistency Slides

ianclegg commented 8 years ago

In terms of operations it would be nice to have a put instead of add and an additional get operation. The get operation would return the value of a given key, this is useful in cases where the map is large and the callee does not need a full copy.

ianclegg commented 8 years ago

Having read more papers about CRDTs, there are a number of different ways to implement a Map. However, I have not found a formal definition of a Map that can store expressed as operation based. The Riak team discussed their implementation. https://github.com/basho/riak/issues/354

What do you think?

krasserm commented 8 years ago

The OR-Map I described, where the values are themselves CRDTs is functionally a superset of the OR Shopping Cart

I'm not sure if this is really the case. In the OR shopping cart specification, you internally maintain multiple entries for the same key (having different vector timestamps i.e. unique identifiers) in order to be able to determine what you have observed.

Replacing this with an implementation where the value itself is a CRDT and just updating the value isn't equivalent in my opinion. You'd still need to maintain multiple entries for the same key to track what you have observed. In this case, you'd need to be able to merge values when the current value is requested. But merge is defined in the domain of state-based CRDTs.

I didn't go into the details of the references you provided but I'd like to see a specification of an operation-based OR Map and discuss it before implementing it. Would you like to work an such a specification? In the meantime, I think an OR shopping cart would still be a great addition (actually, I'd like to use it for the example application). Shall we rename the ticket to OR Shopping Cart and work on OR Map in a separate ticket?

ianclegg commented 8 years ago

Interesting points...though I did not envisage a merge when the value is requested. When I used the term merge I was referring to the operations that are performed 'at source' and downstream based on what was emitted at the source. I too would like to see an operation based specification of an OR Map. I have read many papers now, and the only specification and implementations are state-based. I had envisaged translating the state based operations into operation based 'at source' and 'downstream'. Each of the specifications has advantages and disadvantages. Unfortunately I am unlikely to have the time to work on an op-based spec - but I have reached out to Carlos Baquero at Minho university to ask if he is aware of any.

So lets rename this work item OR Shopping Cart, and I'll implement it as an educational exercise?

krasserm commented 8 years ago

So lets rename this work item OR Shopping Cart, and I'll implement it as an educational exercise?

Sounds good! I'll try to work an an op-based spec for OR Map if Carlos cannot give any pointers ... and time permits :smiley:

ianclegg commented 8 years ago

Thanks for contacting. I don’t think there is a lot done in operation-based maps. We are expanding the "Making Operation-based CRDTs Operation-based” in the next months and will probably be looking again at the map designs in that context.

But for state-based maps, you can look at the implementation of the Riak map, and also at a recent one I did on my C++ reference implementations for CRDTs with deltas. Both maps are OR maps, where updating on a key's value wins over removals of the key. You can look at my code at https://github.com/CBaquero/delta-enabled-crdts that includes a short description of the map and I am glad to try to help on understanding that design. Best, Carlos

So it may be worth waiting a few months and see where the research goes, or there is an opportunity to do develop a pure Op-Based OR-Map now. There is also an option to implement a state based delta merge Map. Carlos' implementation is clean and concise. I imagine an implementation in eventuate would generate a delta with every change and replicate the delta, downstream actors would merge it. Efficient State-based CRDTs by Delta-Mutation I'll be on vacation for a few days and give it some further thought too

gosubpl commented 7 years ago

@krasserm @ianclegg I have found this ticket by accident. I know that it is closed, but hope that you get this update. Akka has now finished adding delta-CRDTs. They should go out in 2.5.0. You can take a look at closed PRs to see the code. Please note that basket based on ORSet of ORSets (ORMultiMap) has an anomaly for remove item if the internal ORSet is not cleaned-up before the removal of the key. This might not be very visible with full state updates but bites when deltas are in play. I will post example shopping cart based on Akka's ORMultiMap on my repo soon.

krasserm commented 7 years ago

@gosubpl Thanks for letting us know and great to see that Akka is now starting to support delta-CRDTs. Will give a closer look as soon as I can.