gabrielgiussi / cappio

1 stars 0 forks source link

Implement CRDT's #18

Open gabrielgiussi opened 5 years ago

gabrielgiussi commented 5 years ago

Operation vs Value

I can implement both and the difference will appear mainly in the messages. I can't use this difference for conditions because in order for this to make sense I should enable the user to select with kind of abstraction wants to use and add some conditions that take into account the "weight" of the messages or stablish a delay based on this "weight".

I need to understand two things:

  1. in which moments I will do the prepare and effect stages.
  2. if I require another abstraction on top of CausalBroadcast (eventuate has an event log).
  3. crdts requires vector clocks or version vectors? (eventuate uses plausible vector clocks)

Read

Pure Op

I think this mainly changes the internal implementation that the user will not be able to see. So maybe it only makes sense to implement an abstraction with both pure + stable.

Pure Op + Stable

This requires the causal broadcast abstraction to expose the vector time in the CRBDeliver. Is easy to expose always and only use it in the app abstraction that does stabilization. Read

gabrielgiussi commented 4 years ago

A comprehensive study of Convergent and Commutative Replicated Data Types

CvRDT (State based crdt)

Eventual convergence requires that all replicas receive all updates. The communication channels of a CvRDT may have very weak properties. Since merge is idempotent and commutative (by the properties of ⊔v), messages may be lost, received out of order, or multiple times, as long as new state eventually reaches all replicas, either directly or indirectly via successive merges. Updates are propagated reliably even if the network partitions, as long as eventually connectivity is restored. Proposition 2.1. Any two object replicas of a CvRDT eventually converge, assuming the system transmits payload infinitely often between pairs of replicas over eventually-reliable point-to-point channels

So, reliable broadcast should be enough (or uniform?)

CmRDT (op based crdt)

Liveness requires that every update eventually reaches the causal history of every replica. To this effect, we assume an underlying system reliable broadcast that delivers every update to every replica in an order <d (called delivery order) where the downstream precondition is true. We note that all downstream preconditions in this paper are satisfied by causal delivery, i.e., delivery order is the same or weaker as causal order: f <d g ⇒ f <→ g.

If all concurrent operations commute, then all execution orders consistent with delivery order are equivalent, and all replicas converge to the same state. Such an object is called a Commutative Replicated Data Type (CmRDT). As noted earlier, for all data types studied here, causal delivery <→ (which is readily implementable in static distributed systems and does not require consensus) satisfies delivery order <d. For some data types, a weaker ordering suffices, but then more pairs of operations need to be proved commutative.

I need causal broadcast, and I could use waiting or non-waiting algorithms.

Pure op based

Reliable Causal Broadcast (RCB) is a prominent dissemination abstraction that is often used in AP systems (in the CAP [15] context) to ensure causal delivery of messages, being the strongest consistency model in always-available systems that eventually converges [22, 2]. A common implementation strategy for a reliable causal broadcast service [30] is to assign a vector clock to each message broadcast, and use the causality information in the vector clock to decide at each destination when a message can be delivered. If a message arrives at a given destination before causally preceding messages have been delivered, the service delays delivery of that message until those messages arrive and are delivered. When messages are delivered to the application layer, in a sequence that is consistent with causality, there is some loss of information when using standard RCB middleware. Messages that the middleware knows be concurrent, are delivered in order and the application is unaware of that concurrency.

TCSB (Tagged Causal Stable Broadcast)

Reliable Causal Broadcast (RCB) uses internally a logical timestamp t (e.g., a vector clock) to order messages. We propose to simply expose this information to the upper layers. Technically, the cdeliver(m) middleware API (used in Algo. 1) will now look like this: tcdeliver(t, m) meaning that, the calling process will receive the message m as well as its associated timestamp t.

Requires a causal broadcast that includes the vector clock in the delivery. This can't use the non-waiting (at least not the implementation proposed in the book because it doesn't use vector clocks)