basho / riak_dt

Convergent replicated datatypes in Erlang
Apache License 2.0

Deferred operations may be dropped before they're delivered #99

Open russelldb opened 10 years ago

russelldb commented 10 years ago

Deferred Ops

Distributed databases like basho/riak sometimes use sloppy quorums for greater availability. Because of this, I added deferred operations to data types that have an "observed remove" semantic.

In the CRDT Catalogue tech report[1] removing an element that is not present from a set (and likewise a field from a map) results in a "precondition exception". It is a precondition of removal that the element be present.
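To make the precondition concrete, here is a minimal Python sketch of a strict remove on a plain set. The names `PreconditionError` and `strict_remove` are hypothetical and chosen only for illustration; this is not riak_dt's API.

```python
class PreconditionError(Exception):
    """Raised when an operation's precondition does not hold (hypothetical name)."""


def strict_remove(elements, element):
    """Return a copy of `elements` without `element`.

    The precondition of removal is that the element is present, so
    removing an absent element is an error rather than a no-op.
    """
    if element not in elements:
        raise PreconditionError(f"{element!r} is not present")
    return elements - {element}
```

A replica that has not yet seen the add would hit this error, which is the situation the rest of this issue is about.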

In Riak, data types are a hybrid of the operation-based and state-based approaches. The client sends operations to the database, and the database applies these operations to the state of the data type.

We provide a precondition_context (or context for short) when a data type is read, and the client returns it with any remove operation to enforce the "observed remove" semantic.

Due to sloppy quorums (called fallbacks in Riak), it is possible that a replica receiving an operation lags behind the causal history observed by a client. In the extreme, a fallback may start up just to perform the operation and have no state whatsoever.

Deferred operations were added to mitigate this scenario. If a fallback receives a context remove for a set element that is not present, it stores the context and the operation as part of the data type state. Later, when the set merges with other replicas, the operation is performed as soon as the set's causal history is up to date with the deferred operation's context.
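The mechanism described above can be sketched roughly as follows. This is a simplified Python model, not the riak_dt implementation: the class `DeferredSet`, its methods, and the representation of dots as `(actor, counter)` tuples are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


def descends(a, b):
    """True if version vector `a` has seen every event recorded in `b`."""
    return all(a.get(actor, 0) >= n for actor, n in b.items())


def merge_clocks(a, b):
    """Pairwise maximum of two version vectors."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


@dataclass
class DeferredSet:
    """Hypothetical observed-remove set that defers removes it cannot apply yet."""
    clock: dict = field(default_factory=dict)     # actor -> counter
    entries: dict = field(default_factory=dict)   # element -> {(actor, counter)}
    deferred: list = field(default_factory=list)  # [(context, element)]

    def add(self, actor, element):
        n = self.clock.get(actor, 0) + 1
        self.clock[actor] = n
        self.entries[element] = {(actor, n)}

    def context(self):
        """The causal context handed to a client on read."""
        return dict(self.clock)

    def value(self):
        return sorted(self.entries)

    def remove(self, element, ctx):
        # If this replica lags behind the client's context, keep the
        # operation so it can be retried after a later merge.
        if not descends(self.clock, ctx):
            self.deferred.append((dict(ctx), element))
        # Drop only the dots the client had observed (add wins otherwise).
        dots = self.entries.get(element, set())
        kept = {(a, n) for (a, n) in dots if n > ctx.get(a, 0)}
        if kept:
            self.entries[element] = kept
        else:
            self.entries.pop(element, None)

    def merge(self, other):
        merged = {}
        for e in self.entries.keys() | other.entries.keys():
            mine = self.entries.get(e, set())
            theirs = other.entries.get(e, set())
            # Keep dots both sides have, plus dots the other side never saw.
            kept = (mine & theirs) \
                | {d for d in mine - theirs if d[1] > other.clock.get(d[0], 0)} \
                | {d for d in theirs - mine if d[1] > self.clock.get(d[0], 0)}
            if kept:
                merged[e] = kept
        self.entries = merged
        self.clock = merge_clocks(self.clock, other.clock)
        # Retry deferred removes; any still ahead of the clock are re-deferred.
        pending, self.deferred = self.deferred + other.deferred, []
        for ctx, element in pending:
            self.remove(element, ctx)


# A primary adds "x"; a client reads it, then its remove lands on an empty fallback.
primary, fallback = DeferredSet(), DeferredSet()
primary.add("a", "x")
ctx = primary.context()
fallback.remove("x", ctx)   # cannot be applied yet, so it is deferred
fallback.merge(primary)     # the clock catches up and the remove is performed
primary.merge(fallback)     # the removal propagates back to the primary
```

In this sketch the deferred operation lives inside the set's own state, which is what makes it vulnerable if that state is discarded before the merge happens.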

Maps

There is a bug with deferred operations when a Set/Flag/Map is stored in a Map. Here is the scenario. All examples are R/W=1 for simplicity, but any disjoint set of R/W will do.

The fix, I think, is either to store deferred operations at the top level, or never to remove a field with undelivered operations (though such a field should not be shown in the value). Whichever is simpler or works best.

seancribbs commented 10 years ago

@russelldb Is this likely to be fixed in the 2.0 cycle? Otherwise I'll remove the known-issue label.