basho / riak_kv

Riak Key/Value Store
Apache License 2.0

Ensure hinted handoff completes as soon as possible #847

Open engelsanchez opened 10 years ago

engelsanchez commented 10 years ago

If a fallback vnode takes data writes for an unavailable vnode, when the vnode is available again hinted handoff should make that data available to the primary vnode that owns it. If hinted handoff fails, it is possible for the fallback vnode to shut down and not try again until a node restart, where vnodes with data are started and fallbacks have a chance to start hinted handoff again.

In a complex scenario, this has contributed to data loss. If deletes happen on a key, followed by new writes on the same key, the vector clocks for the new values may not dominate the vector clocks of the deleted values, and may even be dominated by them. This is because the counters per actor start from zero again in our current implementation. So hinted handoff resuming much later, after an eventual node restart, could make old data silently replace new data.
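
To make the failure mode concrete, here is a minimal Python sketch. It is not Riak's actual Erlang vclock code; the dict-based clock, the function names, and the actor id `vnode_a` are assumptions made purely for illustration. It shows a stale value held by a fallback winning the comparison against a value written after the key was deleted and its per-actor counters restarted:

```python
def increment(clock, actor):
    """Return a copy of `clock` with `actor`'s counter bumped by one."""
    new_clock = dict(clock)
    new_clock[actor] = new_clock.get(actor, 0) + 1
    return new_clock


def descends(a, b):
    """True if clock `a` has seen every event recorded in clock `b`."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def dominates(a, b):
    """True if `a` strictly descends `b` (a would replace b on merge)."""
    return descends(a, b) and a != b


# Stale value stuck on the fallback: written three times via vnode_a.
stale = {}
for _ in range(3):
    stale = increment(stale, "vnode_a")

# The key is later deleted and the tombstone removed, so the next write
# starts from an empty clock and vnode_a's counter begins at zero again.
fresh = increment({}, "vnode_a")

# When handoff finally runs, the stale clock looks newer than the fresh one.
stale_wins = dominates(stale, fresh)
fresh_wins = dominates(fresh, stale)
```

Under these assumptions `stale_wins` is true and `fresh_wins` is false, so a late handoff would let the old value overwrite the new one without producing siblings.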

The causality and vector clock shortcoming described above is discussed in issue basho/riak_kv#679. In this issue I'd like to track progress on improving the hinted handoff part. The desire is to have a mechanism that more actively monitors fallback vnodes and ensures that hinted handoff is completed as soon as possible.

/cc @tuple Feel free to add to my description above

jrwest commented 10 years ago

delayed 2c: +1 to making fallbacks more aggressive w/ moving data to primary. obviously this doesn't solve all cases (e.g. long-long-long lived partition) because of the issues outlined in https://github.com/basho/riak_kv/issues/679

engelsanchez commented 10 years ago

BTW, I don't know what I was drinking when I opened this in riak_kv instead of riak_core :(