basho-labs / little_riak_book

A Little Riak Book

Vector clocks don't need client ids #8

Closed ricardobcl closed 11 years ago

ricardobcl commented 11 years ago

When I was translating A Little Riak Book to Portuguese, I came across the section "vector clocks" in chapter 3. You say "Every client needs its own id, sent on every write as X-Riak-ClientId." and provide examples using client ids, but in fact this has been deprecated since Riak 1.0. Clients don't need to provide ids, since they are not used by vector clocks anymore (http://docs.basho.com/riak/latest/references/appendices/concepts/Vector-Clocks/#VNode-Vector-Clocks).

coderoshi commented 11 years ago

You're right that I should not have implied that a client-id is required, since it never has been. It's just a very, very, very good idea. It's incorrect to think that "clients don't need to provide ids, since they are not used by vector clocks anymore." They should still provide ids, because that's exactly how vclocks work. The change you alluded to is simply the edge case where no client id is provided. Instead of a random id being generated, the coordinating node has its own fixed id that it uses to increment the clock. This helps reduce vclock size.

For example, if I did 3 writes where a random id was generated, my vclock would look like this:

[{random1,1},{random2,1},{random3,1}]

If it used a primary node id instead, the vclock could be much smaller after the same 3 writes:

[{pnode1,3}]
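
The difference between the two clocks can be sketched in a few lines of Python. This is only an illustration, not Riak's implementation: the `increment` helper and the dict representation are assumptions made for the example, and the actor names are the ones used above.

```python
def increment(vclock, actor):
    """Return a copy of vclock with the given actor's counter bumped by one."""
    updated = dict(vclock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


# Three writes, each tagged with a freshly generated random id:
# every write adds a new entry to the clock.
random_clock = {}
for actor in ["random1", "random2", "random3"]:
    random_clock = increment(random_clock, actor)

# Three writes coordinated by the same node, which uses its own fixed id:
# the single entry's counter grows instead.
node_clock = {}
for _ in range(3):
    node_clock = increment(node_clock, "pnode1")

print(sorted(random_clock.items()))
print(sorted(node_clock.items()))
```

The clock with random ids ends up with one entry per write, while the node-id clock keeps a single entry with a counter of 3.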

ricardobcl commented 11 years ago

The thing is, if you have 1000 clients writing to a single key (each with a unique client id), the vclock would have 1000 entries. In practice, Riak uses a soft and a hard limit for vclock growth (between 20 and 50), and starts pruning older entries. This is not "safe", because pruning throws away causal history. That's why they changed it in Riak 1.0: https://github.com/basho/riak_kv/blob/master/src/riak.erl#L106

It's better to use server ids, since you're not going to have that many primary replicas coordinating a single key, so the vclock growth is somewhat bounded.
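
A rough Python sketch of the argument, under loud assumptions: `prune` below is a hypothetical, simplified rule (keep only the most recently added entries) and is not Riak's actual pruning logic; the limit of 50, the three coordinating nodes, and all helper names are made up for illustration.

```python
def increment(vclock, actor):
    """Return a copy of vclock with the given actor's counter bumped by one."""
    updated = dict(vclock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


def descends(a, b):
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def prune(vclock, limit):
    """Hypothetical pruning: keep only the `limit` most recently added entries."""
    items = list(vclock.items())  # dicts preserve insertion order
    return dict(items[-limit:])


LIMIT = 50  # assumed cap, for illustration only

# 1000 clients, each with a unique id, write to the same key.
client_clock = {}
for i in range(1000):
    client_clock = increment(client_clock, f"client{i}")

early = {"client0": 1}  # the clock after the very first write
pruned = prune(client_clock, LIMIT)

# The same 1000 writes coordinated by 3 nodes using their own ids.
server_clock = {}
for i in range(1000):
    server_clock = increment(server_clock, f"node{i % 3}")

print(len(client_clock), len(pruned), len(server_clock))
print(descends(client_clock, early), descends(pruned, early))
```

In this sketch the unpruned client clock still descends from the first write, but the pruned one no longer does, so the two versions would look concurrent even though one supersedes the other. The server-id clock never gets near the limit, so nothing has to be pruned.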

coderoshi commented 11 years ago

Wow. So much of the Riak documentation is wrong or misleading on the subject. That's good to know... and must be fixed. Thanks for noticing, and I'll make that change to the book as well.

ricardobcl commented 11 years ago

This stuff may not be as trivial as people think. I'm really familiar with it, since my master's thesis (plug: http://gsd.di.uminho.pt/members/vff/dotted-version-vectors-2012.pdf) was on this subject and I'm currently finishing an alternative to vclocks in Riak :)

Anyway, maybe I'll take a look at the docs and contribute when I can!

coderoshi commented 11 years ago

Removed outdated comments