basho / riak_dt

Convergent replicated datatypes in Erlang
Apache License 2.0
353 stars 70 forks source link

Optimised PN-Counter Format (aka PNCounters V2) #47

Closed lenary closed 11 years ago

lenary commented 11 years ago

The first version of the PNCounter code literally used a pair of GCounters as its representation and deferred a lot of the behaviour down to GCounter functions.

However, there were issues with this, essentially to do with keeping the two lists of actors in sync (something needed for GC) and the fact that you store each actor twice, which is inefficient.

The new format is very similar to the structure of a GCounter, but the actor-count tuples include a count of both increments and decrements. For example:

Old Style: {[{a,4}],[{a,1},{b,2}]}
New Style: [{a,4,1},{b,0,2}]

I've updated from_binary/1 to convert into the new structure every time, however there's also a new to_binary/2 which can create the old style (should it be required for downgrades).

russelldb commented 11 years ago

Apart from the redundant size field in the v2 binary, this looks good to me. Have you micro benchmarked for size vs. speed against the original implementation?

lenary commented 11 years ago

I have not yet microbenchmarked for speed. I have, however, microbenchmarked for size using this code: https://gist.github.com/lenary/e3f583045d3955cbb846

Results (percentage = v2 is x% the size of v1, Total operations: 10000)

   2 Actors: 54.6%
   3 Actors: 56.7%
   4 Actors: 57.9%
   5 Actors: 58.7%
  10 Actors: 60.5%
 100 Actors: 62.3%
1000 Actors: 57.1%

Results (percentage = v2 is x% the size of v1, Total operations: 100000)

   3 Actors: 56.8%
   5 Actors: 58.8%
  10 Actors: 60.5%
1000 Actors: 62.5%

This doesn't account for any more than the riak_dt_pncounter:to_binary/2 function (ie metadata and riak object stuff is just meh)

@seancribbs has mentioned capabilities etc, which will be a problem, so I'll do a bit of work on it this week, but will cope with most of those issues later

lenary commented 11 years ago

So, as per guidance from @seancribbs, i have coded every conversion function, so we have a to_binary/2, from_binary/2, change_versions/3 and current_version/1. Next up is making a conversion script, but with all those functions this should be simple.

I also switched to using the constants instead of atoms like v1 and v2, so there's less crazy.

seancribbs commented 11 years ago

This looks good, @lenary. Merging!