basho / riak_dt

Convergent replicated datatypes in Erlang
Apache License 2.0

sets->to->ordsets [JIRA: RIAK-2385] #117

Closed zeeshanlakhani closed 8 years ago

zeeshanlakhani commented 8 years ago

…cross datatypes (orswot, od_flag, map) for small perf gain.
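For anyone skimming the thread who hasn't used both modules, here is a minimal sketch of what swapping `sets` for `ordsets` looks like, using only the Erlang stdlib. The module name `ordsets_sketch` and the function `demo/0` are hypothetical and are not taken from riak_dt or this PR; the sketch only illustrates that the two modules expose the same basic operations while an ordset is represented as a plain sorted list.

```erlang
-module(ordsets_sketch).
-export([demo/0]).

%% Hypothetical illustration only: not code from riak_dt or this PR.
demo() ->
    Elems = [c, a, b, a],
    %% Build the same logical set with both stdlib modules.
    S  = lists:foldl(fun sets:add_element/2, sets:new(), Elems),
    OS = lists:foldl(fun ordsets:add_element/2, ordsets:new(), Elems),
    %% An ordset is a sorted list without duplicates.
    [a, b, c] = OS,
    %% Membership and size behave the same way in both modules.
    true = sets:is_element(b, S),
    true = ordsets:is_element(b, OS),
    3 = sets:size(S),
    3 = ordsets:size(OS),
    %% Ordsets with equal contents are equal terms, so =:= works directly.
    true = OS =:= ordsets:from_list([b, c, a]),
    %% A sets value is converted and sorted before comparing contents.
    OS = lists:sort(sets:to_list(S)),
    ok.
```

Calling `ordsets_sketch:demo().` in a shell returns `ok` if every match succeeds. Whether this representation change helps in practice depends on the workload, which is what the benchmarks in this thread are for.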

Benchmark posts added to thread. Driver used is in PR https://github.com/basho/basho_bench/tree/new_driver/zl/riak_dt-driver-start-with-sets.

Please review @russelldb!

russelldb commented 8 years ago

Tests all pass. Cover shows the changed lines are run many times (thanks quickcheck!). All looks good to me. Once some info on the perf improvement is added, I'll Plus One. Thanks!

kmarekspartz commented 8 years ago

If I get a chance, I'll re-run these against this branch: http://kyle.marek-spartz.org/posts/2014-12-01-benchmarking-large-riak-data-types.html

zeeshanlakhani commented 8 years ago

Graphs/discussion y'all (again, pretty minor improvements, but improvements nonetheless)... more tracing after the rest of the benching needs are done and 2.2 is out:

sets~single~insert~10@500x1000e~50c~3m~bitcask

For sets themselves, things mostly stayed the same, but single-set comparisons (no distribution of key PUTs) showcased improved latencies:

w/o change

[graph: sets single insert 10 500x1000e 50c 3m bitcask]

w/ change

[graph: sets single insert 10 500x1000e 50c 3m bitcask]

sets~insert~500x1000ex10sets~55c~20m~leveldb

And we see something similar w/ a larger, .5 megabyte object:

w/o change

[graph: sets insert 500x1000ex10sets 55c 20m leveldb]

w/ change

[graph: sets insert 500x1000ex10sets 55c 20m leveldb]

maps~counter^insert@20~register^modify@2~read@1~flag^modify@1~4@1000bx1000e~10keys~60c~15m~bitcask

For maps, we see lower average latencies w/ complex maps (containing multiple types):

w/o change

[graph: baseline_maps counter insert 20 register modify 2 read 1 flag modify 1 4 1000bx1000e 10keys 60c 15m bitcask]

w/ change

[graph: _maps counter insert 20 register modify 2 read 1 flag modify 1 4 1000bx1000e 10keys 60c 15m bitcask]

maps~set^modify@10~set^remove@1~multiops^insert@5~4@100b~1000e~10000keys~75c~15m~leveldb

And a wee bit better throughput on average:

w/o change

[graph: baseline_maps set modify 10 set remove 1 multiops insert 5 4 100b 1000e 10000keys 75c 15m leveldb]

w/ change

[graph: _maps set modify 10 set remove 1 multiops insert 5 4 100b 1000e 10000keys 75c 15m leveldb]

maps~single~counter^insert@1~set^modify@3~read@5~set^remove@1~multiops^insert@1~10000seqint~25c~8m~bitcask

And, as w/ single-sets above, we can see that even though latencies still grow linearly, the slope of the ordsets runs is better overall, especially w/ multi-op maps:

w/o change

[graph: baseline_maps single counter insert 1 set modify 3 read 5 set remove 1 multiops insert 1 10000seqint 25c 8m bitcask]

w/ change

[graph: _maps single counter insert 1 set modify 3 read 5 set remove 1 multiops insert 1 10000seqint 25c 8m bitcask]

zeeshanlakhani commented 8 years ago

@zeckalpha thanks! I just added some of the bench graphs. Yeah, it's pretty minor, but we also don't see worse perf w/ maps, which is what I was worried about with this check. Also, the b_b driver used is here: https://github.com/basho/basho_bench/tree/new_driver/zl/riak_dt-driver-start-with-sets... over protocol buffers (and the erlang-client).

kmarekspartz commented 8 years ago

Neat!

russelldb commented 8 years ago

+1 37a3bb335391efa50e26305458024d3d8421e1ab

Nice! very thorough. Thanks @zeeshanlakhani

zeeshanlakhani commented 8 years ago

@borshop merge