basho / riak_cs

Riak CS is simple, available cloud storage built on Riak.
http://docs.basho.com/riakcs/latest/
Apache License 2.0

Optimization opportunities for riak_cs_kv_multi_backend.erl [JIRA: RCS-319] #1032

Open slfritchie opened 9 years ago

slfritchie commented 9 years ago

Based on micro-benchmark measurements.

Part 1:

MakeMsg = fun(BKey) -> {'$gen_event',{riak_vnode_req_v1,913438523331814323877303020447676887284957839360,{fsm,undefined,self()},{riak_kv_get_req_v1,BKey, 911}}} end.
spawn(fun() -> timer:sleep(1000), [VN ! MakeMsg({<<"moss.buckets">>,<<"test-bucket">>}) || _ <- lists:seq(1,20)] end), latency_histogram_tracer:start(riak_cs_kv_multi_backend,capabilities,2, 5).
--versus---
spawn(fun() -> timer:sleep(1000), [VN ! MakeMsg({<<"moss.buckets">>,<<"test-bucket">>}) || _ <- lists:seq(1,20)] end), latency_histogram_tracer:start(riak_kv_vnode,do_get,4, 5).

The riak_cs_kv_multi_backend:capabilities/2 function is about 15-17% of the total running time of vnode do_get, for a small bucket object. This should be easy'ish to make significantly faster?

For capabilities/2:

Histogram stats:     
[{min,14},
 {max,100},
 {arithmetic_mean,28.525},
 {geometric_mean,25.12890497934286},
 {harmonic_mean,23.069180164544996},
 {median,24},
 {variance,348.9224358974358},
 {standard_deviation,18.679465621302867},
 {skewness,2.6475300998559543},
 {kurtosis,7.1063519973387645},
 {percentile,[{50,24},{75,30},{90,40},{95,58},{99,100},{999,100}]},
 {histogram,[{33,31},{54,6},{74,1},{94,0},{114,2},{134,0}]},
 {n,40}]

For do_get/4:

Histogram stats:     
[{min,100},
 {max,333},
 {arithmetic_mean,162.6},
 {geometric_mean,153.3776752794918},
 {harmonic_mean,146.02555820694457},
 {median,137},
 {variance,3899.8315789473686},
 {standard_deviation,62.44863152181454},
 {skewness,1.292663122595427},
 {kurtosis,0.8671304222013156},
 {percentile,[{50,137},{75,193},{90,219},{95,290},{99,333},{999,333}]},
 {histogram,[{190,14},{270,4},{350,2},{500,0}]},
 {n,20}]
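If most of capabilities/2's cost comes from re-deriving which backend a bucket maps to on every get, one plausible fix is to memoize that resolution, since the same buckets recur constantly. Below is a minimal sketch of the idea, assuming the bucket-to-backend mapping is stable; the module, table, and function names are invented for illustration and are not the actual riak_cs_kv_multi_backend internals.

%% Hypothetical sketch: memoize the bucket -> backend resolution in ETS so
%% that capabilities/2 does not repeat the prefix matching on every get.
%% Module, table, and function names are invented for illustration.
-module(backend_memo_sketch).
-export([init/0, backend_for/2]).

-define(TAB, backend_memo_tab).

init() ->
    ets:new(?TAB, [named_table, public, set, {read_concurrency, true}]).

%% ResolveFun stands in for the existing (comparatively expensive) lookup.
backend_for(Bucket, ResolveFun) when is_binary(Bucket) ->
    case ets:lookup(?TAB, Bucket) of
        [{_, Backend}] ->
            Backend;
        [] ->
            Backend = ResolveFun(Bucket),
            true = ets:insert(?TAB, {Bucket, Backend}),
            Backend
    end.

Only the first get for a given bucket would then pay the full resolution cost. Whether the cache can go stale (for example, if the prefix configuration changes at runtime) is exactly the part this sketch waves away.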

Part 2:

This is about another 15% of do_get execution time:

spawn(fun() -> timer:sleep(1000), [VN ! MakeMsg({<<"moss.buckets">>,<<"test-yja">>}) || _ <- lists:seq(1,20)] end), latency_histogram_tracer:start(riak_cs_kv_multi_backend,get_backend_bucketprops,2, 5).

For riak_cs_kv_multi_backend:get_backend_bucketprops/2:

Histogram stats:     
[{min,13},
 {max,98},
 {arithmetic_mean,24.116666666666667},
 {geometric_mean,21.896596213934533},
 {harmonic_mean,20.51813663769752},
 {median,20},
 {variance,194.8844632768362},
 {standard_deviation,13.960102552518595},
 {skewness,3.242453629015644},
 {kurtosis,12.640148642592747},
 {percentile,[{50,20},{75,26},{90,34},{95,42},{99,71},{999,98}]},
 {histogram,[{25,44},{37,10},{53,4},{63,0},{73,1},{93,0},{103,1},{113,0}]},
 {n,60}]
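If get_backend_bucketprops/2 rebuilds the same property list on every call, another option is to compute each backend's properties once when the multi backend starts and keep them in its state, so the get path only does a key lookup. A rough sketch of that shape, with invented record and function names:

%% Hypothetical sketch: compute each backend's bucket properties once at
%% start time and keep them in the backend state, so the per-request call
%% in the get path degrades to a simple key lookup. Names are invented.
-module(props_cache_sketch).
-export([start/1, get_backend_bucketprops/2]).

-record(state, {prop_cache :: [{atom(), list()}]}).

start(BackendDefs) ->
    %% BackendDefs :: [{BackendName :: atom(), Props :: list()}]
    Cache = [{Name, compute_props(Name, Props)} || {Name, Props} <- BackendDefs],
    {ok, #state{prop_cache = Cache}}.

get_backend_bucketprops(BackendName, #state{prop_cache = Cache}) ->
    case lists:keyfind(BackendName, 1, Cache) of
        {_, Props} -> Props;
        false      -> []
    end.

%% Stands in for whatever per-backend work the real function repeats today.
compute_props(_Name, Props) ->
    Props.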

See also the flamegraph at http://www.snookles.com/scottmp/foo.eflame2.55.svg

jonmeredith commented 9 years ago

Nice analysis. It drives me crazy how many times we have to retrieve bucket properties. Ideally we would retrieve them at most once on the coordinator and once on the storage nodes. Even better would be retrieving them only on the coordinator and transmitting the parameters the storage nodes need, but that would give up the ability to gradually migrate between backends node by node.
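As a rough illustration of the "resolve once on the coordinator" idea: the request would carry the already-resolved properties, and the vnode would skip its own lookup. Every name below is hypothetical rather than riak_kv's actual request format, and it bakes in the assumption noted above, namely that every node maps a given bucket to the same backend.

%% Hypothetical sketch of resolving bucket properties once on the coordinator
%% and shipping them with the request; all names are invented for illustration.
-module(props_once_sketch).
-export([coordinator_get/3, vnode_handle/1]).

coordinator_get(Bucket, Key, ReqId) ->
    Props = resolve_bucket_props(Bucket),   %% done exactly once, on the coordinator
    {get_req_v2, {Bucket, Key}, Props, ReqId}.

vnode_handle({get_req_v2, BKey, Props, _ReqId}) ->
    %% The vnode can skip its own get_backend_bucketprops/2-style lookup,
    %% at the cost of assuming every node resolves this bucket the same way.
    {do_get, BKey, Props}.

%% Stub standing in for the real (comparatively expensive) lookup.
resolve_bucket_props(_Bucket) ->
    [{backend, be_default}].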

shino commented 8 years ago

Note: the multi-prefix backend module has been moved to riak_kv because it is sufficiently stable. This issue is worth keeping open for future improvement, but the improvement should be made to the module in riak_kv, not in riak_cs.