basho / riak_cs

Riak CS is simple, available cloud storage built on Riak.
http://docs.basho.com/riakcs/latest/
Apache License 2.0

Optimization opportunities for riak_cs_kv_multi_backend.erl [JIRA: RCS-319] #1032

Open slfritchie opened 9 years ago

slfritchie commented 9 years ago

Based on micro-benchmark measurements.

Part 1:

MakeMsg = fun(BKey) -> {'$gen_event',{riak_vnode_req_v1,913438523331814323877303020447676887284957839360,{fsm,undefined,self()},{riak_kv_get_req_v1,BKey, 911}}} end.
spawn(fun() -> timer:sleep(1000), [VN ! MakeMsg({<<"moss.buckets">>,<<"test-bucket">>}) || _ <- lists:seq(1,20)] end), latency_histogram_tracer:start(riak_cs_kv_multi_backend,capabilities,2, 5).
--versus---
spawn(fun() -> timer:sleep(1000), [VN ! MakeMsg({<<"moss.buckets">>,<<"test-bucket">>}) || _ <- lists:seq(1,20)] end), latency_histogram_tracer:start(riak_kv_vnode,do_get,4, 5).

The riak_cs_kv_multi_backend:capabilities/2 function is about 15-17% of the total running time of vnode do_get, for a small bucket object. This should be easy'ish to make significantly faster?

For capabilities/2:

Histogram stats:     
[{min,14},
 {max,100},
 {arithmetic_mean,28.525},
 {geometric_mean,25.12890497934286},
 {harmonic_mean,23.069180164544996},
 {median,24},
 {variance,348.9224358974358},
 {standard_deviation,18.679465621302867},
 {skewness,2.6475300998559543},
 {kurtosis,7.1063519973387645},
 {percentile,[{50,24},{75,30},{90,40},{95,58},{99,100},{999,100}]},
 {histogram,[{33,31},{54,6},{74,1},{94,0},{114,2},{134,0}]},
 {n,40}]

For do_get/4:

Histogram stats:     
[{min,100},
 {max,333},
 {arithmetic_mean,162.6},
 {geometric_mean,153.3776752794918},
 {harmonic_mean,146.02555820694457},
 {median,137},
 {variance,3899.8315789473686},
 {standard_deviation,62.44863152181454},
 {skewness,1.292663122595427},
 {kurtosis,0.8671304222013156},
 {percentile,[{50,137},{75,193},{90,219},{95,290},{99,333},{999,333}]},
 {histogram,[{190,14},{270,4},{350,2},{500,0}]},
 {n,20}]
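If most of capabilities/2's cost comes from re-deriving which backend a bucket maps to on every get, one plausible fix is to memoize that resolution, since the same buckets recur constantly. Below is a minimal sketch of the idea, assuming the bucket-to-backend mapping is stable; the module, table, and function names are invented for illustration and are not the actual riak_cs_kv_multi_backend internals.

%% Hypothetical sketch: memoize the bucket -> backend resolution in ETS so
%% that capabilities/2 does not repeat the prefix matching on every get.
%% Module, table, and function names are invented for illustration.
-module(backend_memo_sketch).
-export([init/0, backend_for/2]).

-define(TAB, backend_memo_tab).

init() ->
    ets:new(?TAB, [named_table, public, set, {read_concurrency, true}]).

%% ResolveFun stands in for the existing (comparatively expensive) lookup.
backend_for(Bucket, ResolveFun) when is_binary(Bucket) ->
    case ets:lookup(?TAB, Bucket) of
        [{_, Backend}] ->
            Backend;
        [] ->
            Backend = ResolveFun(Bucket),
            true = ets:insert(?TAB, {Bucket, Backend}),
            Backend
    end.

Only the first get for a given bucket would then pay the full resolution cost. Whether the cache can go stale (for example, if the prefix configuration changes at runtime) is exactly the part this sketch waves away.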

Part 2:

This is about another 15% of do_get execution time:

spawn(fun() -> timer:sleep(1000), [VN ! MakeMsg({<<"moss.buckets">>,<<"test-yja">>}) || _ <- lists:seq(1,20)] end), latency_histogram_tracer:start(riak_cs_kv_multi_backend,get_backend_bucketprops,2, 5).

For riak_cs_kv_multi_backend:get_backend_bucketprops/2:

Histogram stats:     
[{min,13},
 {max,98},
 {arithmetic_mean,24.116666666666667},
 {geometric_mean,21.896596213934533},
 {harmonic_mean,20.51813663769752},
 {median,20},
 {variance,194.8844632768362},
 {standard_deviation,13.960102552518595},
 {skewness,3.242453629015644},
 {kurtosis,12.640148642592747},
 {percentile,[{50,20},{75,26},{90,34},{95,42},{99,71},{999,98}]},
 {histogram,[{25,44},{37,10},{53,4},{63,0},{73,1},{93,0},{103,1},{113,0}]},
 {n,60}]
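If get_backend_bucketprops/2 rebuilds the same property list on every call, another option is to compute each backend's properties once when the multi backend starts and keep them in its state, so the get path only does a key lookup. A rough sketch of that shape, with invented record and function names:

%% Hypothetical sketch: compute each backend's bucket properties once at
%% start time and keep them in the backend state, so the per-request call
%% in the get path degrades to a simple key lookup. Names are invented.
-module(props_cache_sketch).
-export([start/1, get_backend_bucketprops/2]).

-record(state, {prop_cache :: [{atom(), list()}]}).

start(BackendDefs) ->
    %% BackendDefs :: [{BackendName :: atom(), Props :: list()}]
    Cache = [{Name, compute_props(Name, Props)} || {Name, Props} <- BackendDefs],
    {ok, #state{prop_cache = Cache}}.

get_backend_bucketprops(BackendName, #state{prop_cache = Cache}) ->
    case lists:keyfind(BackendName, 1, Cache) of
        {_, Props} -> Props;
        false      -> []
    end.

%% Stands in for whatever per-backend work the real function repeats today.
compute_props(_Name, Props) ->
    Props.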

See also the flamegraph at http://www.snookles.com/scottmp/foo.eflame2.55.svg

jonmeredith commented 9 years ago

Nice analysis. It drives me crazy how many times we have to retrieve bucket properties. Ideally we would retrieve them at most once on the coordinator and once on the storage nodes. Even better would be retrieving them only on the coordinator and transmitting the parameters the storage nodes need, but that would give up the ability to gradually migrate between backends node by node.
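As a rough illustration of the "resolve once on the coordinator" idea: the request would carry the already-resolved properties, and the vnode would skip its own lookup. Every name below is hypothetical rather than riak_kv's actual request format, and it bakes in the assumption noted above, namely that every node maps a given bucket to the same backend.

%% Hypothetical sketch of resolving bucket properties once on the coordinator
%% and shipping them with the request; all names are invented for illustration.
-module(props_once_sketch).
-export([coordinator_get/3, vnode_handle/1]).

coordinator_get(Bucket, Key, ReqId) ->
    Props = resolve_bucket_props(Bucket),   %% done exactly once, on the coordinator
    {get_req_v2, {Bucket, Key}, Props, ReqId}.

vnode_handle({get_req_v2, BKey, Props, _ReqId}) ->
    %% The vnode can skip its own get_backend_bucketprops/2-style lookup,
    %% at the cost of assuming every node resolves this bucket the same way.
    {do_get, BKey, Props}.

%% Stub standing in for the real (comparatively expensive) lookup.
resolve_bucket_props(_Bucket) ->
    [{backend, be_default}].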

shino commented 8 years ago

Note: the multi-prefix backend module has been moved to riak_kv because it is sufficiently stable. This issue is worth keeping open for future improvement, but the improvement should be made to the module in riak_kv, not in riak_cs.