Closed oleksiyk closed 10 years ago
Disabling riak_control (riak_control=off) helped a bit, but the load is still constantly 3-20%:
===============================================================================================================================
'riak@10.0.1.1' 08:35:45
Load: cpu 0 Memory: total 47623 binary 4864
procs 1093 processes 13267 code 12629
runq 0 atom 541 ets 6226
Pid Name or Initial Func Time Reds Memory MsgQ Current Function
-------------------------------------------------------------------------------------------------------------------------------
<6315.104.0> erlang:apply/2 '-' 7170561 2600 0 cpu_sup:measurement_server_loop/1
<6315.191.0> riak_core_vnode_manager '-' 4730615 234104 0 gen_server:loop/6
<6315.93.0> riak_sysmon_filter '-' 3282618 5760 0 gen_server:loop/6
<6315.3.0> erl_prim_loader '-' 2781141 142624 0 erl_prim_loader:loop/3
<6315.194.0> riak_core_capability '-' 2139023 55016 0 gen_server:loop/6
<6315.165.0> riak_core_ring_manager '-' 1666166 230360 0 gen_server:loop/6
<6315.7.0> application_controller '-' 1626588 89544 0 gen_server:loop/6
<6315.485.0> riak_kv_stat_sj_stats '-' 1153005 8736 0 gen_server:loop/6
<6315.453.0> riak_kv_put_fsm_sj_stats '-' 1142398 5720 0 gen_server:loop/6
<6315.469.0> riak_kv_get_fsm_sj_stats '-' 1139338 5720 0 gen_server:loop/6
===============================================================================================================================
'riak@10.0.1.1' 08:35:46
Load: cpu 0 Memory: total 47668 binary 4864
procs 1093 processes 13089 code 12629
runq 0 atom 541 ets 6226
Pid Name or Initial Func Time Reds Memory MsgQ Current Function
-------------------------------------------------------------------------------------------------------------------------------
<6315.104.0> erlang:apply/2 '-' 2351 2600 0 cpu_sup:measurement_server_loop/1
<6315.93.0> riak_sysmon_filter '-' 2284 5760 0 gen_server:loop/6
<6315.7.0> application_controller '-' 662 143728 0 gen_server:loop/6
<6315.485.0> riak_kv_stat_sj_stats '-' 385 8736 0 gen_server:loop/6
<6315.453.0> riak_kv_put_fsm_sj_stats '-' 382 8736 0 gen_server:loop/6
<6315.469.0> riak_kv_get_fsm_sj_stats '-' 361 5720 0 gen_server:loop/6
<6315.210.0> riak_core_stat_cache '-' 189 55016 0 gen_server:loop/6
<6315.211.0> riak_core_stat_calc_sup '-' 153 5960 0 gen_server:loop/6
<6315.390.0> riak_api_pb_sup '-' 46 2704 0 gen_server:loop/6
<6315.94.0> timer_server '-' 44 2784 0 gen_server:loop/6
===============================================================================================================================
'riak@10.0.1.1' 08:35:48
Load: cpu 0 Memory: total 47477 binary 4864
procs 1093 processes 12897 code 12629
runq 0 atom 541 ets 6226
Pid Name or Initial Func Time Reds Memory MsgQ Current Function
-------------------------------------------------------------------------------------------------------------------------------
<6315.191.0> riak_core_vnode_manager '-' 13515 146432 0 gen_server:loop/6
<6315.194.0> riak_core_capability '-' 6106 55016 0 gen_server:loop/6
<6315.104.0> erlang:apply/2 '-' 4702 2600 0 cpu_sup:measurement_server_loop/1
<6315.7.0> application_controller '-' 871 89544 0 gen_server:loop/6
<6315.195.0> riak_core_gossip '-' 827 88504 0 gen_server:loop/6
<6315.469.0> riak_kv_get_fsm_sj_stats '-' 775 8736 0 gen_server:loop/6
<6315.210.0> riak_core_stat_cache '-' 750 55016 0 gen_server:loop/6
<6315.453.0> riak_kv_put_fsm_sj_stats '-' 740 5720 0 gen_server:loop/6
<6315.485.0> riak_kv_stat_sj_stats '-' 729 8736 0 gen_server:loop/6
<6315.211.0> riak_core_stat_calc_sup '-' 332 5960 0 gen_server:loop/6
===============================================================================================================================
'riak@10.0.1.1' 08:35:49
Load: cpu 0 Memory: total 47523 binary 4865
procs 1093 processes 12916 code 12629
runq 0 atom 541 ets 6226
Pid Name or Initial Func Time Reds Memory MsgQ Current Function
-------------------------------------------------------------------------------------------------------------------------------
<6315.104.0> erlang:apply/2 '-' 2351 2600 0 cpu_sup:measurement_server_loop/1
<6315.93.0> riak_sysmon_filter '-' 2294 5760 0 gen_server:loop/6
<6315.7.0> application_controller '-' 672 143728 0 gen_server:loop/6
<6315.485.0> riak_kv_stat_sj_stats '-' 394 8736 0 gen_server:loop/6
<6315.469.0> riak_kv_get_fsm_sj_stats '-' 363 5720 0 gen_server:loop/6
<6315.453.0> riak_kv_put_fsm_sj_stats '-' 358 8736 0 gen_server:loop/6
<6315.210.0> riak_core_stat_cache '-' 189 55016 0 gen_server:loop/6
<6315.211.0> riak_core_stat_calc_sup '-' 179 5960 0 gen_server:loop/6
<6315.94.0> timer_server '-' 63 2784 0 gen_server:loop/6
<6315.390.0> riak_api_pb_sup '-' 46 2704 0 gen_server:loop/6
===============================================================================================================================
'riak@10.0.1.1' 08:35:51
Load: cpu 0 Memory: total 47476 binary 4864
procs 1093 processes 12896 code 12629
runq 0 atom 541 ets 6226
Pid Name or Initial Func Time Reds Memory MsgQ Current Function
-------------------------------------------------------------------------------------------------------------------------------
<6315.104.0> erlang:apply/2 '-' 4702 2600 0 cpu_sup:measurement_server_loop/1
<6315.7.0> application_controller '-' 871 89544 0 gen_server:loop/6
<6315.453.0> riak_kv_put_fsm_sj_stats '-' 761 8736 0 gen_server:loop/6
<6315.485.0> riak_kv_stat_sj_stats '-' 752 5720 0 gen_server:loop/6
<6315.210.0> riak_core_stat_cache '-' 750 55016 0 gen_server:loop/6
<6315.469.0> riak_kv_get_fsm_sj_stats '-' 748 8736 0 gen_server:loop/6
<6315.211.0> riak_core_stat_calc_sup '-' 332 5960 0 gen_server:loop/6
<6315.93.0> riak_sysmon_filter '-' 285 5760 0 gen_server:loop/6
<6315.390.0> riak_api_pb_sup '-' 96 2704 0 gen_server:loop/6
<6315.103.0> cpu_sup '-' 67 2704 0 gen_server:loop/6
^C
I wonder if it is stats? Could you turn up the stat calc TTL to something really long?
I can't find the setting in cuttlefish, so I guess it'll need to go in your advanced.config
[{riak_core, [{stat_cache_ttl, $SOME_NUMBER_OF_SECONDS}]}].
Make $SOME_NUMBER_OF_SECONDS big so that the background calculation of stats runs infrequently, and let me know what difference that makes, please?
Also, until we have a known issue, the mailing list[1] is the best place for this kind of thing: the community at large benefits and there is general visibility. If we then find an issue as a result, we can open it here. Please?
[1] riak users mailing list - http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Sorry about using github instead of mailing list, will use it next time!
I've added the following to advanced.config:
[
{riak_core, [
{stat_cache_ttl, 30}
]}
].
And it helped a bit; however, I still see CPU load spikes of about 20% every 5-10 seconds (not every 30, as configured for the stats TTL).
Is the spike frequency and size the same whatever TTL setting you use? For example, change that TTL to 120, and is the behaviour the same?
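One way to compare spike frequency across TTL settings is to sample CPU usage at a fixed rate and measure the gaps between spike starts. The following is a minimal Python sketch, assuming the samples have already been collected by some other tool as (seconds, percent) pairs. The function name `spike_intervals`, the threshold, and the sample values are hypothetical and are not measurements from this thread.

```python
def spike_intervals(samples, threshold):
    """Return gaps in seconds between the starts of consecutive spikes.

    samples: list of (time_seconds, cpu_percent) pairs in time order.
    A spike starts when cpu_percent reaches or exceeds threshold
    after having been below it on the previous sample.
    """
    starts = []
    above = False
    for t, cpu in samples:
        if cpu >= threshold and not above:
            starts.append(t)
        above = cpu >= threshold
    # Pair each spike start with the next one and take the difference.
    return [b - a for a, b in zip(starts, starts[1:])]


if __name__ == "__main__":
    # Made-up samples for illustration only.
    samples = [(0, 2), (1, 21), (2, 3), (3, 2), (4, 2), (5, 2),
               (6, 20), (7, 19), (8, 2), (9, 1), (10, 2), (11, 22)]
    print(spike_intervals(samples, threshold=15))
```

If the intervals stay at roughly 5-10 seconds when the TTL changes from 30 to 120, the spikes are probably not driven by the stat cache alone.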
It seems to have settled by itself. The beam.smp load is now a constant 1%, with spikes not over 5%.
Is it safe to set stat_cache_ttl to 30 secs in production?
So that setting solves your problem? This suggests that there is an issue with stat calculation taking up too much resource.
A safe setting for the frequency of stat calculation depends on how often you consume the stats. A 30 second window might be too big for most users; once a second is probably too frequent for most users too.
Tune it based on what is acceptable to you as a trade-off between CPU usage and freshness of stats.
I think we should keep this issue open, though; a one-second frequency of stats calc shouldn't really cause constant CPU load, imo.
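The trade-off between CPU usage and freshness can be illustrated with a generic time-to-live cache. This is a minimal Python sketch and not how riak_core implements its stat cache. The class `TTLCache`, the helper `recalculations_in_window`, and the one-read-per-second polling pattern are all assumptions made for illustration.

```python
class TTLCache:
    """Recompute a value only when the cached copy is older than ttl seconds."""

    def __init__(self, compute, ttl, clock):
        self.compute = compute      # expensive function whose result is cached
        self.ttl = ttl              # seconds a cached value stays valid
        self.clock = clock          # callable returning the current time in seconds
        self.value = None
        self.computed_at = None     # None means nothing has been cached yet
        self.recalculations = 0

    def get(self):
        now = self.clock()
        if self.computed_at is None or now - self.computed_at >= self.ttl:
            self.value = self.compute()
            self.computed_at = now
            self.recalculations += 1
        return self.value


def recalculations_in_window(ttl, window=120, read_every=1):
    """Count recomputations when a reader polls every read_every seconds."""
    now = [0]  # simulated clock, so the sketch runs instantly
    cache = TTLCache(compute=lambda: now[0], ttl=ttl, clock=lambda: now[0])
    for t in range(0, window, read_every):
        now[0] = t
        cache.get()
    return cache.recalculations


if __name__ == "__main__":
    for ttl in (1, 30, 120):
        print(ttl, recalculations_in_window(ttl))
```

In this model, a TTL of 1 second recomputes on every one of the 120 reads, while a TTL of 30 seconds recomputes only 4 times over the same two minutes, at the cost of serving values up to 30 seconds old.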
Yes, once I set stat_cache_ttl to 1 second, the CPU load from beam.smp is back to a minimum of 4-7% with spikes to 20-30%.
Hmm. Ok. Well, let's leave this open and hopefully we can look at it before 2.0RC.
Thanks.
Yes, and riak-admin top shows cpu_sup:measurement_server_loop/1 at the top of each iteration.
Interesting. I'll look into it…though if you figure it out and send a patch/PR, I'll buy you a beverage of your choice.
Hi Russell,
Do you have any updates on this?
thanks, Michelle
nope
@oleksiyk can you let me know what OS you are using, please?
@oleksiyk I'm not seeing this on OS X. I'm running the latest beta. Would you mind trying the current beta, and reporting back, please?
@russelldb OS is Ubuntu 14.04
I'm running riak_2.0.0beta1-1
stat_cache_ttl is set to 1 second
CPU load by beam SMP is 4-7% (idle). I think that's totally acceptable.
@oleksiyk so I can close this issue?
Well I haven't noticed any slow down when putting real load on Riak so probably you can close this issue.
Thank you
I'm running a clean install of a 5-node Riak 2.0.pre20 cluster on physical servers with Intel Xeon hexacore CPUs, and I'm seeing constant CPU load from beam.smp (on all nodes). The load is 7-25%. The servers (and Riak) are absolutely idle.
riak-admin top interval 2:
There are no errors in console.log or error.log. Is that normal?
riak.conf: