Open peterzeller opened 4 years ago
This happens on the Kaiserslautern-Paris Testcluster.
The Paris server (antidote1) is much slower:
As you can see, antidote1 is missing a lot of updates.
Looking at the processes with the longest message queues we get:
[{746908, [{registered_name,inter_dc_query}, {current_function,{erlzmq,send,3}}, {initial_call,{proc_lib,init_p,5}}, {status,running}, {message_queue_len,746908}, {links,[<35401.1069.0>]}, {dictionary,[{'$ancestors',[antidote_sup,<35401.1068.0>]}, {'$initial_call',{inter_dc_query,init,1}}]}, {trap_exit,false}, {error_handler,error_handler}, {priority,normal}, {group_leader,<35401.1067.0>}, {total_heap_size,28315631}, {heap_size,1598}, {stack_size,26}, {reductions,5226011}, {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}}, {min_bin_vheap_size,46422}, {min_heap_size,233}, {fullsweep_after,10}, {minor_gcs,4}]}, {suspending,[]}], <35401.1082.0>}, {9947, [{current_function,{gen,do_call,4}}, {initial_call,{proc_lib,init_p,5}}, {status,waiting}, {message_queue_len,9947}, {links,[<35401.968.0>]}, {dictionary,[{'$ancestors',[riak_core_vnode_sup, riak_core_sup,<35401.964.0>]}, {'$initial_call',{riak_core_vnode,init,1}}, {rand_seed,{#{bits => 58,jump => #Fun<rand.8.10897371>, next => #Fun<rand.5.10897371>,type => exrop, uniform => #Fun<rand.6.10897371>, uniform_n => #Fun<rand.7.10897371>,weak_low_bits => 1}, [278878836590975983|200032599015734561]}}]}, {trap_exit,true}, {error_handler,error_handler}, {priority,normal}, {group_leader,<35401.963.0>}, {total_heap_size,2805641}, {heap_size,318187}, {stack_size,43}, {reductions,49383247}, {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}}, {min_bin_vheap_size,46422}, {min_heap_size,233}, {fullsweep_after,10}, {minor_gcs,8}]}, {suspending,[]}], <35401.5872.0>}, {9945, [{current_function,{gen,do_call,4}}, {initial_call,{proc_lib,init_p,5}}, {status,waiting}, {message_queue_len,9945}, {links,[<35401.968.0>]}, {dictionary,[{'$ancestors',[riak_core_vnode_sup, riak_core_sup,<35401.964.0>]}, {'$initial_call',{riak_core_vnode,init,1}}, {rand_seed,{#{bits => 58,jump => #Fun<rand.8.10897371>, next => #Fun<rand.5.10897371>,type => exrop, uniform => #Fun<rand.6.10897371>, uniform_n => #Fun<rand.7.10897371>,weak_low_bits => 1}, [6104668538044925|127505248886749810]}}]}, {trap_exit,true}, {error_handler,error_handler}, {priority,normal}, {group_leader,<35401.963.0>}, {total_heap_size,3499716}, {heap_size,514838}, {stack_size,43}, {reductions,51367855}, {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}}, {min_bin_vheap_size,46422}, {min_heap_size,233}, {fullsweep_after,10}, {minor_gcs,10}]}, {suspending,[]}], <35401.5814.0>}, {9944, [{current_function,{gen,do_call,4}}, {initial_call,{proc_lib,init_p,5}}, {status,waiting}, {message_queue_len,9944}, {links,[<35401.968.0>]}, {dictionary,[{'$ancestors',[riak_core_vnode_sup, riak_core_sup,<35401.964.0>]}, {'$initial_call',{riak_core_vnode,init,1}}, {rand_seed,{#{bits => 58,jump => #Fun<rand.8.10897371>, next => #Fun<rand.5.10897371>,type => exrop, uniform => #Fun<rand.6.10897371>, uniform_n => #Fun<rand.7.10897371>,weak_low_bits => 1}, [30436575548104867|146295658476797480]}}]}, {trap_exit,true}, {error_handler,error_handler}, {priority,normal}, {group_leader,<35401.963.0>}, {total_heap_size,2805586}, {heap_size,318187}, {stack_size,43}, {reductions,52758223}, {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}}, {min_bin_vheap_size,46422}, {min_heap_size,233}, {fullsweep_after,10}, {minor_gcs,...}]}, {suspending,[]}], <35401.5806.0>}, {9939, [{current_function,{gen,do_call,4}}, {initial_call,{proc_lib,init_p,5}}, {status,waiting}, {message_queue_len,9939}, {links,[<35401.968.0>]}, {dictionary,[{'$ancestors',[riak_core_vnode_sup, riak_core_sup,<35401.964.0>]}, {'$initial_call',{riak_core_vnode,init,1}}, {rand_seed,{#{bits => 58,jump => #Fun<rand.8.10897371>, next => #Fun<rand.5.10897371>,type => exrop, uniform => #Fun<rand.6.10897371>, uniform_n => #Fun<rand.7.10897371>,weak_low_bits => 1}, [80043739005711909|41200708321364588]}}]}, {trap_exit,true}, {error_handler,error_handler}, {priority,normal}, {group_leader,<35401.963.0>}, {total_heap_size,3499716}, {heap_size,514838}, {stack_size,43}, {reductions,47573844}, {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,...}}, {min_bin_vheap_size,46422}, {min_heap_size,233}, {fullsweep_after,...}, {...}]}, {suspending,[]}], <35401.5862.0>}, {9930, [{current_function,{gen,do_call,4}}, {initial_call,{proc_lib,init_p,5}}, {status,waiting}, {message_queue_len,9930}, {links,[<35401.968.0>]}, {dictionary,[{'$ancestors',[riak_core_vnode_sup, riak_core_sup,<35401.964.0>]}, {'$initial_call',{riak_core_vnode,init,1}}, {rand_seed,{#{bits => 58,jump => #Fun<rand.8.10897371>, next => #Fun<rand.5.10897371>,type => exrop, uniform => #Fun<rand.6.10897371>, uniform_n => #Fun<rand.7.10897371>,weak_low_bits => 1}, [20635463222876748|7786351554213997]}}]}, {trap_exit,true}, {error_handler,error_handler}, {priority,normal}, {group_leader,<35401.963.0>}, {total_heap_size,2805586}, {heap_size,318187}, {stack_size,43}, {reductions,50574756}, {garbage_collection,[{max_heap_size,#{error_logger => true,...}}, {min_bin_vheap_size,46422}, {min_heap_size,...}, {...}|...]}, {suspending,[]}], <35401.5780.0>}, {9926, [{current_function,{gen,do_call,4}}, {initial_call,{proc_lib,init_p,5}}, {status,waiting}, {message_queue_len,9926}, {links,[<35401.968.0>]}, {dictionary,[{'$ancestors',[riak_core_vnode_sup, riak_core_sup,<35401.964.0>]}, {'$initial_call',{riak_core_vnode,init,1}}, {rand_seed,{#{bits => 58,jump => #Fun<rand.8.10897371>, next => #Fun<rand.5.10897371>,type => exrop, uniform => #Fun<rand.6.10897371>, uniform_n => #Fun<rand.7.10897371>,...}, [178958286728748668|110879805029760584]}}]}, {trap_exit,true}, {error_handler,error_handler}, {priority,normal}, {group_leader,<35401.963.0>}, {total_heap_size,2805586}, {heap_size,318187}, {stack_size,43}, {reductions,48737344}, {garbage_collection,[{max_heap_size,#{...}}, {min_bin_vheap_size,...}, {...}|...]}, {suspending,[]}], <35401.5904.0>}, {9925, [{current_function,{gen,do_call,4}}, {initial_call,{proc_lib,init_p,5}}, {status,waiting}, {message_queue_len,9925}, {links,[<35401.968.0>]}, {dictionary,[{'$ancestors',[riak_core_vnode_sup, riak_core_sup,<35401.964.0>]}, {'$initial_call',{riak_core_vnode,init,1}}, {rand_seed,{#{bits => 58,jump => #Fun<rand.8.10897371>, next => #Fun<rand.5.10897371>,type => exrop, uniform => #Fun<rand.6.10897371>,...}, [100487286008278496|276410797723704352]}}]}, {trap_exit,true}, {error_handler,error_handler}, {priority,normal}, {group_leader,<35401.963.0>}, {total_heap_size,2805586}, {heap_size,318187}, {stack_size,43}, {reductions,55232873}, {garbage_collection,[{max_heap_size,...},{...}|...]}, {suspending,[]}], <35401.5786.0>}, {9925, [{current_function,{gen,do_call,4}}, {initial_call,{proc_lib,init_p,5}}, {status,waiting}, {message_queue_len,9925}, {links,[<35401.968.0>]}, {dictionary,[{'$ancestors',[riak_core_vnode_sup, riak_core_sup,<35401.964.0>]}, {'$initial_call',{riak_core_vnode,init,1}}, {rand_seed,{#{bits => 58,jump => #Fun<rand.8.10897371>, next => #Fun<rand.5.10897371>,type => exrop,...}, [98775941658005378|30697611948250862]}}]}, {trap_exit,true}, {error_handler,error_handler}, {priority,normal}, {group_leader,<35401.963.0>}, {total_heap_size,2805586}, {heap_size,318187}, {stack_size,43}, {reductions,49496244}, {garbage_collection,[{...}|...]}, {suspending,[]}], <35401.5824.0>}, {9923, [{current_function,{gen,do_call,4}}, {initial_call,{proc_lib,init_p,5}}, {status,waiting}, {message_queue_len,9923}, {links,[<35401.968.0>]}, {dictionary,[{'$ancestors',[riak_core_vnode_sup, riak_core_sup,<35401.964.0>]}, {'$initial_call',{riak_core_vnode,init,1}}, {rand_seed,{#{bits => 58,jump => #Fun<rand.8.10897371>, next => #Fun<rand.5.10897371>,...}, [192068539881179442|63064207642334594]}}]}, {trap_exit,true}, {error_handler,error_handler}, {priority,normal}, {group_leader,<35401.963.0>}, {total_heap_size,3499716}, {heap_size,514838}, {stack_size,43}, {reductions,45967488}, {garbage_collection,[...]}, {suspending,...}], <35401.5888.0>}, {9921, [{current_function,{gen,do_call,4}}, {initial_call,{proc_lib,init_p,5}}, {status,waiting}, {message_queue_len,9921}, {links,[<35401.968.0>]}, {dictionary,[{'$ancestors',[riak_core_vnode_sup, riak_core_sup,<35401.964.0>]}, {'$initial_call',{riak_core_vnode,init,1}}, {rand_seed,{#{bits => 58,jump => #Fun<rand.8.10897371>,...}, [30413726223388013|...]}}]}, {trap_exit,true}, {error_handler,error_handler}, {priority,normal}, {group_leader,<35401.963.0>}, {total_heap_size,2805641}, {heap_size,318187}, {stack_size,43}, {reductions,50465305}, {garbage_collection,...}, {...}], <35401.5886.0>}, {9920, [{current_function,{gen,do_call,4}}, {initial_call,{proc_lib,init_p,5}}, {status,waiting}, {message_queue_len,9920}, {links,[<35401.968.0>]}, {dictionary,[{'$ancestors',[riak_core_vnode_sup, riak_core_sup,<35401.964.0>]}, {'$initial_call',{riak_core_vnode,init,1}}, {rand_seed,{#{bits => 58,...},[...]}}]}, {trap_exit,true}, {error_handler,error_handler}, {priority,normal}, {group_leader,<35401.963.0>}, {total_heap_size,2805586}, {heap_size,318187}, {stack_size,43}, {reductions,...}, {...}|...], <35401.5820.0>}, {9918, [{current_function,{gen,do_call,4}}, {initial_call,{proc_lib,init_p,5}}, {status,waiting}, {message_queue_len,9918}, {links,[<35401.968.0>]}, {dictionary,[{'$ancestors',[riak_core_vnode_sup, riak_core_sup,<35401.964.0>]}, {'$initial_call',{riak_core_vnode,init,...}}, {rand_seed,{#{...},...}}]}, {trap_exit,true}, {error_handler,error_handler}, {priority,normal}, {group_leader,<35401.963.0>}, {total_heap_size,3499716}, {heap_size,514838}, {stack_size,...}, {...}|...], <35401.5860.0>}, {9916, [{current_function,{gen,do_call,4}}, {initial_call,{proc_lib,init_p,5}}, {status,waiting}, {message_queue_len,9916}, {links,[<35401.968.0>]}, {dictionary,[{'$ancestors',[riak_core_vnode_sup, riak_core_sup|...]}, {'$initial_call',{riak_core_vnode,...}}, {rand_seed,{...}}]}, {trap_exit,true}, {error_handler,error_handler}, {priority,normal}, {group_leader,<35401.963.0>}, {total_heap_size,2805586}, {heap_size,...}, {...}|...], <35401.5896.0>}, {9915, [{current_function,{gen,do_call,4}}, {initial_call,{proc_lib,init_p,5}}, {status,waiting}, {message_queue_len,9915}, {links,[<35401.968.0>]}, {dictionary,[{'$ancestors',[riak_core_vnode_sup|...]}, {'$initial_call',{...}}, {rand_seed,...}]}, {trap_exit,true}, {error_handler,error_handler}, {priority,normal}, {group_leader,<35401.963.0>}, {total_heap_size,...}, {...}|...], <35401.5784.0>}, ...
Commands used:
Infos = {erlang:process_info(P), P} || P <- erlang:processes()]. I1 = lists:reverse(lists:sort([{proplists:get_value(message_queue_len, I), I, P} || {I, P} <- Infos])).
The message queue length keeps increasing and accordingly the process memory usage is increasing:
(Benchmarks run at ~18:00 and 0:00 for about 2 minutes each)
This can be reproduced locally with a docker setup by limiting the cpu of one of the data centers.
queues.zip
Related: #320
This happens on the Kaiserslautern-Paris Testcluster.
The Paris server (antidote1) is much slower:
As you can see, antidote1 is missing a lot of updates.
Looking at the processes with the longest message queues we get:
Commands used:
The message queue length keeps increasing and accordingly the process memory usage is increasing:
(Benchmarks run at ~18:00 and 0:00 for about 2 minutes each)