basho / yokozuna

Riak + Solr

Increase ibrowse Inactivity Timeout #367

Closed rzezeski closed 10 years ago

rzezeski commented 10 years ago

After digging into issues #320, #330, and #358, it was discovered that ibrowse's default inactivity_timeout, combined with its poor load-balancing algorithm and the default pool and pipeline sizes, is causing unnecessary socket churn. Even under load, the client may not drive Yokozuna hard enough to fill all 10 pipelines, so connections hit the inactivity timeout and are torn down. Increasing the pool or pipeline size just makes things worse, since the ibrowse algorithm first wants to fill the pool before using pipelines, causing it to open a new connection for almost every request. My comment on #330 goes into a bit more detail and includes evidence à la DTrace [1].
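For reference, the knobs in question are exposed by ibrowse as per-request options. Here is a minimal sketch of raising the inactivity timeout on a request to Solr; the URL, index name, payload, and timeout values are illustrative only and this is not Yokozuna's actual call site:

%% Sketch only: raise ibrowse's per-request inactivity_timeout (the default
%% is on the order of 10 seconds) so an idle pooled connection to Solr is
%% not torn down and re-established between bursts of indexing traffic.
SolrUrl = "http://127.0.0.1:8093/solr/my_index/update",      %% illustrative endpoint
Opts    = [{inactivity_timeout, 600000},                     %% 10 minutes, up from the default
           {response_format, binary}],
{ok, _Status, _RespHeaders, _RespBody} =
    ibrowse:send_req(SolrUrl,
                     [{"Content-Type", "application/json"}], %% request headers
                     post,
                     <<"[]">>,                               %% empty update body, illustration only
                     Opts,
                     60000).                                 %% overall request timeout in ms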

Action Items

rzezeski commented 10 years ago

Pinging @wbrown, let's focus all inactivity-timeout research on this issue now.

wbrown commented 10 years ago

So, a few observations now that I've filled up my database and pushed my nodes to the limit.

This is exacerbated by the multithreaded search-result paging code that I have. The basic flow goes:

Search performance-wise, I get:

So, in this case, the ibrowse load balancing algorithm bites us on both ends -- we don't want more connections in the pool, but when all connections in the pool are busy, it severely impacts actual retrieval of the data via searches.
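For what it's worth, ibrowse does let you set the pool and pipeline sizes per host:port at runtime, which is one way to experiment with this trade-off. A hedged sketch follows; the host, port, and sizes are purely illustrative and not a recommendation from this thread:

%% Sketch only: cap the connection pool for the local Solr endpoint and
%% allow deeper pipelines on each connection, so ibrowse pipelines onto
%% existing sockets instead of opening a fresh connection once the pool
%% is busy. Values are illustrative.
ibrowse:set_max_sessions("127.0.0.1", 8093, 4),
ibrowse:set_max_pipeline_size("127.0.0.1", 8093, 50).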

wbrown commented 10 years ago

Full Dataset from a Standing Start

Some extra data points -- unloaded, and after the machine has been idle a while, I see:

Searches against the same dataset, different value in one field:

Cluster Status

wbrown@stratus:/internal/riak/altostratus$ du -skhc ./*
27G     ./anti_entropy
135G    ./bitcask
689K    ./cluster_meta
337K    ./kv_vnode
17K     ./riak_kv_exchange_fsm
114K    ./ring
25G     ./yz
15G     ./yz_anti_entropy
201G    total

Current Performance of Data Import

I just kicked off another data import job, and wow, my system is pretty hammered, but still performant considering:

node_puts_total : 57504552
node_put_fsm_time_mean : 85170
node_put_fsm_time_median : 1705
node_put_fsm_time_95 : 19494
node_put_fsm_time_99 : 4048624
node_put_fsm_time_100 : 5971479
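For context, Riak reports these put FSM times in microseconds, so a mean of 85170 is roughly 85 ms against a 1.7 ms median, with a 99th percentile around 4 s. A minimal sketch of pulling the same counters over Riak's HTTP /stats endpoint with ibrowse; the host, port, and use of mochijson2 are assumptions for illustration:

%% Sketch only: fetch the node stats over HTTP and pick out one counter.
%% 127.0.0.1:8098 and mochijson2 are illustrative assumptions, not taken
%% from this thread.
{ok, "200", _Hdrs, Body} =
    ibrowse:send_req("http://127.0.0.1:8098/stats",
                     [{"Accept", "application/json"}],
                     get),
{struct, Stats} = mochijson2:decode(Body),
proplists:get_value(<<"node_put_fsm_time_mean">>, Stats).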

Some interesting error messages:

2014-04-22 04:31:12.126 [info] <0.98.0>@riak_core_sysmon_handler:handle_event:92 monitor large_heap <0.1823.0> [{initial_call,{yz_index_hashtree,init,1}},{almost_current_function,{yz_index_hashtree,do_insert,5}},{message_queue_len,301184}] [{old_heap_block_size,0},{heap_block_size,22177879},{mbuf_size,0},{stack_size,14},{old_heap_size,0},{heap_size,14484411}]
2014-04-22 07:34:14.843 [info] <0.98.0>@riak_core_sysmon_handler:handle_event:92 monitor large_heap <0.1827.0> [{initial_call,{yz_index_hashtree,init,1}},{almost_current_function,{hashtree,should_insert,3}},{message_queue_len,1355002}] [{old_heap_block_size,0},{heap_block_size,95360811},{mbuf_size,0},{stack_size,19},{old_heap_size,0},{heap_size,65138871}]

Failure to index due to timeouts:

2014-04-22 04:32:56.469 [error] <0.733.0>@yz_kv:index:205 failed to index object {{<<"obs">>,<<"obs">>},<<"AnxNK4SL8CVtGiK5/1qEtrFibWC8gWI/mSl6I2ozbss=:3">>} with error {"Failed to index docs",{error,req_timedout}} because [{yz_solr,index,3,[{file,"src/yz_solr.erl"},{line,175}]},{yz_kv,index,7,[{file,"src/yz_kv.erl"},{line,252}]},{yz_kv,index,3,[{file,"src/yz_kv.erl"},{line,192}]},{riak_kv_vnode,actual_put,6,[{file,"src/riak_kv_vnode.erl"},{line,1440}]},{riak_kv_vnode,perform_put,3,[{file,"src/riak_kv_vnode.erl"},{line,1428}]},{riak_kv_vnode,do_put,7,[{file,"src/riak_kv_vnode.erl"},{line,1223}]},{riak_kv_vnode,handle_command,3,[{file,"src/riak_kv_vnode.erl"},{line,468}]},{riak_core_vnode,vnode_command,3,[{file,"src/riak_core_vnode.erl"},{line,304}]}]

Error in entropy:

2014-04-22 04:32:56.469 [error] emulator Error in process <0.2580.1755> on node 'riak@cumulus.fabric' with exit value: {function_clause,[{yz_entropy,iterate_entropy_data,[<<3 bytes>>,[{continuation,<<91 bytes>>},{limit,100},{partition,14}],#Fun<yz_index_hashtree.5.12742521>,{error,{error,req_timedout}}],[{file,"src/yz_entropy.erl"},{line,44}]},{yz_index_hashtree,'-fold_keys/2-lc$^0/1-0-',3,[{file...

New Test

I will probably redo this multi-day dataset reload, with the following changes to see if it improves things:

HugePages_Total:    3072
HugePages_Free:       80
HugePages_Rsvd:       68
HugePages_Surp:        0

Although the current allocation looks this way because of memory pressure:

2014-04-23 01:56:49.304 [info] <0.515.0>@yz_solr_proc:handle_info:135 solr stdout/err: OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00000007a7000000, 35651584, 2097152, 0) failed; error='Cannot allocate memory' (errno=12); Cannot allocate large pages, falling back to regular pages

Anything you'd like to see changed or tested on my next go-round at this?

rzezeski commented 10 years ago

@wbrown Sorry for the late comments. Trying to keep pace with you :).

Some interesting error messages:

2014-04-22 04:31:12.126 [info] <0.98.0>@riak_core_sysmon_handler:handle_event:92 monitor large_heap <0.1823.0> [{initial_call,{yz_index_hashtree,init,1}},{almost_current_function,{yz_index_hashtree,do_insert,5}},{message_queue_len,301184}] [{old_heap_block_size,0},{heap_block_size,22177879},{mbuf_size,0},{stack_size,14},{old_heap_size,0},{heap_size,14484411}]
2014-04-22 07:34:14.843 [info] <0.98.0>@riak_core_sysmon_handler:handle_event:92 monitor large_heap <0.1827.0> [{initial_call,{yz_index_hashtree,init,1}},{almost_current_function,{hashtree,should_insert,3}},{message_queue_len,1355002}] [{old_heap_block_size,0},{heap_block_size,95360811},{mbuf_size,0},{stack_size,19},{old_heap_size,0},{heap_size,65138871}]

Wow, those message queues are much too large. There should be a throttle mechanism in the AAE code to prevent this overload, but perhaps it is not working properly. I also remember you saying in another ticket that you increased the overload threshold for the cluster because of InfiniBand. I think this is a bad idea: you want to avoid overly large message queues in Erlang, and raising the overload threshold will just allow more queues to grow even larger.
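As an aside, a quick way to spot-check mailboxes like these from an attached console is a sketch along the following lines; the threshold is arbitrary and the snippet is illustrative, not part of Riak or Yokozuna:

%% Sketch only: list processes whose mailbox exceeds an arbitrary threshold,
%% together with their initial call, to catch runaway queues like the
%% yz_index_hashtree ones in the large_heap warnings above.
[{Pid, Len, erlang:process_info(Pid, initial_call)}
 || Pid <- erlang:processes(),
    {message_queue_len, Len} <- [erlang:process_info(Pid, message_queue_len)],
    Len > 10000].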

Failure to index due to timeouts:

2014-04-22 04:32:56.469 [error] <0.733.0>@yz_kv:index:205 failed to index object {{<<"obs">>,<<"obs">>},<<"AnxNK4SL8CVtGiK5/1qEtrFibWC8gWI/mSl6I2ozbss=:3">>} with error {"Failed to index docs",{error,req_timedout}} because [{yz_solr,index,3,[{file,"src/yz_solr.erl"},{line,175}]},{yz_kv,index,7,[{file,"src/yz_kv.erl"},{line,252}]},{yz_kv,index,3,[{file,"src/yz_kv.erl"},{line,192}]},{riak_kv_vnode,actual_put,6,[{file,"src/riak_kv_vnode.erl"},{line,1440}]},{riak_kv_vnode,perform_put,3,[{file,"src/riak_kv_vnode.erl"},{line,1428}]},{riak_kv_vnode,do_put,7,[{file,"src/riak_kv_vnode.erl"},{line,1223}]},{riak_kv_vnode,handle_command,3,[{file,"src/riak_kv_vnode.erl"},{line,468}]},{riak_core_vnode,vnode_com

This simply indicates that Solr isn't keeping up with the load.

Error in entropy:

2014-04-22 04:32:56.469 [error] emulator Error in process <0.2580.1755> on node 'riak@cumulus.fabric' with exit value: {function_clause,[{yz_entropy,iterate_entropy_data,[<<3 bytes>>,[{continuation,<<91 bytes>>},{limit,100},{partition,14}],#Fun,{error,{error,req_timedout}}],[{file,"src/yz_entropy.erl"},{line,44}]},{yz_index_hashtree,'-fold_keys/2-lc$^0/1-0-',3,[{file...

This is #324; it is an acceptable error, as AAE will simply retry later.

rzezeski commented 10 years ago

I'm closing this issue since it was about the inactivity timeout, which has been raised for 2.0.0 until a better solution can be implemented in 2.x.x.