basho / riak_pipe

Riak Pipelines
Apache License 2.0

{'EXIT', timeout} during the upgrade of a node from 1.2.1 to 1.3.0pre3 [JIRA: RIAK-2401] #67

Closed joedevivo closed 8 years ago

joedevivo commented 11 years ago

During the rewrite of riak_test's loaded_upgrade test, I ran into some timeouts from pipe.

I started with a 4-node devrel cluster of 1.2.1 nodes and ran some map/reduce load, which occasionally timed out. @beerriot said this is ok, so I added a catch for these timeouts in my load generator. They came back from riakc_pb_socket:mapred/3 as {error, {timeout, _}}, and that was great.
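For illustration, the catch described above could look roughly like the sketch below. The module name `mapred_load`, the function `classify/1`, and the atoms it returns are hypothetical and not taken from riak_test; only the `{error, {timeout, _}}` shape comes from this report.

```erlang
-module(mapred_load).
-export([classify/1]).

%% Classify the return value of a map/reduce call made by a load generator.
%% Hypothetical helper: only the {error, {timeout, _}} shape is taken from
%% the report; the returned atoms are illustrative.
classify({ok, _Results}) ->
    ok;
classify({error, {timeout, _Partial}}) ->
    %% Timeouts of this shape were considered acceptable under load,
    %% so the caller can count them and keep going.
    expected_timeout;
classify({error, Reason}) ->
    %% Any other error is surfaced rather than swallowed.
    {unexpected, Reason}.
```

A load generator would then wrap each call, for example `mapred_load:classify(riakc_pb_socket:mapred(Pid, Inputs, Query))`, with `Pid`, `Inputs`, and `Query` supplied by the caller. Keeping the expected timeout in its own clause means anything with a different shape still fails loudly.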

After taking down the dev1 node, a different kind of timeout started rolling in. All processes applying load to dev1 were killed when it went down, so these errors came from load against the remaining nodes. They looked like this:

<<"{\"phase\":0,\"error\":\"{badmatch,{'EXIT',timeout}}\",\"input\":\"{ok,{r_object,<<\\\"bryanitbs\\\">>,<<\\\"6006\\\">>,[{r_content,{dict,3,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[[<<\\\"X-Riak-VTag\\\">>,52,108,107,74,87,107,66,115,86,119,81,102,56,111,97,78,113,118,118,80,70,99]],[[<<\\\"index\\\">>]],[],[[<<\\\"X-Riak-Last-Modified\\\">>|{1359,651147,565553}]],[],[]}}},<<\\\"6006\\\">>}],[{<<197,82,177,11,81,10,161,13>>,{1,63526870347}}],{dict,1,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],...}}},...},...}\",\"type\":\"error\",\"stack\":\"[{riak_core_vnode_proxy,call,2,[{file,\\\"src/riak_core_vnode_proxy.erl\\\"},{line,52}]},{riak_pipe_vnode,queue_work_send,4,[{file,\\\"src/riak_pipe_vnode.erl\\\"},{line,331}]},{riak_pipe_vnode,queue_work_erracc,6,[{file,\\\"src/riak_pipe_vnode.erl\\\"},{line,279}]},{riak_kv_mrc_map,send_results,2,[{file,\\\"src/riak_kv_mrc_map.erl\\\"},{line,232}]},{riak_pipe_vnode_worker,process_input,3,[{file,\\\"src/riak_pipe_vnode_worker.erl\\\"},{line,445}]},{riak_pipe_vnode_worker,wait_for_input,2,[{file,\\\"src/riak_pipe_vnode_work...\\\"},...]},...]\"}">>

Also, increasing the number of pipe workers masks the problem, but I don't think that fixes it; I suspect it just means I can't apply enough load to see it.

bashopatricia commented 8 years ago

Will not fix. Old version. We will re-open if it occurs on a 2.x version.