hcs42 opened this issue 9 years ago
Confirmed from our side and @joecaswell also had some comments:
The `inactive` message only gets sent to the vnode manager when the vnode FSM receives the atom `timeout` while in the `active` state.
The `active` state accepts and ignores any message it does not expect.
This means that if the vnode receives any message (of any type, from any sender) at least once per minute, the FSM timeout will never occur, so the `inactive` message will never get sent.
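To illustrate the mechanism, here is a minimal sketch of the pattern described above; the module and names are illustrative, not the actual `riak_core_vnode` implementation. The key point is that the catch-all clause returns a fresh timeout, which re-arms the timer:

```erlang
%% Minimal sketch of the pattern described above; module and names are
%% illustrative, NOT the actual riak_core_vnode implementation.
-module(vnode_timeout_sketch).
-export([active/2]).

-define(INACTIVITY_TIMEOUT, 60000).  %% one minute, in milliseconds

%% gen_fsm callback for the `active` state.
active(timeout, State) ->
    %% Only this clause reports inactivity to the vnode manager (elided).
    {next_state, active, State, ?INACTIVITY_TIMEOUT};
active(_UnexpectedEvent, State) ->
    %% Accept and ignore anything else -- but returning a fresh Timeout
    %% re-arms the timer, so any message arriving at least once a minute
    %% keeps the `timeout` atom from ever being delivered.
    {next_state, active, State, ?INACTIVITY_TIMEOUT}.
```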
I think resolving this in a resilient manner will involve adding a `now`-type last-action timestamp to the state, updated whenever action requests (get, put, fold, index, etc.) are handled.
And having the continue/1 function use something like:

```erlang
{next_state, active, State,
 max(1, State#state.inactivity_timeout
        - abs(timer:now_diff(os:timestamp(), State#state.last_action_timestamp)) div 1000)}.
```

(note the `div 1000`: `timer:now_diff/2` returns microseconds, and a `gen_fsm` timeout must be an integer number of milliseconds, so float division would fail)
instead of

```erlang
{next_state, active, State, State#state.inactivity_timeout}.
```
This sounds like a case where it would be simplest to schedule an explicit message with a timer, using `erlang:send_after/3`, and then update the FSM state properly when that message arrives in `handle_info`.
Or, I suppose, another option would be to piggy-back on the management tick message.
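A sketch of the `erlang:send_after/3` idea; the record fields and function names here are hypothetical, not taken from riak_core. The point is that an explicit message cannot be masked by unrelated traffic the way the implicit `gen_fsm` timeout can:

```erlang
%% Sketch of the erlang:send_after/3 approach; record fields and function
%% names are hypothetical, not taken from riak_core.
-module(vnode_send_after_sketch).
-export([reset_inactivity_timer/1, handle_info/3]).

-record(state, {inactivity_timeout = 60000 :: pos_integer(),  %% ms
                inactivity_timer :: reference() | undefined}).

%% Call whenever a real request is handled: cancel any pending timer and
%% schedule a fresh, explicit `inactivity_check` message.
reset_inactivity_timer(#state{inactivity_timer = Old,
                              inactivity_timeout = T} = State) ->
    _ = case Old of
            undefined -> ok;
            Ref -> erlang:cancel_timer(Ref)
        end,
    State#state{inactivity_timer = erlang:send_after(T, self(), inactivity_check)}.

%% The explicit message arrives via handle_info/3 regardless of what other
%% traffic the FSM sees, so unrelated messages can no longer mask it.
handle_info(inactivity_check, StateName, State) ->
    %% report inactivity to the vnode manager here (elided)
    {next_state, StateName, reset_inactivity_timer(State)};
handle_info(_Other, StateName, State) ->
    {next_state, StateName, State}.
```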
The problem is easy to reproduce (I tested Riak 1.4.1 and Riak 2.1.1):

1. Build a cluster, load some data into it (e.g. with `basho_bench`), and stop one of the nodes.
2. Run `riak-admin transfers`. Select a node that has at least one partition to hand off to the stopped node. We will use this as the "source node".
3. Make sure that `riak-admin vnode-status` is called on the source node at least once a minute, e.g. by running it in a loop.
4. Restart the stopped node. If `riak-admin transfers` is called any time in the future (while the loop is still running), it will show that partitions are still waiting to be handed off.

The root cause is the following: inside each Riak node, there is a `riak_core_vnode` process for each vnode (implemented in the `riak_core_vnode.erl` module). This is a `gen_fsm`. If it doesn't receive a request for a minute, it sends an `inactive` event to the `riak_core_vnode_manager`, which triggers any outstanding handoffs. The problem is that running `riak-admin vnode-status` calls `riak_kv_status:vnode_status`, which then dispatches a request to the `riak_core_vnode` process of each vnode. So if `riak-admin vnode-status` is called at least once a minute, the `riak_core_vnode` processes will never hit the one-minute timeout, and will therefore never check whether there are handoffs to be performed.

A workaround is to call the `riak_core_vnode_manager:force_handoffs()` function on the source node, which will start a few handoffs (the number of handoffs will be the same as the value of the `transfer-limit` setting). If there are more handoffs than the `transfer-limit`, the function needs to be called multiple times to perform all handoffs.

Notes:
- Other `riak-admin` commands don't have this problem, only `vnode-status`.
- The `riak_core_vnode_manager` process has a `management_tick` message, with which its `handle_info` is called every 10 seconds. It checks and potentially starts ownership handoffs, but not hinted handoffs.
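As a usage note for the `force_handoffs()` workaround mentioned above, the call can be made from the source node's Erlang shell (e.g. via `riak attach`, which is available in stock Riak releases):

```erlang
%% From the Erlang shell of the source node (e.g. via `riak attach`):
riak_core_vnode_manager:force_handoffs().
%% If more handoffs remain than transfer-limit allows in one pass,
%% run this again until `riak-admin transfers` shows nothing pending.
```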