basho / riak_pipe

Riak Pipelines
Apache License 2.0

PULSE test & fix riak_pipe_fitting #73

Closed beerriot closed 11 years ago

beerriot commented 11 years ago

In an attempt to locate the source of noproc messages in issues 48 and 49, I wrote the attached tests to exercise part of riak_pipe_fitting under PULSE. The specific part I was targeting was the 'eoi' behavior encountered when no workers are active. To accommodate Riak KV MapReduce's need to evaluate a reduce phase even if no inputs are received, riak_pipe_fitting evaluates the worker behavior in-process in this case.

The PULSE test found that, in addition to a noproc exit when sending a synchronous message to a sink that does not exist, normal (or any other reason) exits may happen if the sink exits after the synchronous message has been delivered but before a reply is received. The riak_pipe_sink:send_to_sink_fsm/5 function has been changed to catch these other exit reasons as well.
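As a rough illustration of the failure mode described above, here is a minimal sketch of wrapping a synchronous call so that any exit reason from the callee, not only noproc, is turned into an error tuple. It is not the actual riak_pipe_sink:send_to_sink_fsm/5 code. The module name sink_send_sketch, the helper safe_sync_send/3, and the {error, {sink_exit, Reason}} return shape are all hypothetical. It uses gen_server:call/3 in place of gen_fsm:sync_send_event/3, on the assumption that both report a dead or dying callee as an exit of the form {Reason, {Module, Function, Args}}.

```erlang
-module(sink_send_sketch).
-export([safe_sync_send/3, demo/0]).

%% Hypothetical helper (not part of riak_pipe): make a synchronous call
%% and convert an exit caused by the callee into an error tuple.
%% Matching on a bare Reason (rather than only noproc) also covers the
%% case where the callee exits with normal, or any other reason, after
%% the request was delivered but before it replied.
safe_sync_send(Pid, Msg, Timeout) ->
    try gen_server:call(Pid, Msg, Timeout) of
        Reply ->
            {ok, Reply}
    catch
        exit:{Reason, {gen_server, call, _Args}} ->
            {error, {sink_exit, Reason}}
    end.

%% Calls a process that is already dead, which exercises the noproc case.
demo() ->
    Pid = spawn(fun() -> ok end),
    Ref = erlang:monitor(process, Pid),
    %% Wait until the process is known to be gone before calling it.
    receive
        {'DOWN', Ref, process, Pid, _Info} -> ok
    end,
    safe_sync_send(Pid, ping, 1000).
```

Calling sink_send_sketch:demo() should take the catch clause with Reason bound to noproc. A callee that dies mid-call would take the same clause with its own exit reason, which is the case the PULSE test exposed.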

To see the tests fail, check out the first (and second, if you wish) commits in this PR, and run ./rebar get-deps && make clean pulse. They should fail with a reason like the following (partial EUnit output):

    {function_clause,
        [{reduce_fitting_pulse,exit_reason_is_normalish,
             [{normal,
                  {pulse_gen_fsm,sync_send_event,
                      [<0.700.0>,{p">>...

And may also include:

    Exit Reasons: [{client,after_eoi},
                   {sink,normal},
                   {fitting,
                       {normal,
                           {pulse_gen_fsm,sync_send_event,
                               [<0.675.0>,
                                {pipe_eoi,#Ref<0.0.0.11121>},
                                infinity]}}},
                   {builder,normal}]

And:

    ::error:{assertion_failed,[{module,reduce_fitting_pulse},
                             {line,299},
                             {expression,"eqc : quickcheck ( eqc : numtests ( 5000 , prop_fitting_dies_normal ( ) ) )"},
                             {expected,true},
                             {value,false}]}
      in function reduce_fitting_pulse:'-death_test_/0-fun-2-'/0 [test/reduce_fitting_pulse.erl:299]

If they instead fail for a reason like timeout, you may be missing the pulse_otp package, or other PULSE components.

To see the tests pass, add the third (and fourth, if you wish) commit to your checkout, and run make pulse again.

beerriot commented 11 years ago

Just a note: this test caught the problem as it killed the fitting process, but the same issue can also kill the vnode process. Any point in riak_pipe_vnode that uses ?T_ERR or ?T(..., [error], ...) might try to send a synchronous message to the sink during a Riak KV MapReduce query. The fix included here covers vnodes as well.

coderoshi commented 11 years ago

Pulsing as advertised +1