basho / riak_core


riak_core_bg_manager race #936

Open ThomasArts opened 5 years ago

ThomasArts commented 5 years ago

I am working on branch develop-3.0, but this issue does not seem to be specific to that branch. I have been looking into the QuickCheck tests that actually fail but whose failures are masked by ?TRAPEXIT.

Possibly this is just an artefact of how the property is written.

When calling riak_core_bg_manager:start(), two ets tables are created before the server itself is started. start_link() shows the same pattern:

-spec start_link() -> {ok, pid()} | ignore | {error, term}.
start_link() ->
    _ = maybe_create_ets(),
    gen_server:start_link({local, ?SERVER}, ?MODULE, [], []).

These ets tables are supposed to survive a crash of riak_core_bg_manager, so that after a restart they still contain the processes that hold locks.
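To make the ownership point concrete, here is a minimal sketch of why creating a table outside the server process lets it survive a server crash: an ets table belongs to the process that calls ets:new, and it is destroyed only when that owner terminates. The module name, table name, and key below are hypothetical and chosen for illustration; this is not the riak_core code.

```erlang
-module(ets_owner_sketch).
-export([table_outlives_server/0]).

%% Hypothetical table name for illustration; not the one used by riak_core.
-define(TABLE, ets_owner_sketch_table).

%% Returns true if a table created by the calling process is still
%% readable after a separately spawned "server" process has crashed.
table_outlives_server() ->
    %% The caller creates the table, so the caller is its owner.
    _ = ets:new(?TABLE, [named_table, public, bag]),
    true = ets:insert(?TABLE, {{given, some_lock}, self()}),
    %% Stand-in for the server: it reads the table, then crashes.
    {Pid, Ref} = spawn_monitor(fun() ->
        [_] = ets:lookup(?TABLE, {given, some_lock}),
        exit(crashed)
    end),
    receive
        {'DOWN', Ref, process, Pid, _Reason} -> ok
    end,
    %% The owner (this process) is still alive, so the entry is still there.
    Survived = ets:lookup(?TABLE, {given, some_lock}) =/= [],
    true = ets:delete(?TABLE),
    Survived.
```

The flip side is that the table lives only as long as whichever process happened to call the creating function, which may matter when that caller is a short-lived test process rather than a supervisor.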

The error I see in the QuickCheck tests is that, apparently, when a process monitored by the manager dies normally, the manager may not be able to find it in the ets table, and this crashes the manager:

=ERROR REPORT==== 15-Mar-2019::13:17:52 ===
** Generic server riak_core_bg_manager terminating 
** Last message in was {'DOWN',#Ref<0.2179874273.4102029316.175607>,process,
                               <0.2248.0>,normal}
** When Server state == {state,background_mgr_info_table,
                               background_mgr_entry_table,true,false}
** Reason for termination == 
** {badarg,
       [{ets,match_object,[background_mgr_entry_table,{{given,'_'},'_'}],[]},
        {riak_core_bg_manager,release_resource,2,
            [{file,
                 "/Users/thomas/Quviq/.../riak_core/_build/eqc/lib/riak_core/src/riak_core_bg_manager.erl"},
             {line,707}]},
        {riak_core_bg_manager,handle_info,2,
            [{file,
                 "/Users/thomas/Quviq/.../riak_core/_build/eqc/lib/riak_core/src/riak_core_bg_manager.erl"},
             {line,505}]},
        {gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,616}]}, 
        {gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,686}]},
        {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]}
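One detail of the trace may help narrow this down. As far as I know, ets:match_object/2 returns an empty list when no object matches, and raises badarg when the table itself is missing (or the caller is not allowed to read it). So the badarg above possibly means that background_mgr_entry_table was gone at that moment, rather than that the entry was absent. This is only one reading of the trace, not a confirmed diagnosis. The sketch below uses a hypothetical module and table name with the same match pattern to show the two outcomes.

```erlang
-module(ets_badarg_sketch).
-export([match_given/1, demo/0]).

%% Runs the match pattern from the stack trace against Table and turns
%% the badarg exception into a tagged return value.
match_given(Table) ->
    try ets:match_object(Table, {{given, '_'}, '_'}) of
        Objects -> {ok, Objects}
    catch
        error:badarg -> {error, badarg}
    end.

demo() ->
    Table = ets_badarg_sketch_table,  % hypothetical name, illustration only
    %% The table has not been created yet: match_object raises badarg.
    Missing = match_given(Table),
    _ = ets:new(Table, [named_table, public, bag]),
    %% The table exists but holds no entries: the result is [], no exception.
    Empty = match_given(Table),
    true = ets:delete(Table),
    {Missing, Empty}.
```

If that reading is right, the thing to check would be which process owns the tables in the failing test run and whether it is still alive when the 'DOWN' message is handled.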

I am going to dive deeper into this soon, but if people have seen this before or have suggestions, please let me know.