Open slfritchie opened 8 years ago
More info, when running with the SASL app running & logging errors.
=CRASH REPORT==== 30-Oct-2015::21:41:22 ===
crasher:
initial call: machi_cr_client:init/1
pid: <0.1933.4>
registered_name: []
exception exit: {{badmatch,[]},
[{machi_cr_client,do_append_head2,7,
[{file,"src/machi_cr_client.erl"},{line,330}]},
{gen_server,try_handle_call,4,
[{file,"gen_server.erl"},{line,607}]},
{gen_server,handle_msg,5,
[{file,"gen_server.erl"},{line,639}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,237}]}]}
in function gen_server:terminate/7 (gen_server.erl, line 804)
[....]
neighbour: [{pid,<0.182.0>}, %%% SLF note that this is a really "old" PID compared to the crasher
{registered_name,[]},
{initial_call,{erlang,apply,2}},
{current_function,{gen,do_call,4}},
{ancestors,[]},
{messages,
[{#Ref<0.0.0.8917>,{error,partition}},
[... a few hundred messages more ...]
{links,
[<0.31733.1>,<0.15066.2>,<0.22243.3>,<0.28288.3>,
<0.31857.3>,<0.4687.4>,<0.4779.4>,<0.4867.4>,<0.4907.4>,
[... very roughly, about 100,000 other process links ...]
{dictionary,
[{{memo,eqc_statem,non_interfering,
{machi_ap_repair_eqc,
{state,4,
{1445,935441,287549},
false,
[a,b,c,d],
[{a,a_chmgr},{b,b_chmgr},{c,c_chmgr},{d,d_chmgr}],
[{a,<0.8424.1>},
{b,<0.8425.1>},
{c,<0.8427.1>},
{d,<0.8429.1>}],
[{a,<0.8430.1>},
{b,<0.8437.1>},
{c,<0.8444.1>},
{d,<0.8451.1>},
{a,<0.8458.1>},
{b,<0.8465.1>},
{c,<0.8472.1>},
{d,<0.8479.1>},
{a,<0.8486.1>},
{b,<0.8493.1>},
{c,<0.8500.1>},
{d,<0.8507.1>}]},
[... a really super enormous process dictionary continues ...]
At the start of each QuickCheck test case, I check the # of procs and the proc limit:
.process_count = 3563 of 262144,
.process_count = 3544 of 262144,
.process_count = 4792 of 262144,
[...]
.process_count = 20874 of 262144,
.process_count = 21020 of 262144,
.process_count = 20879 of 262144,
.process_count = 20950 of 262144,
.process_count = 21060 of 262144,
.process_count = 21308 of 262144,
[...]
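For reference, a minimal sketch of how such a per-test-case check could be written. The module and function names here are hypothetical and not taken from the test suite; only `erlang:system_info(process_count)` and `erlang:system_info(process_limit)` are standard BIFs.

```erlang
-module(proc_count_sketch).
-export([report/0, format/2]).

%% Hypothetical helper: print the number of live processes against the
%% VM's process limit, in the same shape as the output lines above.
report() ->
    Count = erlang:system_info(process_count),
    Limit = erlang:system_info(process_limit),
    io:format("~s~n", [format(Count, Limit)]),
    {Count, Limit}.

%% Build the output line as a flat string so it can be logged or compared.
format(Count, Limit) ->
    lists:flatten(io_lib:format(".process_count = ~w of ~w,", [Count, Limit])).
```

Calling `proc_count_sketch:report()` at the start of each test case makes a steady climb in the count, like the one above, easy to spot.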
One apparent source of the process leak is that I forgot to stop some client processes kept in the #state on the EQC side :see_no_evil:
While I was at dinner, I left this running:
env EQC_TIMEOUT=3600 rebar skip_deps=true -v eunit suites=machi_ap_repair_eqc tests=prop_repair_par_test_
on commit b5005c35263e79389c42b1808fce1171b44f4fb3 (branch ss-repair-with-partition-simulator) with the https://gist.github.com/slfritchie/12e40859a08d5e4a678a patch applied. I saw this when I returned.
At a minimum, the machi_cr_client should do something less silly when the chain is not available.
Also, there may be a resource leak (e.g. file descriptor) that caused this test to fail after about 52 iterations (estimate, based on test output).
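As an illustration of "less silly" behaviour, here is a minimal sketch of guarding against an empty list instead of crashing with `{badmatch,[]}`. The code at src/machi_cr_client.erl line 330 is not shown in this issue, so this is an assumption throughout: the module, the function `pick_head/1`, and the idea that the empty list is the set of available chain members are all hypothetical. The `{error, partition}` return value is borrowed from the messages visible in the crash report and may not be what the client should actually return.

```erlang
-module(chain_guard_sketch).
-export([pick_head/1]).

%% Hypothetical helper, not part of machi_cr_client.
%% Members is assumed to be the list of chain members that are
%% currently reachable. A bare match such as [Head | Rest] = Members
%% crashes with {badmatch,[]} when no member is available; matching
%% the empty case explicitly lets the caller reply with an error.
pick_head([]) ->
    {error, partition};
pick_head([Head | Rest]) ->
    {ok, Head, Rest}.
```

With a guard like this, a gen_server `handle_call` clause could reply with the error tuple and keep running, rather than terminating the client process.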