Closed MaxThevenet closed 4 years ago
I observed that the segfault happens only when running with more than 1 MPI rank. Can you confirm that?
I agree, good point! Also, it doesn't seem to occur in 2D, but I'm not sure about that.
It seems to me that the segfault is not caused by the call to the function coarsenable, but rather by the initialization
Array4<Real const> const& arr_src = mf_src.const_array( mfi );
right before the ParallelFor inside Average::CoarsenAndInterpolateLoop. In other words, when I comment out the initialization above and the subsequent ParallelFor (which depends on arr_src), the code seems to run without a segfault.
@RevathiJambunathan Did you mention that by changing the input parameter warpx.load_balance_int in the input file Examples/Tests/reduced_diags/inputs_loadbalancecosts from warpx.load_balance_int=2 to warpx.load_balance_int=1 you get a segfault after 1 iteration instead of 3? I don't observe the same behavior at the moment: with warpx.load_balance_int=1, I still get a segfault after the third step. Did I maybe misunderstand what you are observing?
@EZoni the segfault probably occurs at the first dump iteration after a load-balancing (LB) iteration. I think @RevathiJambunathan and @WeiqunZhang identified the bug; with @mrowan137's help, @RevathiJambunathan is currently working on a fix.
@EZoni sorry, I was not fully clear when I talked about it this morning. I also changed diag.period = 1 and tried load_balance_int=1,2,3,4 to confirm whether that was the cause of the error.
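The hypothesis in the two comments above (the crash shows up at the first dump on or after a load-balance step) can be explored with a small stand-alone model. This is only a sketch: the function name is hypothetical, it is not WarpX or AMReX code, and it assumes that steps are counted from 1, that an action with interval n fires when step % n == 0, and that load balancing happens before the dump within the same step.

```cpp
#include <cassert>

// Hypothetical helper, not part of WarpX or AMReX.
// Returns the first step (1-based) at which a dump occurs on or after a
// load-balance step, or -1 if none occurs within max_step steps.
// Assumption: an action with interval n fires when step % n == 0, and load
// balancing is done before the dump within the same step.
int first_dump_after_load_balance(int load_balance_int, int dump_period, int max_step)
{
    assert(load_balance_int > 0 && dump_period > 0);
    bool load_balanced = false;
    for (int step = 1; step <= max_step; ++step) {
        if (step % load_balance_int == 0) {
            load_balanced = true;  // the distribution has changed at least once
        }
        if (load_balanced && step % dump_period == 0) {
            return step;  // first dump that sees the new distribution
        }
    }
    return -1;
}
```

Under these assumptions, a dump period of 1 makes the step returned by first_dump_after_load_balance equal to load_balance_int, which is why varying load_balance_int with diag.period = 1 is a useful check. With a larger dump period, different load-balance intervals can give the same step.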
This issue was fixed in #943. The test reduced_diags_loadbalancecosts_timers, which was crashing in #933 before the fix in #943 was merged, now runs successfully.
Automated test reduced_diags_loadbalancecosts_timers on PR #933 fails due to a segfault at the coarsenable call in the ASSERT line below.
When adding
right before the ASSERT line, the code returns
I do not see what is wrong in this BoxArray or in the coarsening ratio, so I don't get why coarsenable fails so far.
To reproduce the issue, I executed the code with
and the error is random, so far either a segfault or (maybe more helpful)
Here's the generated backtrace.
inputs_loadbalancecosts
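For checking a box and a coarsening ratio by hand, as attempted above, the usual notion of "coarsenable" can be sketched as follows: a cell-centered box is coarsenable by a ratio if coarsening and then refining it gives back the same box. This is a toy model with hypothetical names (ToyBox, toy_coarsenable), not AMReX's implementation, and it ignores any minimum-width requirement.

```cpp
#include <array>
#include <cassert>

// Toy model, NOT AMReX's Box: cell-centered index box with inclusive bounds.
struct ToyBox {
    std::array<int, 3> lo;
    std::array<int, 3> hi;
};

// Integer division rounding toward minus infinity (valid for negative a).
int floor_div(int a, int b)
{
    assert(b > 0);
    return (a >= 0) ? a / b : -((-a + b - 1) / b);
}

// Returns true if coarsening the box by `ratio` and refining it again
// reproduces the original bounds in every direction.
bool toy_coarsenable(const ToyBox& box, const std::array<int, 3>& ratio)
{
    for (int d = 0; d < 3; ++d) {
        const int r = ratio[d];
        const int coarse_lo = floor_div(box.lo[d], r);
        const int coarse_hi = floor_div(box.hi[d], r);
        const int refined_lo = coarse_lo * r;
        const int refined_hi = (coarse_hi + 1) * r - 1;
        if (refined_lo != box.lo[d] || refined_hi != box.hi[d]) {
            return false;  // bounds are not aligned with the ratio
        }
    }
    return true;
}
```

In this model a box spanning cells 0 to 63 in each direction is coarsenable by a ratio of 2, while one spanning 0 to 62 in any direction is not. A check like this only inspects index bounds, so if it passes for every box, a crash at that line would have to come from somewhere other than the box geometry.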