Closed (maxpkatz closed 4 years ago)
The effect is also not present (or at least mitigated) if castro.source_term_predictor=0.
cuda-memcheck --tool racecheck reports a race condition in the AMReX reduction code:
========= ERROR: Race reported between Read access at 0x00000590 in amrex_fort_module_amrex_reduce_add_device_
========= and Write access at 0x00000520 in amrex_fort_module_amrex_reduce_add_device_ [16384 hazards]
The AMReX reduction race condition has been resolved, but this effect is still present. It definitely goes away if Poisson gravity is not used. If we're doing Poisson gravity, then it is present even if we're fully periodic, which suggests an MLMG issue.
Where is inputs_collision?
https://github.com/AMReX-Astro/Castro/files/3449821/inputs_collision.txt https://github.com/AMReX-Astro/Castro/files/3449822/probin_collision.txt
This can also be demonstrated with evrard_collapse, using inputs.test amr.max_level=0 max_step=100.
I can reproduce what you saw with evrard_collapse on my desktop without MPI. But if I make the following change in Castro,
--- a/Source/gravity/Gravity.cpp
+++ b/Source/gravity/Gravity.cpp
@@ -1474,6 +1474,8 @@ Gravity::init_multipole_grav()
void
Gravity::fill_multipole_BCs(int crse_level, int fine_level, const Vector<MultiFab*>& Rhs, MultiFab& phi)
{
+ amrex::Gpu::LaunchSafeGuard lsg(false);
+
// Multipole BCs only make sense to construct if we are starting from the coarse level.
BL_ASSERT(crse_level == 0);
then it's deterministic (at least after 100 steps). I think the problem is ReduceSum in that function.
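One plausible reason a parallel reduction can give run-to-run differences is that floating-point addition is not associative, so the result depends on the order in which partial sums are combined. The sketch below is a standalone illustration of that effect and does not use AMReX; the function names sum_forward and sum_backward are hypothetical helpers, not part of Castro or AMReX.

```cpp
#include <vector>

// Hypothetical helper: accumulate the values from first to last.
double sum_forward(const std::vector<double>& v) {
    double s = 0.0;
    for (double x : v) {
        s += x;
    }
    return s;
}

// Hypothetical helper: accumulate the same values from last to first.
// Mathematically identical to sum_forward, but the rounding differs.
double sum_backward(const std::vector<double>& v) {
    double s = 0.0;
    for (auto it = v.rbegin(); it != v.rend(); ++it) {
        s += *it;
    }
    return s;
}
```

With IEEE double precision and no fast-math style reordering by the compiler, the input {1.0, 1e17, -1e17} sums to 0 in forward order (the 1.0 is absorbed by 1e17) and to 1 in backward order. A GPU reduction whose combination order varies between runs can see the same kind of difference, usually at a much smaller relative scale.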
Currently the issue with the multipole BCs seems to be that there is substantial numerical sensitivity that can add up to very divergent outcomes when floating point roundoff error accumulates. For example, replacing r**(-l-1) with r**(-1) (when l == 0) in ca_put_multipole_phi results in an O(1e-3) difference in the density after 100 steps of evrard_collapse, even for pure CPU code compiled with PGI.
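To get a feel for the size of the perturbation being discussed, the following sketch compares a general power evaluation against a plain reciprocal for l == 0. It is a C++ analogue written for illustration, not the Fortran in ca_put_multipole_phi; the function name max_rel_diff_pow_vs_recip and the sample radii are assumptions. Whether the two forms differ at all depends on the compiler and math library, and some compilers fold a power of -1 into a division.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper: largest relative difference between pow(r, -l-1)
// and 1/r over n sample radii, with l fixed at 0.
double max_rel_diff_pow_vs_recip(int n) {
    const int l = 0;
    double worst = 0.0;
    for (int i = 1; i <= n; ++i) {
        const double r = 0.1 * i;                  // arbitrary sample radius
        const double a = std::pow(r, -l - 1);      // general power form
        const double b = 1.0 / r;                  // special-cased reciprocal
        worst = std::max(worst, std::abs(a - b) / std::abs(b));
    }
    return worst;
}
```

Any difference this reports should be at the level of the last bit or so of a double, far below the O(1e-3) seen in the density. If so, that is consistent with the point above: the change is only a roundoff-level perturbation, and the large discrepancy comes from the way the calculation amplifies it over many steps.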
Closing since this is not actually a code bug. We probably need to revisit this issue of numerical sensitivity in the boundary conditions, though.
Running wdmerger when compiled with CUDA on Summit shows non-determinism. If executed with
several times in a row (where inputs_collision comes from #646), once in a while the results will be substantially different (e.g. O(0.001) in fcompare). This effect seems to disappear (or at least become much rarer) when gravity.max_multipole_order = 0.