Closed (haykh closed this issue 4 months ago)
Maybe related, I also encounter an MPI error in cartesian/minkowski SRPIC with the wip/shock setup:

```
PMPI_Allgather(1000): MPI_Allgather(sbuf=0x490bc70, scount=1, MPI_FLOAT, rbuf=0x490bc70, rcount=1, MPI_FLOAT, MPI_COMM_WORLD) failed
PMPI_Allgather(945).: Buffers must not be aliased
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=201926145
: system msg for write_line failure : Bad file descriptor
Kokkos::Cuda ERROR: Failed to call Kokkos::Cuda::finalize()
srun: error: midway3-0278: task 0: Exited with exit code 1
```
Likely unrelated: this issue is simply wrong communication without any errors being raised.
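For context on the "Buffers must not be aliased" failure above: MPICH rejects collectives where the send and receive buffers are the same pointer, and the usual fix is to pass MPI_IN_PLACE as the send buffer. A minimal sketch, with hypothetical names and not Entity's actual code:

```cpp
#include <mpi.h>
#include <vector>

// Sketch: each rank contributes one float. Using MPI_IN_PLACE avoids passing
// the same pointer as both send and receive buffer, which MPICH rejects with
// "Buffers must not be aliased".
void gather_one_float_per_rank(float my_value, MPI_Comm comm) {
  int rank, size;
  MPI_Comm_rank(comm, &rank);
  MPI_Comm_size(comm, &size);

  std::vector<float> all(size);
  all[rank] = my_value;  // each rank's contribution sits in its own slot

  // With MPI_IN_PLACE, the send count/type arguments are ignored and the
  // contribution is taken from the rank's slot of the receive buffer.
  MPI_Allgather(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
                all.data(), 1, MPI_FLOAT, comm);
}
```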
@LudwigBoess is this a runtime or a compile-time error? If runtime, could you post the command you use to run (or the submit script)? If compile-time, which MPI are you using?
CUDA with MPI is a bit of a headache to configure at first on a new machine, especially since different clusters define different environment variables.
Culprit identified as a potential race condition in `src/kernels/injectors.hpp`, in `kernels::NonUniformInjector_kernel::operator()`. Switching from `Kokkos::atomic_fetch_add(&idx(), ppc)` to `Kokkos::atomic_fetch_add(&idx(), 1)` solved the issue.
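For illustration, a minimal sketch of the per-particle reservation pattern described above, where each particle slot is claimed individually with an atomic counter rather than claiming a block of `ppc` slots in one fetch-add. The kernel name, views, and arguments here are hypothetical and not Entity's actual injector:

```cpp
#include <Kokkos_Core.hpp>

// Sketch only (assumes Kokkos is already initialized). Each thread reserves
// particle slots one at a time from a shared atomic counter.
void inject_particles(int ncells, int ppc,
                      Kokkos::View<std::size_t> idx,   // shared particle counter
                      Kokkos::View<float*> weights) {  // hypothetical particle array
  Kokkos::parallel_for(
      "NonUniformInjector_sketch", ncells, KOKKOS_LAMBDA(const int /*cell*/) {
        for (int p = 0; p < ppc; ++p) {
          // atomic_fetch_add returns the old counter value, which serves as
          // this particle's unique slot; incrementing by 1 per particle
          // reserves exactly one slot at a time.
          const auto slot = Kokkos::atomic_fetch_add(&idx(), (std::size_t)1);
          weights(slot) = 1.0f;
        }
      });
}
```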