JuliaParallel / MPI.jl

MPI wrappers for Julia
https://juliaparallel.org/MPI.jl/
The Unlicense

[Spectrum MPI] Crash in Iprobe #389

Open vchuravy opened 4 years ago

vchuravy commented 4 years ago

Unsure whether this is a CLIMA bug, an MPI.jl bug, or a Spectrum MPI bug.

cc: @lcw

julia: /__SMPI_build_dir______________________________/ibmsrc/pami/ibm-pami/buildtools/pami_build_port/../pami/components/devices/shmem/shaddr/CMAShaddr.h:164: size_t PAMI::Device::Shmem::CMAShaddr::read_impl(PAMI::Memregion*, size_t, PAMI::Memregion*, size_t, size_t, bool*): Assertion `cbytes > 0' failed.

signal (6): Aborted
in expression starting at /nobackup/users/vchuravy/ClimateMachine/experiments/AtmosLES/dycoms.jl:399
gsignal at /lib64/libc.so.6 (unknown line)
abort at /lib64/libc.so.6 (unknown line)
__assert_fail_base at /lib64/libc.so.6 (unknown line)
__assert_fail at /lib64/libc.so.6 (unknown line)
_ZN4PAMI8Protocol3Get7GetRdmaINS_6Device5Shmem8DmaModelINS3_11ShmemDeviceINS_4Fifo8WrapFifoINS7_10FifoPacketILj64ELj4096EEENS_7Counter15IndirectBoundedINS_6Atomic12NativeAtomicEEELj256EEENSB_8IndirectINSB_6NativeEEENS4_9CMAShaddrELj256ELj512EEELb0EEESL_E6simpleEP18pami_rget_simple_t at /opt/ibm/spectrum_mpi/lib/pami_port/libpami.so.3 (unknown line)
_ZN4PAMI8Protocol3Get13CompositeRGetINS1_4RGetES3_E6simpleEP18pami_rget_simple_t at /opt/ibm/spectrum_mpi/lib/pami_port/libpami.so.3 (unknown line)
_ZN4PAMI7Context9rget_implEP18pami_rget_simple_t at /opt/ibm/spectrum_mpi/lib/pami_port/libpami.so.3 (unknown line)
PAMI_Rget at /opt/ibm/spectrum_mpi/lib/pami_port/libpami.so.3 (unknown line)
process_rndv_msg at /opt/ibm/spectrum_mpi/lib/spectrum_mpi/mca_pml_pami.so (unknown line)
pml_pami_recv_rndv_cb at /opt/ibm/spectrum_mpi/lib/spectrum_mpi/mca_pml_pami.so (unknown line)
_ZN4PAMI8Protocol4Send11EagerSimpleINS_6Device5Shmem11PacketModelINS3_11ShmemDeviceINS_4Fifo8WrapFifoINS7_10FifoPacketILj64ELj4096EEENS_7Counter15IndirectBoundedINS_6Atomic12NativeAtomicEEELj256EEENSB_8IndirectINSB_6NativeEEENS4_9CMAShaddrELj256ELj512EEEEELNS1_15configuration_tE5EE15dispatch_packedEPvSP_mSP_SP_ at /opt/ibm/spectrum_mpi/lib/pami_port/libpami.so.3 (unknown line)
PAMI_Context_advancev at /opt/ibm/spectrum_mpi/lib/pami_port/libpami.so.3 (unknown line)
mca_pml_pami_progress at /opt/ibm/spectrum_mpi/lib/spectrum_mpi/mca_pml_pami.so (unknown line)
opal_progress at /opt/ibm/spectrum_mpi/lib/libopen-pal.so.3 (unknown line)
mca_pml_pami_probe_start at /opt/ibm/spectrum_mpi/lib/spectrum_mpi/mca_pml_pami.so (unknown line)
mca_pml_pami_iprobe at /opt/ibm/spectrum_mpi/lib/spectrum_mpi/mca_pml_pami.so (unknown line)
PMPI_Iprobe at /opt/ibm/spectrum_mpi/lib/libmpi_ibm.so (unknown line)
Iprobe at /nobackup/users/vchuravy/julia_depot/packages/MPI/2F5CB/src/pointtopoint.jl:141 [inlined]
iprobe_and_yield at /nobackup/users/vchuravy/ClimateMachine/src/Arrays/MPIStateArrays.jl:444
jl_fptr_args at /home/software/julia/src/julia-1.3.0/src/gf.c:1915
Allocations: 209544203 (Pool: 209507420; Big: 36783); GC: 157

lcw commented 4 years ago

A crash in Iprobe, weird.