NCAR / spack-gust

Spack production user software stack on the Gust test system

nvhpc/22.11 codes with cray-mpich MPI do not run #33

Closed - vanderwb closed this issue 1 year ago

vanderwb commented 1 year ago

A simple MPI hello world produces:

MPICH ERROR [Rank 0] [job id 6a935d73-52df-48ef-a9c5-286de545c7fd] [Thu Dec 15 09:21:26 2022] [gu0013] - Abort(1616271) (rank 0 in comm 0): Fatal error in PMPI_Init: Other
MPI error, error stack:
MPIR_Init_thread(170).......:
MPID_Init(501)..............:
MPIDI_OFI_mpi_init_hook(623):
open_fabric(1446)...........: OFI fi_getinfo() failed (ofi_init.c:1446:open_fabric:No data available)

aborting job:
Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(170).......:
MPID_Init(501)..............:
MPIDI_OFI_mpi_init_hook(623):
open_fabric(1446)...........: OFI fi_getinfo() failed (ofi_init.c:1446:open_fabric:No data available)
MPICH ERROR [Rank 0] [job id 6a935d73-52df-48ef-a9c5-286de545c7fd] [Thu Dec 15 09:21:26 2022] [gu0013] - Abort(1616271) (rank 0 in comm 0): Fatal error in PMPI_Init: Other
MPI error, error stack:
MPIR_Init_thread(170).......:
MPID_Init(501)..............:
MPIDI_OFI_mpi_init_hook(623):
open_fabric(1446)...........: OFI fi_getinfo() failed (ofi_init.c:1446:open_fabric:No data available)
...
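For context, the reproducer is just a standard MPI hello world along the following lines (a sketch; the exact source is not included in the issue). As the stack above shows, the abort happens inside MPI_Init.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);               /* aborts here with the fi_getinfo() error */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}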
vanderwb commented 1 year ago

@sjsprecious - not just GPU codes, it seems. Hoping to ask an HPE person about this soon.

sjsprecious commented 1 year ago

Thanks @vanderwb. It does seem to be an incompatibility between the latest nvhpc compiler and the Cray MPI library, as you indicated.

vanderwb commented 1 year ago

This was actually caused by the bind: true Spack setting rather than a compiler/MPI incompatibility - that setting should be avoided.
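For reference, this presumably refers to Spack's shared_linking binding option in config.yaml (an assumption; the issue does not show the site configuration). With that reading, the offending setting and the fix would look roughly like this:

config:
  shared_linking:
    type: rpath
    # bind: true rewrites DT_NEEDED entries to the absolute paths of
    # dependency libraries; setting it back to false (the default) keeps
    # plain sonames resolved via rpath/runpath
    bind: false

Binding absolute library paths into executables is one plausible way the setting could interfere with how cray-mpich finds its libfabric provider at run time, though the exact mechanism is not spelled out in this thread.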