robnagler closed 2 years ago
The solution is to not set LD_LIBRARY_PATH in shifter when not running under MPI. From NERSC:
To disable this behavior on a login node, you'll want to disable the mpich module which will insert the Cray MPICH libraries:
> shifter --image=registry.nersc.gov/library/nersc/h5py:3.4.0 --module=none python -c 'import mpi4py.MPI'
Without `--module=none`, the mpich module inserts the binaries and sets LD_LIBRARY_PATH. Just not setting LD_LIBRARY_PATH fixes the problem.
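As a rough illustration of "not setting LD_LIBRARY_PATH", here is a minimal Python sketch that builds the environment for a child process and drops the variable unless the job is actually running under MPI. The function name `child_env` and the `under_mpi` flag are hypothetical, not part of shifter or any existing code; how the caller decides whether it is under MPI is left as an assumption. The variable is removed for the child rather than the current process because the dynamic linker reads LD_LIBRARY_PATH at process start.

```python
import os


def child_env(environ=None, under_mpi=False):
    """Return a copy of the environment for a child process.

    Hypothetical helper: `under_mpi` is supplied by the caller; this
    sketch does not try to detect MPI on its own.
    """
    env = dict(os.environ if environ is None else environ)
    if not under_mpi:
        # Drop the variable so the child does not pick up the injected
        # MPI libraries on a node without MPI.
        env.pop("LD_LIBRARY_PATH", None)
    return env
```

The result can be passed as `env=child_env()` to `subprocess.run` when launching Python inside the container on a login node.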
The original problem was caused by the MPI runtime being loaded on a login node, where there is no MPI:
> shifter --image=registry.nersc.gov/library/nersc/h5py:3.4.0 python -c 'import mpi4py.MPI'
[Wed Oct 12 10:51:39 2022] [unknown] Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(537):
MPID_Init(246).......: channel initialization failed
MPID_Init(647).......: PMI2 init failed: 1
Aborted