biviosoftware / home-env

20221012 consult $LD_LIBRARY_PATH on start for NERSC #49

Closed robnagler closed 2 years ago

robnagler commented 2 years ago
> shifter --image=radiasoft/sirepo:alpha /bin/bash
bash-5.0$ echo $LD_LIBRARY_PATH
/opt/udiImage/modules/mpich/mpich-7.7.19/lib64:/opt/udiImage/modules/mpich/mpich-7.7.19/lib64/dep
bash-5.0$ exit
exit
> shifter --image=radiasoft/sirepo:alpha --entrypoint /bin/bash
bash-5.0$ echo $LD_LIBRARY_PATH
/opt/cray/pe/mpt/7.7.19/gni/mpich-gnu-abi/8.2/lib:/usr/lib64/mpich/lib:/home/vagrant/.local/lib:/opt/udiImage/modules/mpich/mpich-7.7.19/lib64:/opt/udiImage/modules/mpich/mpich-7.7.19/lib64/dep
bash-5.0$ exit
robnagler commented 2 years ago

The solution is to not set LD_LIBRARY_PATH in shifter when not running under MPI. From NERSC:

To disable this behavior on a login node, you'll want to disable the mpich module which will insert the Cray MPICH libraries:

> shifter --image=registry.nersc.gov/library/nersc/h5py:3.4.0 --module=none python -c 'import mpi4py.MPI'

--module=none inserts the binaries and sets LD_LIBRARY_PATH. Just not setting LD_LIBRARY_PATH fixes the problem.
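
A rough sketch of what "not setting LD_LIBRARY_PATH" could look like in a startup script. This is not the actual home-env change: the function name `home_env_ld_library_path` is hypothetical, and using `SLURM_PROCID` (which srun sets for each task) to detect an MPI run is an assumption, not something confirmed for NERSC.

```shell
# Hypothetical helper (not part of home-env): print the LD_LIBRARY_PATH to use.
# Assumption: SLURM_PROCID is set only inside an srun-launched task.
home_env_ld_library_path() {
    local extra=$1
    if [ -z "${SLURM_PROCID:-}" ]; then
        # Not inside an MPI task (e.g. a login node): keep the value found on
        # start, which is whatever shifter already put in the environment.
        printf '%s\n' "${LD_LIBRARY_PATH:-}"
        return
    fi
    # Inside an MPI task: prepend the extra directories, preserving the
    # existing value if there is one.
    printf '%s\n' "$extra${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}
```

It could be called as `export LD_LIBRARY_PATH=$(home_env_ld_library_path /usr/lib64/mpich/lib)`, so a login-node shell ends up with only the paths shifter provided.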

The original problem was caused by the MPI runtime being loaded on the login nodes, where there is no MPI:

> shifter --image=registry.nersc.gov/library/nersc/h5py:3.4.0 python -c 'import mpi4py.MPI'
[Wed Oct 12 10:51:39 2022] [unknown] Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(537):
MPID_Init(246).......: channel initialization failed
MPID_Init(647).......:  PMI2 init failed: 1
Aborted