JohannesBuchner / PyMultiNest

Pythonic Bayesian inference and visualization for the MultiNest Nested Sampling Algorithm and PyCuba's cubature algorithms.
http://johannesbuchner.github.io/PyMultiNest/

support optional env variable to skip any attempt to import MPI for Cray MPI #209

Open heather999 opened 2 years ago

heather999 commented 2 years ago

Hi, this is related to #113 and #173. I have a conda environment at NERSC with the Cray MPICH libraries and mpi4py available. I want to support users who may use the batch nodes as well as the login nodes and JupyterHub at NERSC (which effectively runs on the login nodes). Running on a batch node works fine, but when this conda environment is used on a NERSC login node and we do import pymultinest, we receive the dreaded:

[Thu Apr  7 21:29:11 2022] [unknown] Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(537):
MPID_Init(246).......: channel initialization failed
MPID_Init(647).......:  PMI2 init failed: 1
Aborted

This is specific to Cray MPI, which fails while probing for hardware that is not present on NERSC's login nodes. I also reached out to the mpi4py developer, and he kindly explained that the try block added in #173 does not help in this case: error handlers can only be set after MPI_Init(), so this fatal error cannot be caught. It would be helpful if we could add an environment variable that, if set, causes pymultinest to completely skip attempting the MPI import here.
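For concreteness, a minimal sketch of what the opt-out could look like around pymultinest's mpi4py import. The variable name `PYMULTINEST_NO_MPI` is hypothetical, and the surrounding try/except only paraphrases the logic added in #173:

```python
import os

use_MPI = False
# Hypothetical opt-out: skip mpi4py entirely if the user sets PYMULTINEST_NO_MPI,
# e.g. on Cray login nodes where MPI_Init() aborts before any error handler can be set.
if not os.environ.get('PYMULTINEST_NO_MPI'):
    try:
        from mpi4py import MPI
        if MPI.COMM_WORLD.Get_size() > 1:
            use_MPI = True
    except ImportError:
        pass
```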

I'm happy to help submit a PR to get that going. Does that seem reasonable?

JohannesBuchner commented 2 years ago

Further solutions for such interactive environments:

- uninstall mpi4py in that environment
- remove or hide libmultinest_mpi.so

I am a bit hesitant to include more and more code which circumvents broken MPI setups.

heather999 commented 2 years ago

The point is to provide a single conda environment that supports both interactive use and running on batch nodes at HPC centers where Cray MPI is available. Uninstalling mpi4py, or removing or hiding libmultinest_mpi.so, is not a reasonable way to let a single conda environment cover all of these use cases.

MPI is not meant to be used on the login nodes of these centers, so I can understand why the setup counts as "broken", yet users do occasionally want to run Python from a conda environment interactively, and they should be able to do that. Another option is to maintain two separate conda environments, one that properly supports MPI and one that does not, but that is not very friendly for either users or environment maintainers.
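For the interactive login-node case, the hypothetical opt-out sketched above would then be as simple as setting the variable before the import (again, `PYMULTINEST_NO_MPI` is just a placeholder name for whatever a PR would settle on):

```python
import os

# Placeholder variable name; set it before importing pymultinest on a login node
# or under JupyterHub so the mpi4py import (and Cray's PMI2 init) is never attempted.
os.environ["PYMULTINEST_NO_MPI"] = "1"

import pymultinest  # would take the non-MPI code path under the proposed change
```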