E3SM-Project / e3sm-unified

A metapackage for a unified anaconda environment for analyzing results from the Energy Exascale Earth System Model (E3SM).
BSD 3-Clause "New" or "Revised" License

mpi4py on HPC #54

Closed xylar closed 5 years ago

xylar commented 5 years ago

mpi4py (used by ilamb) doesn't work properly on cori and is unlikely to perform well on other HPC systems.

xylar commented 5 years ago

@jhkennedy, I've been in discussion with Min about a workaround, but it's worth discussing between the two of us, too.

xylar commented 5 years ago

I'm following these instructions at NERSC: https://docs.nersc.gov/programming/high-level-environments/python/mpi4py/#mpi4py-in-your-custom-conda-environment

xylar commented 5 years ago

I also tested on compy with:

module load gcc/4.8.5
module load mvapich2/2.3.1

The resulting environment with mpi4py appears to work for calls to mpirun but not to srun. @rljacob, do you know if/under what conditions srun works on compy?
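For reference, a minimal smoke test of the kind that can be launched under both mpirun and srun might look like the sketch below. It is not taken from the e3sm-unified repo: the script name `mpi_check.py` and the helper `describe_mpi` are hypothetical, and the only mpi4py calls it relies on are `MPI.COMM_WORLD`, `Get_rank`, `Get_size` and `MPI.Get_library_version`.

```python
"""Minimal mpi4py smoke test (illustrative sketch, not part of e3sm-unified)."""


def describe_mpi():
    """Return rank, size and MPI library info, or None if mpi4py cannot be imported."""
    try:
        # Importing mpi4py.MPI initializes MPI for this process.
        from mpi4py import MPI
    except ImportError:
        return None
    comm = MPI.COMM_WORLD
    return {
        "rank": comm.Get_rank(),
        "size": comm.Get_size(),
        # Free-form vendor string; useful for spotting which MPI was linked.
        "library": MPI.Get_library_version().strip(),
    }


if __name__ == "__main__":
    info = describe_mpi()
    if info is None:
        print("mpi4py could not be imported")
    else:
        print(f"rank {info['rank']} of {info['size']}: {info['library']}")
```

Launched as `mpirun -n 4 python mpi_check.py` and again as `srun -n 4 python mpi_check.py`, a working setup should report four distinct ranks with a size of 4. If every process reports rank 0 with a size of 1, the launcher and the MPI library that mpi4py was built against are probably mismatched.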

xylar commented 5 years ago

I haven't tested yet, but I think the system MPI and the conda installation of MPICH aren't going to play nice with one another. The esmf package depends on mpich and is a dependency of nco so we're going to be in a bit of trouble. Things might work at NERSC via srun (which uses system MPI) for mpi4py calls and mpirun (which will use the conda mpich) for esmf calls. But, given that srun didn't work for me on compy, we might be in trouble there.
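One way to check whether a given mpi4py build is picking up the conda MPICH or the system MPI is to look at the library version string it reports. The sketch below classifies such a string by substring match. The function name `classify_mpi_library` and the substring table are assumptions for illustration, and real version strings vary by vendor and release.

```python
def classify_mpi_library(version_string):
    """Guess the MPI implementation family from a library version string.

    The substring table is a heuristic assumption, not an official mapping.
    More specific names are checked before the generic "mpich".
    """
    text = version_string.lower()
    known = [
        ("mvapich", "MVAPICH"),
        ("cray", "Cray MPICH"),
        ("open mpi", "Open MPI"),
        ("intel", "Intel MPI"),
        ("mpich", "MPICH"),
    ]
    for needle, label in known:
        if needle in text:
            return label
    return "unknown"


if __name__ == "__main__":
    # Made-up sample strings, for illustration only.
    for sample in ("MPICH Version: 3.3.2", "MVAPICH2 Version: 2.3.1"):
        print(sample, "->", classify_mpi_library(sample))
```

In practice the input would come from `mpi4py.MPI.Get_library_version()`, called once under mpirun and once under srun. A different answer between the two launchers would suggest that the environment is mixing MPI libraries.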

xylar commented 5 years ago

I believe I have a solution. Testing will be needed.

I built a serial version of esmf that I will upload to the e3sm anaconda channel once I've tested it.

I have updated the build script to include building mpi4py with native MPI on cori, compy, anvil, cooley and grizzly. I won't do this on rhea or acme1 unless it is requested.

xylar commented 5 years ago

So the solution I came up with doesn't seem to work for esmpy (and therefore maybe not for many of our packages, though e3sm_diags is the only one where I'm sure). I'll explore more for the next release...