conda-forge / esmpy-feedstock

A conda-smithy repository for esmpy.
BSD 3-Clause "New" or "Revised" License

Clarifying nompi vs mpi builds #70

Open billsacks opened 1 year ago

billsacks commented 1 year ago

Comment:

A user of the conda-forge esmpy package recently opened a support ticket with the ESMF team: they could not write a NetCDF regridding weights file from their installed esmpy version because it was not built with PIO (the Parallel IO library) support. We found that they were using a nompi build of the conda-forge esmpy package. nompi implies no PIO, and thus no ability to write NetCDF regridding weight files (or to read certain NetCDF files).

I am adding a note about this to the ESMPy documentation (https://github.com/esmf-org/esmf/pull/130), but I wondered whether any notes should also be made in the conda-forge documentation of this package. It looks like there are nompi as well as mpi builds of esmpy for any given architecture / OS. I'm not familiar enough with conda to understand when a user will get one vs. the other, but maybe this can be clarified?

xylar commented 1 year ago

Hi @billsacks,

A couple of things. It seems like the nompi version of esmf/esmpy has been pretty badly neutered in recent versions by the reliance on parallelio. That's unfortunate because the conda version of MPI doesn't work on most HPC systems.

Regarding documentation, conda-forge generates the readme for each feedstock automatically, so we have two main options for documentation: comments in the recipe or issues like this one.

I'll add more shortly on how to select a given mpi version.

xylar commented 1 year ago

To get a "nompi" version of esmpy, a user should ideally install mambaforge or add mamba to their base environment and then create a new environment with:

mamba create -y -n esmpy-nompi "esmpy=*=nompi_*" python=3.11 <package> <package>
mamba activate esmpy-nompi

To create an mpich environment, run:

mamba create -y -n esmpy-mpich "esmpy=*=mpi_mpich_*" python=3.11 <package> <package>
mamba activate esmpy-mpich

and finally to create an openmpi environment, run:

mamba create -y -n esmpy-openmpi "esmpy=*=mpi_openmpi_*" python=3.11 <package> <package>
mamba activate esmpy-openmpi

The "build string" of each package built for different MPI flavors has a unique prefix used to select which MPI variant you want. This is true of all MPI packages on conda-forge.
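To check which variant an existing environment ended up with, one option is to look at the build string of the installed package and match it against the prefixes used above. The following is a minimal sketch, not part of conda or esmpy: the function name `mpi_variant` is hypothetical, and the sample build strings are made up for illustration (real ones carry an actual hash and build number after the prefix).

```shell
# Hypothetical helper: map a conda-forge build string to its MPI variant,
# using the same prefixes as the install commands above.
mpi_variant() {
    case "$1" in
        nompi_*)        echo "nompi" ;;
        mpi_mpich_*)    echo "mpich" ;;
        mpi_openmpi_*)  echo "openmpi" ;;
        *)              echo "unknown" ;;
    esac
}

# Made-up build strings, for illustration only.
mpi_variant "nompi_h0000000_0"
mpi_variant "mpi_mpich_h0000000_0"
mpi_variant "mpi_openmpi_h0000000_0"
```

In an activated environment, the build string is the third column of the `conda list` output, so something like `mpi_variant "$(conda list esmpy | awk '$1 == "esmpy" {print $3}')"` should report the variant in use.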

billsacks commented 1 year ago

Thanks a lot for this information, @xylar ! It's particularly helpful to know that the nompi build is important for ESMPy. I will talk about this with the rest of the ESMF team and we'll see if we can prioritize getting the PIO-enabled build to work with nompi in a future release.

xylar commented 1 year ago

@billsacks, while we're discussing such things (and not to sidetrack this thread), I thought I'd mention that one thing I would desperately want is for ESMF_RegridWeightGen to be a standalone utility that doesn't require all of ESMF. I frequently spend hours building ESMF for that one utility, and it doesn't feel like a good use of time and resources.

billsacks commented 1 year ago

Thanks for your suggestion. I will raise this with the ESMF team. My sense is that it isn't really feasible to build ESMF_RegridWeightGen without the rest of the ESMF library, since that utility depends on various parts of the library. However, we have recently been discussing whether it would be feasible to provide ESMF binaries via package managers. I'm not sure of the priority / time-frame for doing that, but would that be a good solution for what you want? Or do you have something else in mind?

xylar commented 1 year ago

I have to use HPC system MPI so a pre-built package isn't feasible. I guess the pain is necessary. ESMF really takes ages to build.

billsacks commented 1 year ago

I talked with the ESMF team about some of the issues you raised.

As I thought, it isn't really feasible to build ESMF_RegridWeightGen without the rest of the ESMF library.

Long build times are typically seen on systems with slow file systems, such as Lustre file systems; this arises from ESMF's recursive build process and the way it traverses the file system. If the systems you're using have a part of the file system that has better disk / i/o performance, you will likely see significantly faster builds there. I know that this is a bit of an apples-to-oranges comparison, but to illustrate that the ESMF build isn't necessarily slow: I can build ESMF on my Mac laptop in about 7 minutes. Even on most HPC systems, a multi-hour build is definitely atypical.

xylar commented 1 year ago

Hi Bill, thanks for clarifying both of those points. I'll do some test builds where I'm not using a parallel file system to see if that improves my very long build times.

billsacks commented 8 months ago

@xylar and others: I have been working on enabling PIO in ESMF when using ESMF's mpiuni (i.e., fake mpi) library. This will allow ESMPy to do I/O when it doesn't have access to a real mpi library. This isn't yet totally complete, but I think I'm getting close - see https://github.com/esmf-org/esmf/pull/205. This is planned for the ESMF 8.7 release, which is currently scheduled for around May of this year. Even though this isn't quite ready for public use yet, I wanted to call your attention to this work so we can discuss and plan for it in re-enabling nompi builds of ESMPy in conda. Let me know if you have any questions or want to talk further about this.

billsacks commented 1 month ago

@xylar - Starting with ESMF 8.7, which is currently anticipated to be released in late August or September, you will be able to build the internal PIO with mpiuni.

xylar commented 1 month ago

Thanks @billsacks! That will be much appreciated!