E3SM-Project / e3sm-unified

A metapackage for a unified anaconda environment for analyzing results from the Energy Exascale Earth System Model (E3SM).
BSD 3-Clause "New" or "Revised" License

Tempest-extreme in e3sm-unified #87

Closed chengzhuzhang closed 3 years ago

chengzhuzhang commented 3 years ago

Hi Xylar, I'm trying to call tempest-extremes from e3sm-unified and realized that it is not available. I think you have already made it a conda package on conda-forge. Could you make it available in the next e3sm-unified release?
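
For anyone wanting to confirm the omission from inside an activated environment, a minimal sketch is to look for the executables on `PATH`. It assumes the TempestExtremes command-line tools `DetectNodes` and `StitchNodes` are the ones of interest; the helper name `missing_tools` is hypothetical and not part of E3SM-Unified.

```python
import shutil


def missing_tools(names):
    """Return the executable names from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # DetectNodes and StitchNodes are TempestExtremes executables; adjust the
    # list to whichever tools the analysis actually calls.
    missing = missing_tools(["DetectNodes", "StitchNodes"])
    if missing:
        print("Not available in this environment:", ", ".join(missing))
    else:
        print("All requested tools were found.")
```

Running the script in the environment prints which of the listed tools, if any, cannot be found.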

xylar commented 3 years ago

@chengzhuzhang, I'm really sorry about that. I don't know how it was omitted. It is definitely on the list on Confluence (along with tempest-remap).

xylar commented 3 years ago

@chengzhuzhang, This will definitely be handled in #88 but I could also do a 1.4.1 release with tempest-extremes. That wouldn't be a lot of work. Let me know.

chengzhuzhang commented 3 years ago

hey Xylar, no problem. Thank you for working on it. There might be a tempest-extremes release coming up with a new feature for TC analysis. I will check with Paul for timing.

Also tempest-extremes can be built with or without MPI. For the MPI build, one has to be on a compute node to run. Could it be possible to maintain two builds on conda forge?
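
If two builds did exist, a wrapper would need some way to decide which one to use. The sketch below is one rough way to do that, assuming that "being inside a batch job" is a good enough proxy for "being on a compute node" and that the machine uses Slurm, PBS or Cobalt. The function names are hypothetical; the environment variable names are the job-ID variables those schedulers set.

```python
import os

# Job-ID variables set inside a running job by Slurm, PBS and Cobalt.
# Which scheduler a given machine uses is an assumption of this sketch.
JOB_ID_VARS = ("SLURM_JOB_ID", "PBS_JOBID", "COBALT_JOBID")


def in_batch_job(environ=None):
    """Return True if any known scheduler job-ID variable is set and non-empty."""
    environ = os.environ if environ is None else environ
    return any(environ.get(var) for var in JOB_ID_VARS)


def choose_build(environ=None):
    """Pick the 'mpi' build inside a batch job, otherwise the 'nompi' build."""
    return "mpi" if in_batch_job(environ) else "nompi"
```

On a login node with none of these variables set, `choose_build()` falls back to `"nompi"`.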

xylar commented 3 years ago

@chengzhuzhang,

> Also tempest-extremes can be built with or without MPI. For the MPI build, one has to be on a compute node to run. Could it be possible to maintain two builds on conda forge?

I can certainly build the MPI version on conda-forge but it uses the conda-forge version of MPICH (or Open-MPI) and that doesn't work on any of our supported machines. If you want a version that uses the system MPI, that obviously won't come from conda-forge and it won't be compatible with E3SM-Unified. This is also a problem with ILAMB.

xylar commented 3 years ago

Update: I did make the conda-forge package build with MPI, and I just updated the draft of 1.5.0 to also use the MPI version when E3SM-Unified has MPI. But the issue remains that the E3SM-supported machines can't work with the conda-forge MPI. I have been working for more than a year to try to find a solution to this but I don't think there is one.
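
To tell which variant of a package is installed, one option is to inspect its conda build string. The sketch below assumes the package follows the common conda-forge convention of prefixing build strings with `nompi_`, `mpi_mpich_` or `mpi_openmpi_`; whether the tempest-extremes feedstock uses exactly these prefixes is an assumption, and `mpi_variant` is a hypothetical helper.

```python
KNOWN_MPI = ("mpich", "openmpi")


def mpi_variant(build_string):
    """Classify a conda build string as 'nompi', 'mpich', 'openmpi' or 'unknown'."""
    if build_string.startswith("nompi_"):
        return "nompi"
    for name in KNOWN_MPI:
        if build_string.startswith(f"mpi_{name}_"):
            return name
    # Build strings without a recognised prefix carry no variant information.
    return "unknown"
```

The build string itself is shown in the "Build" column of `conda list`.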

chengzhuzhang commented 3 years ago

This is problematic. I'm curious about the inclusion of tempest-remap. Do we have the same problem? I can try testing a non-MPI build of tempest-extremes to see if it performs adequately...

xylar commented 3 years ago

As far as I know, only mbtempest (via moab) has MPI support. We do not yet include moab in E3SM-Unified so MPI support is also not available there.

xylar commented 3 years ago

This (MPI support for diagnostics tools) is a topic we really need to have a whole IG meeting about at some point.

xylar commented 3 years ago

One more thought. I have built ESMF and SCORPIO on many systems using system libraries. TempestExtremes seems much simpler than these libraries (its only dependency is libnetcdf). It seems possible to work on defining a system-specific E3SM-Unified that uses system builds of some tools like ESMF, TempestExtremes and ILAMB. This would work for executables but not for libraries. It would also be a ton of work, and probably won't be possible by July.
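
As a rough illustration of what "system-specific" could mean in practice, the sketch below splits a list of tools into those taken from a system build and those left to conda-forge. Everything in it is hypothetical: the machine name, the table contents and the function name are placeholders, not an existing E3SM-Unified mechanism.

```python
# Hypothetical table: tools that would be built against the system MPI
# on each machine. The machine name is a placeholder.
SYSTEM_BUILT = {
    "example-machine": {"esmf", "tempest-extremes", "ilamb"},
}


def split_sources(machine, tools):
    """Partition `tools` into system-built and conda-provided lists."""
    system = SYSTEM_BUILT.get(machine, set())
    return {
        "system": sorted(t for t in tools if t in system),
        "conda": sorted(t for t in tools if t not in system),
    }
```

A machine with no entry in the table simply gets every tool from conda, which matches the current behaviour.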

chengzhuzhang commented 3 years ago

> One more thought. I have built ESMF and SCORPIO on many systems using system libraries. TempestExtremes seems much simpler than these libraries (its only dependency is libnetcdf). It seems possible to work on defining a system-specific E3SM-Unified that uses system builds of some tools like ESMF, TempestExtremes and ILAMB. This would work for executables but not for libraries. It would also be a ton of work, and probably won't be possible by July.

This sounds like a possible path forward for this MPI support issue. Would be nice to hear from someone from IG with relevant expertise.

xylar commented 3 years ago

> Would be nice to hear from someone from IG with relevant expertise.

I would certainly welcome input from someone from IG with expertise in building packages like this for specific machines. But I think the bigger issue isn't so much how to build the libraries but how to maintain them as machine modules change. E3SM is constantly updating not only current branches but also maint branches to make sure they remain compatible with machine modules. I do not look forward to having to engage in a similar process for each past and present E3SM-Unified version on each supported machine. (But I also don't see a clear alternative.)

xylar commented 3 years ago

@chengzhuzhang, I've thought about this further and looked into a test build of TempestExtremes on Anvil. It seems like it shouldn't be too hard to replace the conda-forge builds of some MPI packages with our own builds for the system MPI:

I was afraid more packages were affected but those are the only ones that aren't libraries (and therefore already available as system modules).

ESMF is going to be the hardest one. I've built it on most machines but haven't had any luck on Cori-Haswell (and haven't tried on Cori-KNL because I haven't had good luck with analysis on those nodes), so that's one where we could use some IG team expertise.

I'll try to put together a test environment on several supported machines sometime in May because I think this might need some extra testing compared with our "normal" E3SM-Unified.