Exawind / nalu-wind

Solver for wind farm simulations targeting exascale computational platforms
https://exawind.github.io/nalu-wind/

ExaWind / Nalu-Wind Stable Release or Build Recipes #866

Closed svdavidson closed 1 month ago

svdavidson commented 3 years ago

This issue is a request for instructions or a recipe or a discussion on how to build a (somewhat) stable (pre-) release of Exawind / Nalu-Wind that can be used for scientific research.

I am a software engineer helping a major energy company do research for large wind-based projects. My goal is to build a stable version of Nalu-Wind so our scientists can start doing research and move these critical wind projects forward. While Nalu-Wind is certainly still in active development, I would ask whether the time is right to start transitioning toward gaining more users and to provide some kind of stable release or stable build recipes for common platforms.

I have already done some work building Nalu-Wind. I originally tried the documentation at https://nalu-wind.readthedocs.io/en/latest/source/user/build_manually.html and found it outdated and unmaintained. I have tried to use the latest releases of Spack to build Nalu-Wind and its dependencies, and I have also consulted the Nalu-Wind build reports at https://my.cdash.org/index.php?project=Nalu-Wind. I also found the ExaWind-Builder scripts and was able to build an x86_64 version of Nalu-Wind with the desired options. The main issue I found with the ExaWind-Builder scripts and Spack is that they use the development and/or master git branches for many of the key components, such as Nalu-Wind, Trilinos, OpenFAST, TIOGA, ... These branches change daily, which makes sense for development, but it means it is generally not possible to build a stable release, as the majority of the CTests fail.

We have discussed this with the NREL PI for ExaWind, who (based on my understanding) suggested that the ExaWind project is transitioning from being fully focused on development to starting to work with users and suggested that GitHub would be the appropriate place to talk about how best to make that transition and to perhaps contribute as an early adopter and user.

I request that this issue be used as a discussion of how best to proceed in providing a usable Nalu-Wind release for Wind research. Does something already exist? Should stable branches be created? Should tags be used to identify a somewhat stable recommended release? Should Spack be updated with a stable release recipe? Is there a more appropriate approach?

If a GitHub issue is not the appropriate vehicle or method for this discussion, please recommend an alternative.

Sincerely,

Shannon Davidson, Software Engineer, HPC Engineering

psakievich commented 3 years ago

@svdavidson you've raised several excellent points. @jrood-nrel and I are actively working on improving the build process and we should update the docs accordingly. We've recently had several discussions regarding this issue. For a little more clarification, what level of use are your scientists looking at? Would pre-built binaries be a preferred distribution method for your team to be able to run with, or is the desire principally related to building the code?

svdavidson commented 3 years ago

Hi @psakievich, pre-built binaries (or eventually even signed containers) for "supported" architectures and "recommended" configurations (i.e. dependencies and versions) would be a great long-term objective. However, a working build recipe for a "stable" or "tested" release, for the architectures and configurations that the ExaWind/Nalu-Wind team is focusing on, would be the preferred place to start (and our first objective), as we also want to gain knowledge of building and working with the different dependencies and technologies. We have access to architectures similar to the DoE HPC systems: x86_64, x86_64+NVIDIA-GPU, Power9+NVIDIA-GPU, as well as a few other new architectures coming on board.

psakievich commented 3 years ago

@svdavidson thanks. Yes, you've hit on something important, which is freezing the TPLs (third-party libraries) for a release. As you've noted, we are currently running off the develop branch of almost all of our TPLs, so we will first have to organize this process before we can provide stable release tags. Containers are a longer-term objective that is on our radar. We have a partnership with E4S, which provides cached binaries and can do so for a wide range of architectures and operating systems via Spack. They actually use nalu-wind for their tutorials, so using their caches could be an intermediate step that provides some immediate relief from build frustrations if needed. We just need to identify the exact OS, compilers, etc. to give them so they can make the cache available.

However, this doesn't give users an ability to reproduce the build in question easily like a stable release as you point out.

In terms of how to go about deploying a stable release, I think the list of questions you provided in the first post walks down the right path. We are pretty focused on spack. So I would envision the process being something like:

  1. Identify a candidate commit for nalu-wind and confirm the tests pass for release versions of the major dependencies (Tioga, Hypre, Trilinos, OpenFAST etc)
  2. Tag this version on github as a release with the TPL release versions listed
  3. Update the spack package.py in the builtin spack repo to include the release tag
  4. Provide reference spack environments that can be used to build the release version
  5. Enable binaries on target OS/compiler combinations via E4S partnership (they would use our reference spack env's)

Eventually it would be nice to have step 5 include binaries people can download from our GitHub account and/or development of containers. @jrood-nrel what are your thoughts on this? I can start iterating on step 1 as we resolve the current dashboard failures from OpenFAST. It might be good to tag a release before moving the nightly tests back to develop with OpenFAST.
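As a rough illustration of steps 3 and 4, here is a minimal Python sketch that builds a fully pinned Spack spec and renders it into a bare-bones reference environment file (`spack.yaml`). The helper names `pinned_spec` and `render_environment` are hypothetical, and every version number is a placeholder, not a tested release combination. Only the `^dep@version` spec syntax and the `spack:` / `specs:` layout are taken from Spack itself.

```python
# Sketch only: helper names are hypothetical and all version numbers are
# placeholders, not a tested Nalu-Wind release combination.

def pinned_spec(root, root_version, pins):
    """Build a Spack spec string such as 'nalu-wind@X ^trilinos@Y'."""
    # Sort the dependencies so the same pins always give the same string.
    deps = " ".join(f"^{name}@{ver}" for name, ver in sorted(pins.items()))
    return f"{root}@{root_version} {deps}".strip()


def render_environment(spec):
    """Render a minimal spack.yaml containing a single root spec."""
    lines = [
        "spack:",
        "  specs:",
        f"  - {spec}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder pins for the major dependencies named in step 1.
    pins = {"trilinos": "1.0.0", "hypre": "1.0.0", "tioga": "1.0.0", "openfast": "1.0.0"}
    print(render_environment(pinned_spec("nalu-wind", "1.0.0", pins)))
```

Writing the pins into the root spec keeps the whole dependency set in one reviewable line, so the reference environment for a release would differ from the next one only in version numbers.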

svdavidson commented 3 years ago

Hi @psakievich and @jrood-nrel. Has there been any progress toward deploying a stable release? Building Nalu-Wind along with its dependencies continues to be a challenge. In the meantime, I was wondering if there is a recommended way to build. Or perhaps, what is the build method most commonly used by developers before testing? Is Spack the way to go, or Exawind-Builder, or checking out the Nalu-Wind repo and building manually with a cmake command similar to those reported in the Nalu-Wind CDash Dashboard build details?

psakievich commented 3 years ago

@svdavidson we have been reworking this quite a bit over the last couple of months. @jrood-nrel and I are pushing everything we can into Spack. We have a project to assist with Exawind-specific builds, https://github.com/psakievich/spack-manager, which you can check out. It is essentially a way to extend Spack for specific Exawind needs, and it includes a tutorial on a developer workflow. Most of this is in the alpha phase, but the intent of the project is also to make it easy to create modules and build caches for analysts and developers. The plan is to phase out Exawind-Builder and transition developers to this tool in the coming months. If you'd like to get started and run into trouble, feel free to reach out to us and we can set up a meeting to address any needs you may have. You can also use GitHub issues and pull requests on spack-manager.

In terms of what the actual stable release will look like, we haven't worked out the exact mechanics yet, but I think the leading contender is version controlling the dependencies inside the Exawind spack packages.

svdavidson commented 3 years ago

Thanks @psakievich. This sounds like a good approach. I'll check out the Spack Manager project and let you know if I run into any issues.

svdavidson commented 2 years ago

Hi @psakievich. I have been using Spack Manager to deploy Nalu-Wind for our researchers. It works but is not ideal for deploying shared binaries or for recreating a common tested release of binaries at a later date. Has there been any progress toward providing any kind of tagged release to make it easier to build a tested Nalu-Wind version?

psakievich commented 2 years ago

@svdavidson yes. I am currently working on a way to deploy shared binaries for the developer workflow. This will allow us to create pooled versions of the TPLs. See https://github.com/psakievich/spack-manager/pull/84 for the progress. @jrood-nrel and I are planning to give a presentation on this and other features at the start of the new year. Please let us know if you'd like an invite to that.

In terms of deploying binaries for analysts and end users, I've been doing this with Spack-generated modules with some success on the SNL machines. The plan is to codify this in the next month and provide a way to automatically generate modules and the associated views where we are stashing binaries. This is the next step after I finish the PR linked above.

@jrood-nrel has also been working on functionality to automatically add local gold files for the regression tests so you can run the test suite with tight tolerances. To ensure all the tests pass before installing (say, for a build that will become a system module), you can run spack install --test root and it will only install if all the tests pass. We have not gotten to that point yet in our work, but it is coming in the nearish future.
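To make the gold-file idea concrete, here is a minimal Python sketch of a tight-tolerance comparison that gates an install. It is not the Nalu-Wind test harness. The function names, the quantity names (`velocity_norm`, `pressure_norm`), and the tolerance values are all assumptions for illustration.

```python
import math

# Sketch only: not the Nalu-Wind test harness. Function names, quantity
# names, and tolerances are hypothetical.

def compare_to_gold(current, gold, rel_tol=1e-12, abs_tol=1e-15):
    """Return the names of quantities that are missing or differ from gold."""
    failures = []
    for name, gold_value in gold.items():
        value = current.get(name)
        if value is None or not math.isclose(
            value, gold_value, rel_tol=rel_tol, abs_tol=abs_tol
        ):
            failures.append(name)
    return failures


def may_install(current, gold):
    """Allow the install only when every quantity matches its gold value."""
    return not compare_to_gold(current, gold)


if __name__ == "__main__":
    gold = {"velocity_norm": 1.0, "pressure_norm": 2.0}
    current = {"velocity_norm": 1.0, "pressure_norm": 2.000001}
    print(compare_to_gold(current, gold))
    print(may_install(current, gold))
```

Gold values generated on the same machine and compiler make such tight tolerances practical, since the results no longer have to absorb platform-to-platform floating-point differences.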

The vision is we will have a series of time stamped builds, and modules that can live on each machine. This is not a formal release just yet, but rather just a tested time history.

svdavidson commented 2 years ago

Hi @psakievich. Thanks for the update on deploying shared binaries and automated regression tests. This sounds like good progress and I would certainly appreciate an invite to the presentation at the start of the new year.

psakievich commented 2 years ago

@svdavidson It's been a while since we've updated this issue. We now have nightly snapshots of the entire exawind stack that are stored on dockerhub.

https://hub.docker.com/r/ecpe4s/exawind-snapshot

We have also introduced a trilinos@stable that is fixed to a specific trilinos commit through spack-manager. This has greatly stabilized the stack for common use cases and feature development.

svdavidson commented 2 years ago

Thanks for the update @psakievich. I'll check out the nightly snapshots and try trilinos@stable.

marchdf commented 1 month ago

Closing for being out of date. We have https://github.com/Exawind/exawind-manager and the containers. Please reopen if this isn't working.