NOAA-EMC / hpc-stack

Create a software stack for HPC's
GNU Lesser General Public License v2.1

Question: installing hpc-stack on gaea #34

Closed climbfuji closed 3 years ago

climbfuji commented 4 years ago

I will need to install hpc-stack on gaea.

All modules generated by hpc-stack are lua modules, which gaea doesn't understand. gaea knows tcl modules, and so does every other system.

What is the strategy for supporting gaea (after all, it is a NOAA RDHPC system)?

The quick-and-dirty lua2tcl.py script I added to NCEPLIBS won't work out of the box for the hpc-stack modules I guess, and it's really only duct tape instead of doing it right.

aerorahul commented 4 years ago

@climbfuji I was able to build hpc-stack on Gaea without using modules. Please see the branch: feature/gaea

To build:

nohup ./build.sh -p <prefix> -c config/config_gaea.sh -y config/stack_gaea.yaml &
tail -f nohup.out

My build of hpc-stack on Gaea is at: /lustre/f2/dev/Rahul.Mahajan/opt/

climbfuji commented 4 years ago

Good to know, thanks for trying. We'll need modules though for building the applications that use hpc-stack. I asked the gaea admins if they can add lua module support on the machine, waiting to hear back.

aerorahul commented 4 years ago

It would be good to have module support, but we should at least test the model with hpc-stack sans modules on Gaea to expose any issues.
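For anyone trying the module-free route, here is a minimal sketch of what exposing an install prefix without modulefiles could look like in a shell session. The helper name `use_stack_prefix` and the `bin`/`lib` layout under the prefix are assumptions for illustration and are not taken from the feature/gaea branch. If the stack installs each package into its own subdirectory, the helper would need to be called once per package directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of hpc-stack): expose one install prefix to
# the shell and to CMake without modulefiles.
# Assumes the conventional bin/ and lib/ subdirectories under the prefix.
use_stack_prefix() {
    local prefix="$1"
    if [ -z "$prefix" ]; then
        echo "usage: use_stack_prefix <prefix>" >&2
        return 1
    fi
    # Prepend, keeping any existing value and avoiding a trailing colon.
    export PATH="${prefix}/bin${PATH:+:${PATH}}"
    export LD_LIBRARY_PATH="${prefix}/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
    # CMake reads this environment variable when searching for packages.
    export CMAKE_PREFIX_PATH="${prefix}${CMAKE_PREFIX_PATH:+:${CMAKE_PREFIX_PATH}}"
}

# Example: use_stack_prefix /path/to/your/prefix
```

This is roughly what a modulefile does when it is loaded, which is why a build can work without modules as long as the same variables are set by hand.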

aerorahul commented 4 years ago

@climbfuji I built UFS utils using my installation of HPC-stack on Gaea. A PR in the ufs-utils repo has been opened at https://github.com/NOAA-EMC/UFS_UTILS/pull/188. I also built the UFS (with S2S) using my installation of HPC-stack on Gaea. Attached is a git-diff.

climbfuji commented 4 years ago

That's great. I already approved your ufs_utils PR. I'll wait for the gaea admins to come back to me before installing the stack following your branch and setting the environment variables as you did.

aerorahul commented 4 years ago

This exercise was just to make sure that we can build the UFS and its dependencies with the software from hpc-stack on Gaea. I am not the person maintaining the hpc-stack installation on Gaea.

climbfuji commented 4 years ago

But I am at the moment (at least I have been maintaining older software installations to build the ufs-weather-model). I'd be more than happy to pass this on to someone else, though.

aerorahul commented 4 years ago

I figured it would be you. Since Gaea is not NOAA Tier-1 I doubt the EMC NCEPlibs team would be taking that over anytime soon.

climbfuji commented 4 years ago

I know, and it's not a top priority to get off my list. Way more important would be to elevate jet to tier-1 and hand it over to EMC. Jet is the only system with multiple generations of CPUs on different partitions, i.e. the only system where we can test that no dangerous -xHOST compile options are used and that features like SIMDMULTIARCH in the ufs-weather-model work as expected. What is more, jet is used by many real-time parallels for both global and regional applications. I am trying to remind people up the chain of the importance of making this a tier-1 platform, but haven't been heard so far (or maybe I wasn't pestering them enough).

aerorahul commented 4 years ago

@climbfuji We built hpc-stack on Gaea without modules -- that worked. We were able to build UFS and UFS-utils with the hpc-stack (without modules). We asked the Gaea sys-admins if they would install lmod -- they said no. You installed lmod on Gaea. Thanks!

What was the result of using lmod and then installing hpc-stack? Will that work for us? Where do we go from here regarding Gaea?

climbfuji commented 4 years ago

I need to look where we installed it so far - it needs to go in a central location. Now that jet seems to be on its way to tier-1 and not managed by me, I can take care of gaea and use the existing shared folder for the UFS. Let me install the latest version of hpc-stack (or a tag?) in the right place if it is not there yet.

Each user wanting to use the lua modules will have to run one command manually before lua modulefiles can be used. We need to add this information somewhere in the wiki, and likely we need to add the lmod init command to a few places in the ufs-weather-model code, too. Let me try ...
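The thread does not say which command this is. With a user-installed Lmod it is typically a matter of sourcing Lmod's shell init script and then pointing `module use` at the stack's modulefile directory. A minimal sketch, assuming that approach: the helper name `init_lmod_stack` is hypothetical and both paths are placeholders, not the actual locations on gaea.

```shell
#!/usr/bin/env bash
# Hypothetical helper: initialize a user-installed Lmod in the current shell,
# then register a directory of lua modulefiles. Both arguments are placeholders.
init_lmod_stack() {
    local lmod_init="$1"    # e.g. <lmod-install-prefix>/lmod/lmod/init/bash
    local module_dir="$2"   # directory holding the stack's lua modulefiles
    if [ ! -r "$lmod_init" ]; then
        echo "Lmod init script not found: $lmod_init" >&2
        return 1
    fi
    # Sourcing the init script defines the `module` shell function.
    . "$lmod_init"
    # Make the stack's modulefiles visible to `module avail` / `module load`.
    module use "$module_dir"
}

# Example: init_lmod_stack <lmod-install-prefix>/lmod/lmod/init/bash <prefix>/modulefiles
```

Wrapping the two steps in a function keeps the failure mode explicit: if the init script is missing, the shell is left untouched rather than ending up with a half-configured `module` command.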

edwardhartnett commented 3 years ago

What's the status of this issue? Is it still on-going?

If so, does it have to be resolved for the imminent 1.1.0 release of hpc-stack?

aerorahul commented 3 years ago

PR #70 will close this issue. It has passed review. Waiting on CI to merge.