JCSDA / spack-stack

[INSTALL] SCOTCH v7.0.4 on RDHPCS machines #748

Closed MatthewMasarik-NOAA closed 1 year ago

MatthewMasarik-NOAA commented 1 year ago

Which software in the stack would you like installed? SCOTCH v7.0.4.

What is the version/tag of the software? v7.0.4: https://gitlab.inria.fr/scotch/scotch/-/tags/v7.0.4

What compilation options would you like set? This may be a work in progress.

Intel builds of SCOTCH have relied on also loading a gcc module to pick up newer headers. On Hera, using hpc-stack, I used the following modules:

module load cmake/3.20.1
module load intel/2022.1.2
module load impi/2022.1.2
module use /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/intel-2022.1.2/modulefiles/stack
module load hpc/1.2.0
module load hpc-intel/2022.1.2
module load hpc-impi/2022.1.2
module load gnu/9.2.0

which builds scotch/7.0.4 successfully. To get the analogous spack-stack environment, I tried this:

module use /scratch1/NCEPDEV/jcsda/jedipara/spack-stack/modulefiles
module load miniconda/3.9.12
module load ecflow/5.5.3
module load mysql/8.0.31

module use /scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.4.1/envs/unified-env/install/modulefiles/Core
module load stack-intel/2021.5.0
module load stack-intel-oneapi-mpi/2021.5.1
module load stack-python/3.9.12

module load cmake/3.23.1
module use /scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.4.0/envs/unified-env/install/modulefiles/Core
module load stack-gcc/9.2.0

This gives an Lmod error when trying to load stack-gcc while stack-intel is already loaded.

I'm not sure how to get around this. Are there any suggestions for how to handle it? Also, since scotch already has a recipe for 7.0.3 and this new release is just a bugfix for 7.0.3, could you advise what I should fill in for the instructions at this point? I'd like to avoid redundant information if that is the case.

Installation timeframe: Would you like this package to be installed in an upcoming quarterly spack-stack release, or sooner? August quarterly release.

Any other relevant information that we should know to correctly install the software? There is currently a recipe for scotch/7.0.3.

Additional context NA

climbfuji commented 1 year ago

The Intel-GNU issues described don't apply to spack-stack because of the way we set up the compilers.

MatthewMasarik-NOAA commented 1 year ago

> The Intel-GNU issues described don't apply to spack-stack because of the way we set up the compilers.

Okay. Just to be sure I'm clear, for the Intel build I would not load stack-gcc/9.2.0 in this case?

climbfuji commented 1 year ago

Correct. For the example of Hera, this is essentially because of lines 15-17 in https://github.com/JCSDA/spack-stack/blob/develop/configs/sites/hera/compilers.yaml which then gets baked into the Intel compiler meta module automatically.
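A quick way to sanity-check this after loading only the Intel stack modules is to see which gcc the environment exposes. The sketch below is illustrative only: `report_gcc` is a hypothetical helper, and it assumes the GNU toolchain that the Intel meta module relies on shows up on `PATH`, which this thread does not confirm.

```shell
# Hypothetical helper: report which gcc (if any) is visible in the current environment.
report_gcc() {
  if command -v gcc >/dev/null 2>&1; then
    printf 'gcc found at %s\n' "$(command -v gcc)"
    # First line of the version banner is enough to identify the release.
    gcc --version | head -n 1
  else
    echo "gcc not found on PATH"
  fi
}

# Intended use: run after the "module use" / "module load stack-intel" lines
# from the original post, without loading stack-gcc.
report_gcc
```

If the reported version is older than expected, that would be worth raising separately rather than working around by loading stack-gcc.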

MatthewMasarik-NOAA commented 1 year ago

Okay, I see. That's very helpful, thanks @climbfuji.

Regarding the build instructions, will the 7.0.3 instructions suffice, or should more be added?

climbfuji commented 1 year ago

According to @AlexanderRichert-NOAA the 7.0.3 instructions are sufficient.

MatthewMasarik-NOAA commented 1 year ago

Great, thank you.

AlexanderRichert-NOAA commented 1 year ago

Indeed. To the best of my understanding, 7.0.4 doesn't affect build options since it's just various bug fixes within the compiled code.

MatthewMasarik-NOAA commented 1 year ago

You're correct. It is just the bugfixes for the scaling-related issue, as well as the openmpi-related issue you found.

climbfuji commented 1 year ago

scotch@7.0.4 has been installed on all RDHPCS systems as part of spack-stack-1.5.0.

Ok to close this or do you want to wait until the last UFS application has migrated to spack-stack (hopefully in this decade)?

MatthewMasarik-NOAA commented 1 year ago

Hi @climbfuji, that's great news! I could test WW3 on each platform to confirm from my end, and then close the issue. Does that sound good?

climbfuji commented 1 year ago

Yes, thanks. Note that there is a PR for the ufs-weather-model to update to spack-stack-1.5.0, which includes the scotch update: https://github.com/ufs-community/ufs-weather-model/pull/1920

JessicaMeixner-NOAA commented 1 year ago

I'm having trouble loading modules on Orion with the spack-stack-1.5.0 that is used in ufs-community/ufs-weather-model#1920. Is that expected?

climbfuji commented 1 year ago

Can you be a little more specific please?

JessicaMeixner-NOAA commented 1 year ago

Here's a quick way to reproduce what I'm seeing:

git clone https://github.com/climbfuji/ufs-weather-model test
cd test 
git checkout feature/spack_stack_150
cd modulefiles/
module use `pwd`
module load ufs_orion.intel 

And then things just hang... I'm trying to update the ufs-weather-model version in global-workflow so that I can use scotch 7.0.4 there, as well as for other reasons.
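When reporting a hang like this, it can help to bound the command with a time limit so the failure is reproducible and scriptable. The following is a minimal sketch, not something used in this thread: `run_with_limit` is a hypothetical helper built on GNU coreutils `timeout`, which exits with status 124 when the limit is reached. It also assumes the `module` function is available in the child shell, which depends on how the site initialises Lmod.

```shell
# Hypothetical helper: run a command string in a child bash with a time limit.
# $1 = limit in seconds, $2 = command string.
# GNU timeout returns 124 if the command was still running when the limit hit.
run_with_limit() {
  timeout "$1" bash -c "$2"
}

# Example use from inside modulefiles/ (assumption: `module` works in the child shell):
#   run_with_limit 60 'module use "$PWD" && module load ufs_orion.intel'
#   echo "exit status: $?"
```

An exit status of 124 would indicate that the load was still running at the limit, which distinguishes a hang from an ordinary module error.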

JessicaMeixner-NOAA commented 1 year ago

I have tried multiple login nodes. I have not tried /work instead of /work2; I'll try that now.

climbfuji commented 1 year ago

I wonder ... on gaea c5 I think we used the -t option (module load -t ...) because lmod would just hang. But can you check whether you have other module paths added to the default login environment, or modules loaded automatically? We recommend a clean .bashrc / .profile with no user mods.
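One way to do that check is to print the module search path and the loaded modules before loading anything else. This is a small sketch under the assumption that the standard `MODULEPATH` and `LOADEDMODULES` environment variables are set by the module system on the machine; `list_modulepath` is a hypothetical helper name.

```shell
# Hypothetical helper: print each MODULEPATH entry on its own line.
# Prints nothing if MODULEPATH is unset or empty.
list_modulepath() {
  [ -n "${MODULEPATH:-}" ] || return 0
  printf '%s\n' "$MODULEPATH" | tr ':' '\n'
}

list_modulepath
# LOADEDMODULES is the colon-separated list of currently loaded modules.
printf 'loaded: %s\n' "${LOADEDMODULES:-<none>}"
```

Any entry that is not a site default or an expected spack-stack modulefiles directory is a candidate to remove from .bashrc / .profile.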

JessicaMeixner-NOAA commented 1 year ago

I only load: module load contrib noaatools. I usually try to keep a clean bashrc, but I do have these on Orion. I'll see if it helps to clean that out. I have tried module purge, but it doesn't seem to help, and /work did not help either. Thank you for your quick response and help, @climbfuji!

JessicaMeixner-NOAA commented 1 year ago

A clean environment did not help. module load -t ufs_orion.intel also did not help.

Not sure if this is a "me" issue or if others are having the same problem or not.

JessicaMeixner-NOAA commented 1 year ago

I did double check, and I have no issues with the versions in the develop branch of the ufs-weather-model.

AlexanderRichert-NOAA commented 1 year ago

Looks like the ufs_orion.intel.lua has the wrong MODULEPATH? It's using /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.5.0/envs/unified-env so I think it's hanging because it's searching the whole installation directory structure for module files. When I add /install/modulefiles/Core and change the stack-python version to 3.10.8, I can get it to load for me.
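The path fix described here can be sketched in shell terms as follows. This is an illustration rather than the actual change to ufs_orion.intel.lua: `core_modulepath` is a hypothetical helper, and the environment root is the one quoted in the comment above.

```shell
# Hypothetical helper: build the Core modulefiles directory for a
# spack-stack environment root (a trailing slash on the root is tolerated).
core_modulepath() {
  printf '%s/install/modulefiles/Core\n' "${1%/}"
}

env_root=/work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.5.0/envs/unified-env
core_dir=$(core_modulepath "$env_root")
echo "$core_dir"

# Only call `module use` where the directory and the module command exist.
if [ -d "$core_dir" ] && command -v module >/dev/null 2>&1; then
  module use "$core_dir"
fi
```

Pointing MODULEPATH at the Core directory rather than the environment root keeps the module system from walking the whole installation tree, which is the suspected cause of the hang.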

JessicaMeixner-NOAA commented 1 year ago

@AlexanderRichert-NOAA thank you!!!! I made the changes you described and can now load the modules.

climbfuji commented 1 year ago

Good catch, thanks Alex! I need to fix this in my branch.

On Oct 3, 2023, at 10:21 AM, Jessica Meixner wrote:

> @AlexanderRichert-NOAA thank you!!!! I made the changes you described and can now load the modules.

climbfuji commented 1 year ago

I fixed this and also updated my branch from develop while I was doing that.

climbfuji commented 1 year ago

Closing this as completed. scotch@7.0.4 is installed everywhere as part of 1.5.0. Please report issues with spack-stack-1.5.0 separately here and/or in the ufs-weather-model PR that updates it to 1.5.0. Thanks!

MatthewMasarik-NOAA commented 1 year ago

Hi @climbfuji, sounds good, thank you. P.S. I'm virtually attending a wave workshop all this week, so my testing has been affected. I'll pick up again next week and report any issues if they arise. Thanks for your work completing this!