JCSDA / spack-stack

Creative Commons Zero v1.0 Universal

[INSTALL]: g2 v3.5.1 and g2tmpl v1.13.0 for spack-stack-1.8.0 #1180

Closed: WenMeng-NOAA closed this issue 1 month ago

WenMeng-NOAA commented 2 months ago

Package name

g2 and g2tmpl

Package version/tag

3.5.1 for g2; 1.13.0 for g2tmpl

Build options

None

Installation timeframe

For GEFS v13 development, the UPP updates require g2 v3.5.1 and g2tmpl v1.13.0 installations.

Other information

No response

WCOSS2

WCOSS2: General questions

No response

WCOSS2: Installation and testing

No response

WCOSS2: Technical & security review list

WCOSS2: Additional comments

No response

climbfuji commented 2 months ago

We are limiting the number of installations/modifications for existing spack-stack releases. Also, as far as I know the UFS applications are going to skip spack-stack-1.7.0 and go straight to 1.8.0 (from 1.6.0). Thus, we will include this update in the spack-stack-1.8.0 release, but not in the already installed 1.7.0 release.

AndrewBenjamin-NOAA commented 2 months ago

@climbfuji How will this impact the timing of the installs of the g2 and g2tmpl library on RDHPCS machines?

@Hang-Lei-NOAA have you contacted anyone from the SPA team at NCO yet regarding the WCOSS2 installation?

My concern is giving @WenMeng-NOAA enough time to update and properly test UPP prior to the GEFS code freeze.

climbfuji commented 2 months ago

If time for testing is a concern, then it's possible to request a test install on a single RDHPCS platform before the spack-stack-1.8.0 release. It's simply not feasible to amend existing spack-stack installations on all systems multiple times per week, and right now we get hammered with such requests from NOAA.

Hang-Lei-NOAA commented 2 months ago

@AndrewBenjamin-NOAA Following our procedure, I will install them and let modelers test them on the WCOSS2 machine Acorn, and only then deliver them, because under NCO rules installations on WCOSS2 cannot be changed. Since Acorn has been down for days, I am working out a solution on Dogwood for personal testing with the UPP developers.

AndrewBenjamin-NOAA commented 2 months ago

@climbfuji if the schedule to install on all R&D platforms is still Q3 2024, that should work for the UPP group. If that were to get pushed back, then we would probably need to do a test install.

AndrewBenjamin-NOAA commented 2 months ago

@Hang-Lei-NOAA Thanks for the explanation. Let us know when the test area is staged on Dogwood.

climbfuji commented 2 months ago

@AndrewBenjamin-NOAA The plan is to release spack-stack-1.8.0 end of August/beginning of September and then roll it out. That would mean the new packages will be on all systems in the first 1-2 weeks of September. Thus, if you are referring to calendar years and not fiscal years, that would fit. Nonetheless, I would encourage a test install earlier on one platform so that we know that things work - last thing we want is to redo entire spack-stack installs.

AndrewBenjamin-NOAA commented 2 months ago

@climbfuji That makes sense and I agree. I think the best course is to go ahead with the test install on Hera for UPP testing. Is that something you can set up or will the UPP group need to stage the testing area?

climbfuji commented 2 months ago

We have a spack-stack meeting today; will get back to you after that. Thanks!

AlexanderRichert-NOAA commented 2 months ago

@WenMeng-NOAA are you sure you want this under 1.7.0 as opposed to 1.6.0?

WenMeng-NOAA commented 2 months ago

@climbfuji A test installation on Hera alone is not sufficient for the UPP updates; it would break UPP support on the other R&D platforms.

AndrewBenjamin-NOAA commented 2 months ago

@AlexanderRichert-NOAA, would you mind clarifying your question about 1.7.0 versus 1.6.0? @climbfuji said earlier that this update would be included in the spack-stack-1.8.0 release but not in the already installed 1.7.0 release, so we were under the impression that modifying an existing release is not possible.

Given UPP's need to support multiple R&D platforms, would the most likely solution be to have testing areas set up on all platforms UPP supports prior to 1.8.0's release?

AlexanderRichert-NOAA commented 2 months ago

@AndrewBenjamin-NOAA Sure. We can create "add-on" environments on each system on top of the unified environment (which is the piece we don't want to go back and modify directly). In this case, we could create another chained environment under whichever release is desired; it would use g2@3.5.1 and g2tmpl@1.13.0 and rebuild their dependents, with the rest of the packages coming from that release's unified environment. I'm assuming you'll want 1.6.0, since that's what UPP currently uses (though I don't know how big a leap it would be to go to 1.7.0 in terms of how many UPP dependencies have changed versions).
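The rule "use the new g2/g2tmpl, rebuild their dependents, take everything else from the unified environment" amounts to a reverse-dependency walk. Here is a toy Python model of that idea, not spack's actual concretizer: the function `rebuild_set` and the dependency graph are hypothetical and deliberately simplified, chosen only to show which packages would end up in the add-on environment.

```python
from collections import deque


def rebuild_set(depends_on, overridden):
    """Return the packages that must be (re)built in the add-on environment.

    depends_on maps each package to the set of packages it depends on.
    overridden holds the packages whose versions change in the add-on env.
    Everything not returned would be reused from the upstream (unified) env.
    """
    # Invert the graph: for each package, record which packages depend on it.
    dependents = {pkg: set() for pkg in depends_on}
    for pkg, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(pkg)

    # Breadth-first walk from the overridden packages up to all dependents.
    result = set(overridden)
    queue = deque(overridden)
    while queue:
        pkg = queue.popleft()
        for parent in dependents.get(pkg, ()):
            if parent not in result:
                result.add(parent)
                queue.append(parent)
    return result


# Hypothetical, simplified dependency graph (not the real spack-stack specs).
graph = {
    "upp": {"g2", "g2tmpl", "netcdf-fortran"},
    "g2": {"jasper", "libpng"},
    "g2tmpl": set(),
    "netcdf-fortran": {"netcdf-c"},
    "netcdf-c": {"hdf5"},
    "hdf5": set(),
    "jasper": set(),
    "libpng": set(),
}

print(sorted(rebuild_set(graph, {"g2", "g2tmpl"})))  # ['g2', 'g2tmpl', 'upp']
```

In this toy graph only g2, g2tmpl, and their dependent upp would be built in the add-on environment, while the netcdf, hdf5, jasper, and libpng entries would come from the upstream environment.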

WenMeng-NOAA commented 2 months ago

@AlexanderRichert-NOAA Installations in the "add-on" environment under 1.6.0 should work for standalone UPP (offline post). When ufs-weather-model is eventually updated to 1.8.0, I will update the UPP submodule for inline post.

edwardhartnett commented 2 months ago

Once these new NCEPLIBS releases are installed, the UPP crew can test against them. If they find a problem, we repeat this whole process.

Imagine a world in which UPP runs unit tests on GitHub and confirms that new releases of NCEPLIBS work. In that case, all this work would not be needed. Tests would have run within minutes of the NCEPLIBS releases, without involving Hang, Alex, or Andrew. Tests would have run on a computer in Bill Gates's closet instead of on NOAA machines.

Only after everything had been thoroughly tested would we ask for an update on NOAA machines. We would be much less likely to need to fix something and install again. This would be a significant savings for NOAA, the NCEPLIBS team, and the UPP team. Bugs that currently take more than a week to find could be found within minutes.

Hang-Lei-NOAA commented 2 months ago

@WenMeng-NOAA @AndrewBenjamin-NOAA I have added a testing sample for GDIT on Dogwood. You can try it with:

    module use /lfs/h2/emc/eib/save/Hang.Lei/forgdit/nco_wcoss2/install/modulefiles/compiler/intel/19.1.3.304
    module load g2tmpl/1.13.0
    module load g2/3.5.1
    module show g2tmpl/1.13.0

    /lfs/h2/emc/eib/save/Hang.Lei/forgdit/nco_wcoss2/install/modulefiles/compiler/intel/19.1.3.304/g2tmpl/1.13.0.lua:

    help([[]])
    conflict("g2tmpl")
    setenv("g2tmpl_ROOT","/lfs/h2/emc/eib/save/Hang.Lei/forgdit/nco_wcoss2/install/intel-19.1.3.304/g2tmpl/1.13.0")
    setenv("g2tmpl_VERSION","1.13.0")
    setenv("G2TMPL_INC","/lfs/h2/emc/eib/save/Hang.Lei/forgdit/nco_wcoss2/install/intel-19.1.3.304/g2tmpl/1.13.0/include")
    setenv("G2TMPL_LIB","/lfs/h2/emc/eib/save/Hang.Lei/forgdit/nco_wcoss2/install/intel-19.1.3.304/g2tmpl/1.13.0/lib/libg2tmpl.a")
    whatis("Name: g2tmpl")
    whatis("Version: 1.13.0")
    whatis("Category: library")
    whatis("Description: g2tmpl library")

    module show g2/3.5.1

    /lfs/h2/emc/eib/save/Hang.Lei/forgdit/nco_wcoss2/install/modulefiles/compiler/intel/19.1.3.304/g2/3.5.1.lua:

    help([[]])
    conflict("g2")
    setenv("g2_ROOT","/lfs/h2/emc/eib/save/Hang.Lei/forgdit/nco_wcoss2/install/intel-19.1.3.304/g2/3.5.1")
    setenv("g2_VERSION","3.5.1")
    setenv("G2_INC4","/lfs/h2/emc/eib/save/Hang.Lei/forgdit/nco_wcoss2/install/intel-19.1.3.304/g2/3.5.1/include_4")
    setenv("G2_INCd","/lfs/h2/emc/eib/save/Hang.Lei/forgdit/nco_wcoss2/install/intel-19.1.3.304/g2/3.5.1/include_d")
    setenv("G2_LIB4","/lfs/h2/emc/eib/save/Hang.Lei/forgdit/nco_wcoss2/install/intel-19.1.3.304/g2/3.5.1/lib64/libg2_4.a")
    setenv("G2_LIBd","/lfs/h2/emc/eib/save/Hang.Lei/forgdit/nco_wcoss2/install/intel-19.1.3.304/g2/3.5.1/lib64/libg2_d.a")
    whatis("Name: g2")
    whatis("Version: 3.5.1")
    whatis("Category: library")
    whatis("Description: g2 library")

climbfuji commented 2 months ago

@Hang-Lei-NOAA and @WenMeng-NOAA Didn't we agree that WCOSS2-specific communications, as long as WCOSS2 is not using spack-stack, would happen in its own repository? Having them here makes it harder for us to track what needs to be done for spack-stack on the other systems. Apologies if I misunderstood previous conversations.

WenMeng-NOAA commented 2 months ago

@climbfuji Sure. We will coordinate WCOSS2 testing of offline post with @Hang-Lei-NOAA.

Hang-Lei-NOAA commented 2 months ago

Sorry Dom, I will pay more attention to that.

climbfuji commented 2 months ago

No problem! Just wanted to make sure whether I remembered correctly (sometimes, no, often, my memory is weak ...)

AndrewBenjamin-NOAA commented 1 month ago

Bringing this back to our attention: What is the status of installing "add-on" environments on the R&D machines for Wen to test UPP?

AlexanderRichert-NOAA commented 1 month ago

I've updated the upp-addon-env environments under spack-stack-1.6.0 (per the discussion above) on Hera, Jet, Gaea C5, Orion, and Hercules to include g2 3.4.5 and 3.5.1, and g2tmpl 1.12.0 and 1.13.0.

WenMeng-NOAA commented 1 month ago

@AlexanderRichert-NOAA I ran the UPP test on Hera and confirmed the expected changes with g2/3.5.1 and g2tmpl/1.13.0. Could you also install g2 3.4.5 and 3.5.1, and g2tmpl 1.12.0 and 1.13.0, in the upp-addon-env environments of spack-stack-1.6.0 on S4 and noaacloud? Thanks!

AlexanderRichert-NOAA commented 1 month ago

Those are both JCSDA platforms (as far as spack-stack maintenance goes), and I don't have access to either. @srherbener @RatkoVasic-NOAA @natalie-perlin would you be able to assist?

RatkoVasic-NOAA commented 1 month ago

We don't have access to S4. @natalie-perlin might be able to do it on the cloud when she's back from the conference.

srherbener commented 1 month ago

@AlexanderRichert-NOAA, I can help with S4, but I'm not sure I fully understand what needs to be done. Would it work for me to simply replicate what was done on Orion on S4? Would I be looking for spack-stack-1.6.0 as the upstream environment, and "upp-addon-env" as the chained environment? It looks like only Intel compiler is supported on S4, so will that be sufficient to do only Intel? Thanks!

AlexanderRichert-NOAA commented 1 month ago

Thanks @srherbener. That's correct, only Intel, and yes, we're updating the existing upp-addon-env under spack-stack-1.6.0, which chains to the unified env. I made the installation probably a bit overly elaborate, but here's the idea:

srherbener commented 1 month ago

Thanks @AlexanderRichert-NOAA! I'll let you know if/when questions come up.

srherbener commented 1 month ago

@AlexanderRichert-NOAA I have updated S4 and I can see the new g2 and g2tmpl versions. I made it all the way through to the end of updating the chained environment (upp-addon-env), including the lmod refresh and setup-meta-modules steps.

I think S4 is done, but it would be great if someone who knows what the new environment should look like could test my updates. Thanks!

jkbk2004 commented 1 month ago

@RatkoVasic-NOAA can you make sure that g2tmpl 1.13.0 and g2 3.5.1 are available on Derecho with spack-stack-1.6.0? The new g2tmpl 1.13.0 will be available on WCOSS2 at some point this week. We may update UPP directly with g2tmpl 1.13.0 for https://github.com/ufs-community/ufs-weather-model/pull/2326.

InnocentSouopgui-NOAA commented 1 month ago

@srherbener I am testing PR ufs-community/ufs-weather-model#2326 on S4. It uses the upp-addon-env that you installed. Compiling ufs-weather-model fails with the following error message:

Could NOT find PIO (missing: C Fortran) (Required is at least version "2.5.3")

You can see error details in the pull request.

InnocentSouopgui-NOAA commented 1 month ago

@srherbener The issue on S4 is solved.

climbfuji commented 1 month ago

Yes, I fixed it

climbfuji commented 1 month ago

was about to test but you were faster

climbfuji commented 1 month ago

The issue on S4 was that the spack module lmod refresh command was run without --upstream-modules, which is required for chained environments.

I am going to close this issue as completed.

climbfuji commented 3 weeks ago

@RatkoVasic-NOAA @AlexanderRichert-NOAA I noticed that configs/common/packages.yaml still lists the old g2/g2tmpl versions. I thought 1.8.0 should use 3.5.1 and 1.13.0.

RatkoVasic-NOAA commented 3 weeks ago

@climbfuji Good catch, I'll open PR.