JCSDA / spack-stack

Creative Commons Zero v1.0 Universal

[INSTALL] new NCEPLIBS-bufr version 12.0.0 #626

Closed jbathegit closed 1 year ago

jbathegit commented 1 year ago

Which software in the stack would you like installed? A new version of the NCEPLIBS-bufr library is now available from https://github.com/NOAA-EMC/NCEPLIBS-bufr/releases/tag/bufr_v12.0.0

What is the version/tag of the software? v12.0.0

Which machines would you like to have the software installed? This library needs to be installed on WCOSS2 (cactus and dogwood) as well as various HPC machines including Jet, Hera, and possibly others where there's already an installation of an earlier version of the library. This is not an emergency release, so if there are still earlier versions of the library available on any platform, please leave them there in order to give users time to transition on their own schedule.

Any other relevant information that we should know to correctly install the software? Please see https://github.com/NOAA-EMC/NCEPLIBS-bufr/blob/develop/README.md. The -DMASTER_TABLE_DIR option does not need to be specified (you can just let it default to path1), but the -DENABLE_PYTHON=ON option should be included for any machines on which it is currently available for earlier versions of the library.
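
As a rough illustration of the options described above, the configure step could be assembled as follows. This is a minimal sketch, not the installers' actual procedure: the helper name `bufr_cmake_args` and the source, build, and install paths are hypothetical placeholders. Only `-DENABLE_PYTHON=ON` and the decision to leave `-DMASTER_TABLE_DIR` unset come from the request itself.

```python
import shlex


def bufr_cmake_args(source_dir, build_dir, install_prefix, enable_python):
    """Return a cmake configure command for NCEPLIBS-bufr as an argument list.

    Hypothetical helper for illustration; all paths are placeholders.
    """
    args = [
        "cmake",
        "-S", source_dir,
        "-B", build_dir,
        f"-DCMAKE_INSTALL_PREFIX={install_prefix}",
    ]
    # -DMASTER_TABLE_DIR is deliberately not passed, so it keeps its default.
    if enable_python:
        # Only for machines where earlier versions already offer the
        # Python interface.
        args.append("-DENABLE_PYTHON=ON")
    return args


cmd = bufr_cmake_args(
    "NCEPLIBS-bufr", "build", "/path/to/install/bufr/12.0.0", enable_python=True
)
print(shlex.join(cmd))
```

Keeping the Python switch as a per-machine argument mirrors the request: the option is only turned on where earlier versions already provide it.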

Additional context Full release notes are available from https://noaa-emc.github.io/NCEPLIBS-bufr/md__home_runner_work_NCEPLIBS_bufr_NCEPLIBS_bufr_bufr_docs_ReleaseNotes.html

jbathegit commented 1 year ago

Also, when this is released on WCOSS2, a new bufr/12.0.0 module will need to be installed on that system as well.

Hang-Lei-NOAA commented 1 year ago

Okay will process it.

AlexanderRichert-NOAA commented 1 year ago

I'm looking at the Spack recipe. Currently, it has options for setting static vs. shared, whether to build Python interface, and whether to enable tests. I think I already know the answer, but, is there anything in the recipe that needs to change for this release outside of just adding the new version?

jbathegit commented 1 year ago

I can't think of anything that needs to be different from previous builds of this library.

climbfuji commented 1 year ago

This is addressed by #630

jbathegit commented 1 year ago

Why is this being closed? For one thing, I don't see the new bufr/12.0.0 library or module available yet on WCOSS2.

climbfuji commented 1 year ago

bufr@12 will be rolled out in the next days or weeks with spack-stack-1.4.1 (it's been added to spack and spack-stack, therefore the issue was closed). If you are adamant about keeping the issue open until it's actually on the system, it's ok to reopen.

jbathegit commented 1 year ago

I still don't see bufr/12.0.0 available on WCOSS2 (cactus/dogwood/acorn) for use in NCEP production. What's the expected timeframe for this to happen?

jbathegit commented 1 year ago

FWIW, I would like to re-open this issue to keep attention on it until it's completed. But I don't see any way to do that other than just starting a completely new issue and then referencing this issue in the new one.

GeorgeVandenberghe-NOAA commented 1 year ago

I agree with Jeff, an install issue should not be closed until the libraries have actually been installed on the listed systems. Management should be pinged on the continued open and unresolved state of this issue, and why it remains open even when it is a slow WCOSS2 install that is holding it up. Nothing the Libraries team is doing is slowing it down; it is off our desks.

AlexanderRichert-NOAA commented 1 year ago

Agree, reopening issue. bufr@12.0.0 is available on Acorn under /lfs/h1/emc/nceplibs/noscrub/spack-stack/spack-stack-1.4.1/envs/ufs-bufr12 which has the various ufs packages from the main environment, plus bufr 12.0.0. bufr 12.0.0 will also be in spack-stack-1.5.0, which will be ready to go on Acorn either today or tomorrow.

@Hang-Lei-NOAA what's the status of bufr 12 on dogwood/cactus?

jbathegit commented 1 year ago

Thanks @AlexanderRichert-NOAA, and just to clarify - will the latest spack-stack also include the new madis/4.5 which has also already been built and tested on Acorn?

You probably recall I had tested out your builds of both bufr/12.0.0 and madis/4.5 a couple of weeks ago on Acorn. We really need both libraries installed on cactus and dogwood as soon as possible.

AlexanderRichert-NOAA commented 1 year ago

Yes, we added madis 4.5 to the unified environment for spack-stack 1.5.0, so that will be there as well.

Parenthetically, I think madis is always a static library anyway, but as far as our static vs. shared discussion for bufr, I was thinking we could create a mini-environment for decoders that would just be static bufr and madis (that would be on acorn at least, then I could raise the idea with NCO for cactus/dogwood).

jbathegit commented 1 year ago

Thanks @AlexanderRichert-NOAA, but FWIW I think there are probably a lot of other cactus/dogwood users who would also prefer to continue with static builds for bufr.

The decoders may well be one of the only (or the only?) applications that use madis. But bufr is another animal entirely and is widely used by a lot of operational WCOSS2 codes.

climbfuji commented 1 year ago

@jbathegit @AlexanderRichert-NOAA bufr@12.0.0 is installed on all RDHPCS systems and on Acorn as part of spack-stack-1.5.0. Do you want to keep this issue open until the package finds its way onto WCOSS2, or is this sufficient for a resolution?

climbfuji commented 1 year ago

@AlexanderRichert-NOAA Related to that we should discuss what "completed" means for spack-stack. We can't control when GDIT installs code on WCOSS2.

GeorgeVandenberghe-NOAA commented 1 year ago

If it installs on a platform architecturally and environmentally the same as WCOSS2, that should be sufficient to close. GDIT can enter an issue if they ever encounter installation problems. Perhaps we should issue a prominent reminder that this release and anything else not installed on WCOSS2 after installation and testing on acorn is NOT SUPPORTED ON WCOSS2 by policy and remove that reminder when and if it is ever installed. Senior management needs to be aware of NOAA forecasting applications and their dependencies not supported on wcoss2 and that it is not a library issue but policy.

jbathegit commented 1 year ago

I understand that this may be out of our hands at this point. But this is also the first I've heard that the delay may be on GDIT's end (or at least that's what I think I'm hearing now?), and then I guess by extension on NCO's end, since I think that's who GDIT gets their marching orders from.

Either way, and from where I sit, this update really needs to be on Cactus and Dogwood before I'll consider this issue closed, because that's where I and many other users will access it. I admit I'm fuzzy on all of the steps an update needs to go through to get from being requested in a spack-stack issue until it ends up on WCOSS2. But that's what I was told to do whenever I released a new update version of the library, and it's a bit frustrating to me that I made the code release and opened this issue nearly 4 months ago, and yet I still don't see it installed on WCOSS2. Yes it's been built on Acorn by Alex, and I already tested and OK'ed that build a month ago, so what's the continued holdup? Is there still a debate raging about whether it can be installed as a static build? Or is this just a resourcing issue (or lack thereof)?

Bottom line: Whatever the process is definitely seems to have broken down somewhere, so whose ear do I now need to go and be a bug in to push this forward?

AlexanderRichert-NOAA commented 1 year ago

If @Hang-Lei-NOAA has submitted the request to NCO (which I believe is the case), then it just comes down to NCO priorities and what they tell GDIT to do when.

climbfuji commented 1 year ago

My question is how far the spack-stack repo should be impacted by the timelines of NCO/GDIT. We've done our part and it feels that there should be another place to track installs on systems that we don't have control over.

GeorgeVandenberghe-NOAA commented 1 year ago

A statement that this implemented update has been tested and runs on acorn and other NOAA systems would warrant closing it on spack-stack. We need to make it clear to NOAA senior management that this and many other implementations, fixes, and enhancements, and any modeling systems that depend on them, are UNSUPPORTED ON WCOSS2 until and unless they are implemented there.

Hang-Lei-NOAA commented 1 year ago

Including bufr/12.0.0, 10 libraries have been sent to NCO/GDIT for installation. They have not started on them yet. I checked with Steven last week, and he said he will assign someone to process them.

climbfuji commented 1 year ago

I agree with @GeorgeVandenberghe-NOAA. I think the spack-stack responsibilities end with getting this onto Acorn, at which point it should be considered completed.

jbathegit commented 1 year ago

@Hang-Lei-NOAA wrote the following in https://github.com/NOAA-EMC/NCEPLIBS-bufr/issues/517:

bufr/12.0.0 is now on WCOSS2, on both dogwood and cactus: /apps/ops/para/libs/modulefiles/compiler/intel/19.1.3.304/bufr/12.0.0.lua Please load it the hpc-stack way to check. Then they will install it the NCO way.

jbathegit commented 1 year ago

@Hang-Lei-NOAA, I already extensively tested the 12.0.0 build on Acorn that @AlexanderRichert-NOAA generated back in August. How is this build different than that one? I'm asking b/c I don't want to repeat a lot of work if I don't have to.

jbathegit commented 1 year ago

FWIW, and taking an early peek at this, I see the following after loading the new module:

jeff.ator@dlogin05 $ module show bufr/12.0.0
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   /apps/ops/para/libs/modulefiles/compiler/intel/19.1.3.304/bufr/12.0.0.lua:
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
help([[]])
conflict("bufr")
setenv("bufr_ROOT","/apps/ops/prod/libs/intel/19.1.3.304/bufr/12.0.0")
setenv("bufr_VERSION","12.0.0")
setenv("BUFR_INC4","/apps/ops/prod/libs/intel/19.1.3.304/bufr/12.0.0/lib/include_4")
setenv("BUFR_INC8","/apps/ops/prod/libs/intel/19.1.3.304/bufr/12.0.0/lib/include_8")
setenv("BUFR_INCd","/apps/ops/prod/libs/intel/19.1.3.304/bufr/12.0.0/lib/include_d")
setenv("BUFR_LIB4","/apps/ops/prod/libs/intel/19.1.3.304/bufr/12.0.0/lib/libbufr_4.a")
setenv("BUFR_LIB8","/apps/ops/prod/libs/intel/19.1.3.304/bufr/12.0.0/lib/libbufr_8.a")
setenv("BUFR_LIBd","/apps/ops/prod/libs/intel/19.1.3.304/bufr/12.0.0/lib/libbufr_d.a")
prepend_path("PATH","/apps/ops/prod/libs/intel/19.1.3.304/bufr/12.0.0/bin")
whatis("Name: bufr")
whatis("Version: 12.0.0")
whatis("Category: library")
whatis("Description: bufr library")

But those paths aren't correct for the following reasons:

  1. prod should really be para in all of the above envvars at the moment, but that's OK and I could temporarily work around that; however and more importantly in v12.0.0+:
  2. lib should be lib64 instead (the former subdirectory no longer exists)
  3. There's no longer a _8 or _d build. Instead, from now on there's only a libbufr_4.a in the lib64 subdirectory, so the above envvars for BUFR_(LIB|INC)[8d] need to be removed from the modulefile.
  4. BUFR_INC4 should now be /apps/ops/para/libs/intel/19.1.3.304/bufr/12.0.0/include/bufr_4, b/c that pathname has now also changed.

Hang-Lei-NOAA commented 1 year ago

This version is using the hpc-stack to deliver it. It should also directly use your cmake script in your tag to build.

jbathegit commented 1 year ago

Hi @Hang-Lei-NOAA sorry but I don't understand what you mean by "use your cmake script in your tag to build"? Are you talking about CMakeLists.txt in the main package directory? If you're using that then you should only ever end up with the correct lib64 and include subdirectories, and with only the correct _4 build variants. Or are you asking about something else?

It looks to me like the /apps/ops/para/libs/modulefiles/compiler/intel/19.1.3.304/bufr/12.0.0.lua file is just based on a previous iteration of that same file from an earlier library version. But again, some of those items have now changed in v12.0.0+. So who or what is generating that module file? It's not part of the NCEPLIBS-bufr package.
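
The v12.0.0 layout changes listed earlier in this thread (lib64 instead of lib, only the _4 build, and include/bufr_4 for the include path) can be checked mechanically against a modulefile. The following is a minimal sketch under stated assumptions: the function name `check_bufr12_modulefile` is hypothetical, it only looks at `setenv` lines like the ones in the `module show` output, and it does not check the prod vs. para prefix.

```python
import re

# Matches lines such as: setenv("BUFR_LIB4","/some/path")
SETENV_RE = re.compile(r'setenv\("([^"]+)"\s*,\s*"([^"]*)"\)')


def check_bufr12_modulefile(text):
    """Return a list of problems found in a bufr 12.x modulefile (Lua text)."""
    env = dict(SETENV_RE.findall(text))
    problems = []
    for name, value in env.items():
        # The _8 and _d builds no longer exist in v12.0.0+.
        if re.fullmatch(r"BUFR_(LIB|INC)[8d]", name):
            problems.append(f"{name}: variable should be removed")
        # The lib/ subdirectory was replaced by lib64/.
        elif "/lib/" in value:
            problems.append(f"{name}: uses lib/ instead of lib64/")
    inc4 = env.get("BUFR_INC4", "")
    if inc4 and not inc4.endswith("/include/bufr_4"):
        problems.append("BUFR_INC4: should end in include/bufr_4")
    return problems


# A few lines copied from the module show output earlier in the thread.
sample = '''
setenv("bufr_ROOT","/apps/ops/prod/libs/intel/19.1.3.304/bufr/12.0.0")
setenv("BUFR_INC4","/apps/ops/prod/libs/intel/19.1.3.304/bufr/12.0.0/lib/include_4")
setenv("BUFR_LIB4","/apps/ops/prod/libs/intel/19.1.3.304/bufr/12.0.0/lib/libbufr_4.a")
setenv("BUFR_LIB8","/apps/ops/prod/libs/intel/19.1.3.304/bufr/12.0.0/lib/libbufr_8.a")
'''
problems = check_bufr12_modulefile(sample)
for problem in problems:
    print(problem)
```

A check like this would only flag a modulefile that was copied forward from an 11.x version; it says nothing about whether the library itself was built correctly.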

jbathegit commented 1 year ago

In case it helps, it looks like the directory was created a couple of days ago:

jeff.ator@dlogin05 $ ls -la /apps/ops/para/libs/modulefiles/compiler/intel/19.1.3.304/bufr/
total 52
drwxr-sr-x  2 ops.para para 16384 Oct  2 17:01 ./
drwxr-sr-x 25 ops.para para 16384 Oct  2 17:01 ../
-rw-r--r--  1 ops.para para  1254 Oct  2 17:01 11.4.0.lua
-rw-r--r--  1 ops.para para  1405 Oct  2 17:01 11.5.0.lua
-rw-r--r--  1 ops.para para  1304 Oct  2 17:01 11.6.0.lua
-rw-rw-r--  1 ops.para para  1613 Oct  2 17:01 11.7.0.lua
-rw-rw-r--  1 ops.para para  1610 Oct  2 17:01 12.0.0.lua
lrwxrwxrwx  1 ops.para para    10 Sep 28 17:55 default -> 12.0.0.lua
jeff.ator@dlogin05 $

So was it NCO who generated that new 12.0.0.lua module file? If so, can you tell me who you're working with at NCO and maybe I can reach out to them directly to get this straightened out?

We also don't want the new v12.0.0 to be the default version right away, b/c this is a major release and therefore some application codes may need some modifications before they can start using it. Instead, we want the previous v11.7.0 to be the new default, and then applications can just override that by specifying the new v12.0.0 on their terms and when they're ready to do so.
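
To illustrate the request, here is a minimal sketch of repointing a `default` symlink from 12.0.0.lua to 11.7.0.lua. It assumes the `default` symlink shown in the listing above is what selects the default version, and it runs against a temporary directory rather than the real modulefile tree. The helper name `set_default_module` is hypothetical and this is not NCO's actual procedure.

```python
import os
import tempfile


def set_default_module(module_dir, version_file):
    """Point the `default` symlink in module_dir at version_file."""
    target = os.path.join(module_dir, version_file)
    if not os.path.isfile(target):
        raise FileNotFoundError(target)
    link = os.path.join(module_dir, "default")
    tmp_link = link + ".tmp"
    # Build the new link under a temporary name, then rename it over the old
    # one so that `default` never disappears in between.
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(version_file, tmp_link)
    os.replace(tmp_link, link)


# Demonstration in a scratch directory that mimics the listing above.
with tempfile.TemporaryDirectory() as module_dir:
    for name in ("11.7.0.lua", "12.0.0.lua"):
        open(os.path.join(module_dir, name), "w").close()
    os.symlink("12.0.0.lua", os.path.join(module_dir, "default"))

    set_default_module(module_dir, "11.7.0.lua")
    current_default = os.readlink(os.path.join(module_dir, "default"))
    print(current_default)
```

With the default left at 11.7.0, applications would opt in to the new release by requesting bufr/12.0.0 explicitly, which is the behavior asked for above.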

Hang-Lei-NOAA commented 1 year ago

@jbathegit The modulefile is set up by NCO. It is not using our default one as on Acorn. The cactus one was pulled from dogwood. I will forward the email to you for your direct communication with the NCO installer.

climbfuji commented 1 year ago

This has been installed on WCOSS2, closing.