JCSDA / spack-stack

spack-stack installations do not have `prod_util` #780

Closed aerorahul closed 1 year ago

aerorahul commented 1 year ago

Describe the bug The global-workflow is updating the ufs-weather-model hash to dd41cc6 (https://github.com/ufs-community/ufs-weather-model/commit/dd41cc61e5e52e2cfeb00af8930e7496a5fb91fd). We rely on the model repository to provide the modules used in the compilation and execution of the model. In addition, the workflow needs utilities from a module called prod_util. On Orion (at least), after the ufs-weather-model modules are loaded, prod_util is not an available module in the software stack.

To Reproduce On Orion (at least)

501 ~ ❯❯❯ module purge
502 ~ ❯❯❯ module list
No modules loaded
503 ~ ❯❯❯ module use /work2/noaa/stmp/GFS_CI_ROOT/PR/1862/global-workflow/sorc/ufs_model.fd/modulefiles
504 ~ ❯❯❯ module avail

--------------------------------- /work2/noaa/stmp/GFS_CI_ROOT/PR/1862/global-workflow/sorc/ufs_model.fd/modulefiles ---------------------------------
   ufs_acorn.intel       ufs_common           ufs_hera.gnu      ufs_noaacloud.intel    ufs_s4.intel
   ufs_cheyenne.gnu      ufs_expanse.intel    ufs_hera.intel    ufs_odin               ufs_stampede.intel
   ufs_cheyenne.intel    ufs_gaea.intel       ufs_jet.intel     ufs_orion.intel        ufs_wcoss2.intel

--------------------------------------------------------------- /apps/modulefiles/core ---------------------------------------------------------------
   advisor/2019.5         fftw/3.3.8                  impi/2021.2                  motif/2.3.4                  qchem/5.3.0
...
...

505 ~ ❯❯❯ module load ufs_orion.intel
506 ~ ❯❯❯ module list

Currently Loaded Modules:
  1) intel/2022.1.2                    9) jasper/2.0.32      17) netcdf-fortran/4.6.0    25) crtm/2.4.0         33) ecbuild/3.7.2
  2) stack-intel/2022.0.2             10) zlib/1.2.13        18) parallel-netcdf/1.12.2  26) g2/3.4.5           34) yafyaml/0.5.1
  3) impi/2022.1.2                    11) libpng/1.6.37      19) parallelio/2.5.10       27) g2tmpl/1.10.2      35) mapl/2.35.2-esmf-8.4.2
  4) stack-intel-oneapi-mpi/2021.5.1  12) pkg-config/0.27.1  20) esmf/8.4.2              28) ip/3.3.3           36) scotch/7.0.3
  5) miniconda/3.9.7                  13) hdf5/1.14.0        21) fms/2023.01             29) sp/2.3.3           37) ufs_common
  6) stack-python/3.9.7               14) curl/8.0.1         22) bacio/2.4.1             30) w3emc/2.9.2        38) ufs_orion.intel
  7) cmake/3.23.1                     15) zstd/1.5.2         23) crtm-fix/2.4.0_emc      31) gftl/1.8.3
  8) libjpeg/2.1.0                    16) netcdf-c/4.9.2     24) git-lfs/2.12.0          32) gftl-shared/1.5.0

507 ~ ❯❯❯ module load prod_util
Lmod has detected the following error:  The following module(s) are unknown: "prod_util"

Please check the spelling or version number. Also try "module spider ..."
It is also possible your cache file is out-of-date; it may help to try:
  $ module --ignore-cache load "prod_util"

Also make sure that all modulefiles written in TCL start with the string #%Module

Steps to reproduce the behavior: See above.

Expected behavior prod_util should be loaded.

System: Seen on Orion. Possibly others.

Additional context prod_util is an integral module used in NCEP applications and should be included in every stack deployment.

========================================
Updates to existing spack-stack-1.5.0 deployments:

Instructions:

climbfuji commented 1 year ago

On my macOS the module is called

prod-util/1.2.2

in spack-stack 1.5.0 rc1, spack-stack 1.4.1 and 1.4.0

aerorahul commented 1 year ago

I tried that as well. There is no module named prod-util either to be found on Orion.

aerorahul commented 1 year ago

Here is the full module avail:

509 ~ ❯❯❯ module avail

------ /work/noaa/epic/role-epic/spack-stack/spack-stack-1.4.1/envs/ufs-pio-2.5.10/install/modulefiles/intel-oneapi-mpi/2021.5.1/intel/2022.0.2 ------
   base-env/1.0.0          fms/2023.01            (L,D)    nccmp/1.9.0.1        (D)    parallel-netcdf/1.12.2 (L)    scotch/7.0.3                (L)
   crtm/2.4.0     (L,D)    hdf5/1.14.0            (L,D)    netcdf-c/4.9.2       (L)    parallelio/2.5.10      (L)    ufs-pyenv/1.0.0
   esmf/8.4.2     (L,D)    mapl/2.35.2-esmf-8.4.2 (L,D)    netcdf-fortran/4.6.0 (L)    py-netcdf4/1.5.3              ufs-weather-model-env/1.0.0

------------------------------------------------- /apps/modulefiles/mpi/intel-2022.1.2/impi-2022.1.2 -------------------------------------------------
   crtm/2.4.0    esmf/8.3.0        fms/2022.04    mapl/2.22.0-esmf-8.3.0        nwchem/7.0.0      pio/2.5.7             quantumespresso/6.6.0
   esmf/8.1.1    fftw/3.3.8 (D)    hdf5/1.10.6    netcdf/4.7.4           (D)    pflogger/1.5.0    pnetcdf/1.12.1 (D)    su2/7.1.1

------------------- /work/noaa/epic/role-epic/spack-stack/spack-stack-1.4.1/envs/ufs-pio-2.5.10/install/modulefiles/intel/2022.0.2 -------------------
   bacio/2.4.1        (L,D)    jasper/2.0.32       (L,D)    py-numexpr/2.8.3              py-versioneer/0.28
   cmake/3.23.1       (L,D)    libjpeg/2.1.0       (L)      py-numpy/1.22.3               py-wheel/0.37.1
   crtm-fix/2.4.0_emc (L)      libpng/1.6.37       (L,D)    py-packaging/23.0             sp/2.3.3                        (L,D)
   curl/8.0.1         (L)      libyaml/0.2.5                py-pandas/1.4.0               stack-intel-oneapi-mpi/2021.5.1 (L)
   ecbuild/3.7.2      (L)      openblas/0.3.19     (D)      py-pip/21.2.4                 stack-python/3.9.7              (L)
   g2/3.4.5           (L,D)    pkg-config/0.27.1   (L)      py-python-dateutil/2.8.2      w3emc/2.9.2                     (L,D)
   g2tmpl/1.10.2      (L,D)    py-bottleneck/1.3.5          py-pytz/2022.2.1              wget/1.14
   gftl-shared/1.5.0  (L,D)    py-cftime/1.0.3.4            py-pyyaml/6.0                 yafyaml/0.5.1                   (L,D)
   gftl/1.8.3         (L)      py-cython/0.29.33            py-setuptools-scm/7.0.5       zlib/1.2.13                     (L,D)
   git-lfs/2.12.0     (L)      py-f90nml/1.4.3              py-setuptools/59.4.0          zstd/1.5.2                      (L)
   git/1.8.3.1                 py-flit-core/3.7.1           py-six/1.16.0
   gmake/3.82                  py-jinja2/3.1.2              py-tomli/2.0.1
   ip/3.3.3           (L,D)    py-markupsafe/2.1.1          py-typing-extensions/4.5.0

----------------------------------------------------- /apps/modulefiles/compilers/intel-2022.1.2 -----------------------------------------------------
   bacio/2.4.1           flap/1.10.0      gftl-shared/1.5.0        ip/3.3.3         mpich/3.3.2 (D)    netcdf/4.7.4         w3emc/2.9.2
   cdo/1.9.10     (D)    g2/3.4.5         hdf5/1.10.6              jasper/2.0.25    nccmp/1.8.7        openmpi/4.0.4 (D)    wgrib2/3.0.2  (D)
   eccodes/2.22.1        g2tmpl/1.10.2    impi/2022.1.2     (L)    libpng/1.6.37    nco/4.9.3   (D)    sp/2.3.3             yafyaml/0.5.1

--------------------------------------------------- /work/noaa/da/role-da/spack-stack/modulefiles ----------------------------------------------------
   ecflow/5.8.4    miniconda/3.9.7 (L)    mysql/8.0.31

------------------------ /work/noaa/epic/role-epic/spack-stack/spack-stack-1.4.1/envs/ufs-pio-2.5.10/install/modulefiles/Core ------------------------
   stack-gcc/10.2.0    stack-intel/2022.0.2 (L)

--------------------------------- /work2/noaa/stmp/GFS_CI_ROOT/PR/1862/global-workflow/sorc/ufs_model.fd/modulefiles ---------------------------------
   ufs_acorn.intel       ufs_common        (L)    ufs_hera.gnu      ufs_noaacloud.intel        ufs_s4.intel
   ufs_cheyenne.gnu      ufs_expanse.intel        ufs_hera.intel    ufs_odin                   ufs_stampede.intel
   ufs_cheyenne.intel    ufs_gaea.intel           ufs_jet.intel     ufs_orion.intel     (L)    ufs_wcoss2.intel

--------------------------------------------------------------- /apps/modulefiles/core ---------------------------------------------------------------
   advisor/2019.5         fftw/3.3.8                  impi/2021.2                  motif/2.3.4                  qchem/5.3.0
   advisor/2020.2  (D)    fftw3/3.3.8                 impi/2022.1.2                mpich/3.3.1                  qchem/5.4.1       (D)
   ansys/2019.2           g2lib/3.1.0                 inspector/2018.4             mpich/3.3.2                  qt/5.12.1
   ansys/2021.1           gcc/8.3.0-new               inspector/2019.5             munge/0.5.13                 r/3.5.2
   antlr/2.7.7            gcc/8.3.0            (D)    inspector/2020               namd/2.13                    r/4.0.2
   armforge/22.0.2        gcc/10.2.0                  inspector/2020.2      (D)    nccmp/1.8.5                  r/4.2.0           (D)
   boost/1.70             gcc/11.3.0                  intel/2018.4                 nccmp/1.8.7                  rdesktop/2.2.0
   boost/1.78      (D)    gd/2.0.34                   intel/2019.5-new             ncl/6.6.2                    rstudio/1.2.5033
   bowtie2/2.4.1          gd/2.3.0             (D)    intel/2019.5                 nco/4.8.1                    seqkit/0.14.0
   cairo/1.17.2           gdal/3.1.2                  intel/2020                   nco/4.9.3                    simulia/2019
   camellia/2016          gdb/10.1                    intel/2020.2          (D)    ncview/2.1.5                 simulia/2020
   canu/2.1               gempak/7.5.1                intel/2021.2                 nedit/5.7                    singularity/3.8.3
   cdo/1.9.5              geos/3.8.1                  intel/2022.1.2        (L)    netcdf/4.7.2-parallel        slurm/19.05.3-2
   cdo/1.9.8              ghostview/3.7.4             intelpython2/2018.4          netcdf/4.7.2                 spack/0.18.1
   cdo/1.9.10             git/2.21.0                  intelpython2/2019.5   (D)    netcdf/4.7.4-parallel        sqlite/3.32.3
   chapel/1.18            git/2.28.0           (D)    intelpython3/2018.4          netcdf/4.7.4                 subversion/1.14.0
   chapel/1.22.1   (D)    gmt/6.1.1                   intelpython3/2019.5          numactl/2.0.14               szip/2.1.1
   cmake/3.15.4           gnuplot/5.2.7               intelpython3/2020            nvidia/2021-21.7             tassel/5.2.64
   cmake/3.17.3           go/1.17.5                   intelpython3/2020.2          openblas/0.3.10              texlive/2021
   cmake/3.18.1           gptl/8.0.3                  intelpython3/2022.1.2 (D)    openjdk/14.0.2               udunits/2.2.26
   cmake/3.22.1           grace/5.1.25                jasper/1.900.1               openjpeg/2.4.0               vbindiff/3.0b5
   cnvgrib/3.1.1          grads/2.2.1                 jellyfish/2.3.0              openmpi/4.0.2                vmd/1.9.3
   comsol/5.5             graphviz/2.44.1             julia/1.5.1                  openmpi/4.0.4                vtune/2018.4
   comsol/5.6             gromacs/2019.4              julia/1.8.5           (D)    ovito/2.9.0                  vtune/2019.5
   contrib/0.1            gsl/2.6                     llvm/7.1.0                   p4vasp/0.3.30                vtune/2020
   cubit/2021.5           hdf5/1.10.4                 local/0.1                    papi/6.0.0                   vtune/2020.2
   cuda/10.1.2     (D)    hdf5/1.10.5                 lstc                         paraview/5.7.0               vtune/2022.1.2    (D)
   cuda/11.0.3            hdf5/1.10.6-parallel        make/4.3                     paraview/5.8.0        (D)    w3lib/2.0.6
   cuda/11.2.1            hdf5/1.10.6                 maple/2020.0                 pcre2/10.40                  wget/1.21.3       (D)
   cudnn/8.1.1            htslib/1.10.2               matlab/2019b                 pdsh/2.34                    wgrib/1.8.0b
   cylc/7.9.1             hwloc/2.1.0          (D)    matlab/2020b                 perl/5.30.1                  wgrib/2.0.8       (D)
   eclipse/4.21           hwloc/2.2.0                 matlab/2021b                 perl/5.32.0           (D)    wgrib2/3.0.2
   eigen/3.3.7            hyperworks/2019.1           mauve/2.4.0                  pgi/2019              (D)    xalt/2.8.0        (S)
   envi/5.6.2             hyperworks/2021.2           maven/3.6.3                  pgi/2020-20.4                xcrysden/1.5.60
   esmf/8.0.0             idv/5.7                     mesa/19.2.2                  pnetcdf/1.12.0               xpdf/4.02
   exonerate/2.2.0        impi/2018.4                 miniconda/4.12.0      (D)    pnetcdf/1.12.1               xxdiff/3.2
   fastqc/0.11.9          impi/2019.6-new             mkl/2018.4                   povray/3.7.0                 zlib/1.2.11
   ferret/7.5.0           impi/2019.6                 mkl/2019.5                   proj/7.1.0
   ffmpeg/4.2.1           impi/2020                   mkl/2020                     python/3.7.5
   ffmpeg/4.3.1    (D)    impi/2020.2          (D)    mkl/2020.2            (D)    python/3.9.2          (D)

------------------------------------------------------------- /apps/licensed/modulefiles -------------------------------------------------------------
   ansys/2022.2           cst-studio/2022         harris/5.6.2             mathematica/13.1.0        scm/2014.08
   ansys/2023.2    (D)    cubit/2022.4     (D)    hyperworks/2022.1        matlab/2022b       (D)    simulia/2022     (D)
   comsol/6.0      (D)    fieldview/21            hyperworks/2022.2 (D)    metashape/1.8.4           tecplot/2022r1
   converge/3.0.26        gaussian/16-C.02        maple/2023.1      (D)    pointwise/22.1            thermocalc/2022b

  Where:
   S:  Module is Sticky, requires --force to unload or purge
   L:  Module is loaded
   D:  Default Module

Use "module spider" to find all possible modules and extensions.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".

AlexanderRichert-NOAA commented 1 year ago

Ah I see what happened, it's because when we added the chained environment with pio 2.5.10, we only spec'd ufs-weather-model-env, so prod-util isn't in there. I think the solution would be to add global-workflow-env, preferably in a new chained environment since at this point a fair number of people are using ufs-pio-2.5.10.

climbfuji commented 1 year ago

Given that spack-stack-1.5.0 is available on Orion, why not use that?

https://spack-stack.readthedocs.io/en/release-1.5.0/PreConfiguredSites.html#msu-orion

Everyone is going to transition to that in the next several weeks.

Note I am out until Tuesday morning starting in 2 minutes ;-)

AlexanderRichert-NOAA commented 1 year ago

I see what happened, it's because when we added the chained environment with pio 2.5.10, we only spec'd ufs-weather-model-env, so prod-util isn't in there. If 1.5.0 doesn't work then we can always create a copy of the ufs-pio-2.5.10 env that also includes global-workflow-env.

aerorahul commented 1 year ago

In order to use 1.5.0, we would have to ensure that ufs-weather-model is updated to use 1.5.0. Then we would have to use this updated hash of the ufs-weather-model for 1.5.0 in the global-workflow. This will involve updating the global-workflow for any updates in the model (this is done once per 2 months or so or when the modeling team has a need to use the updates in the workflow application). This might take some time. Is it very difficult to install the missing modules in the ufs-pio-2.5.10 stack?

AlexanderRichert-NOAA commented 1 year ago

Would it make a difference whether we added it to ufs-pio-2.5.10 vs. creating a new environment? Specifically, just to clarify, is it partly about wanting to use the same spack environment as UFS? We might be able to get prod-util into the existing environment just by copying over the module file, in which case I'd say it's a very low-risk maneuver (but I'd need to test that before saying anything for sure).

aerorahul commented 1 year ago

@AlexanderRichert-NOAA The desire and requirement here is to use the same spack environment as the UFS. Adding prod_util into the existing environment by copying the modulefile is fine so long as we don't have to do anything special like adding MODULEPATHS etc.

climbfuji commented 1 year ago

We expect to move the UFS to spack-stack-1.5.0 in the next couple of weeks, is that good enough?

AlexanderRichert-NOAA commented 1 year ago

I'll test copying the modulefile on the Acorn installation and if that works we can make the same change elsewhere. @climbfuji I think the idea is he wants to test against current UFS without having to also update a bunch of package versions.

AlexanderRichert-NOAA commented 1 year ago

Copying [unified-env]/install/modulefiles/[compiler]/[compiler version]/prod-util into the ufs-pio-2.5.10 modules on Acorn seems to work fine, the module loads and behaves as expected. @aerorahul would it be acceptable to make this change just on Orion to hold us over until the transition to spack-stack 1.5.0, or is there somewhere else you would also need this change sooner?
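The copy described above could look roughly like the following. This is a minimal sketch, not the commands actually run on Acorn or Orion: the function name `copy_module_dir`, the temporary directories, and the placeholder modulefile are assumptions made for illustration, and the real source and destination would be the `[compiler]/[compiler version]` directories of the two environments.

```shell
#!/usr/bin/env bash
# Sketch: copy one module directory from a source modulefiles tree into a
# destination tree and make it readable by everyone. All paths are placeholders.
set -euo pipefail

copy_module_dir() {
  local src_root=$1 dst_root=$2 name=$3
  if [ ! -d "$src_root/$name" ]; then
    echo "no module directory: $src_root/$name" >&2
    return 1
  fi
  mkdir -p "$dst_root"
  # -R copies the directory with every version file beneath it; -p keeps modes and timestamps.
  cp -Rp "$src_root/$name" "$dst_root/"
  # Directories become traversable and files readable for all users.
  chmod -R a+rX "$dst_root/$name"
}

# Demo on a throwaway tree standing in for the two environments (hypothetical layout).
demo=$(mktemp -d)
mkdir -p "$demo/unified/prod-util" "$demo/ufs-pio"
echo '-- placeholder modulefile' > "$demo/unified/prod-util/1.2.2.lua"
copy_module_dir "$demo/unified" "$demo/ufs-pio" prod-util
ls "$demo/ufs-pio/prod-util"
```

The `chmod` step is included because a copy made by a role account is only useful if other users can read the resulting files.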

aerorahul commented 1 year ago

Hera and Orion.

AlexanderRichert-NOAA commented 1 year ago

@ulmononian can you take care of this on Orion? In spack-stack-1.4.1, under the ufs-pio-2.5.10 environment we created, we want to copy in the prod-util modulefiles from unified-env. @climbfuji can you do Hera?

ulmononian commented 1 year ago

> @ulmononian can you take care of this on Orion? In spack-stack-1.4.1, under the ufs-pio-2.5.10 environment we created, we want to copy in the prod-util modulefiles from unified-env. @climbfuji can you do Hera?

done on orion.

ulmononian commented 1 year ago

however:

[role-epic@Orion:/work/noaa/epic/role-epic/spack-stack/spack-stack-1.4.1/envs/ufs-pio-2.5.10]$ module use /work/noaa/epic/role-epic/spack-stack/spack-stack-1.4.1/envs/ufs-pio-2.5.10/install/modulefiles/Core
[role-epic@Orion:/work/noaa/epic/role-epic/spack-stack/spack-stack-1.4.1/envs/ufs-pio-2.5.10]$ ml stack-intel
[role-epic@Orion:/work/noaa/epic/role-epic/spack-stack/spack-stack-1.4.1/envs/ufs-pio-2.5.10]$ ml stack-intel-oneapi-mpi
[role-epic@Orion:/work/noaa/epic/role-epic/spack-stack/spack-stack-1.4.1/envs/ufs-pio-2.5.10]$ module load prod-util
prod-util        prod-util/1.2.2
[role-epic@Orion:/work/noaa/epic/role-epic/spack-stack/spack-stack-1.4.1/envs/ufs-pio-2.5.10]$ module load prod-util
prod-util        prod-util/1.2.2
[role-epic@Orion:/work/noaa/epic/role-epic/spack-stack/spack-stack-1.4.1/envs/ufs-pio-2.5.10]$ module load prod-util
Lmod has detected the following error:  The following module(s) are unknown: "w3nco/2.4.1"

Please check the spelling or version number. Also try "module spider ..."
It is also possible your cache file is out-of-date; it may help to try:
  $ module --ignore-cache load "w3nco/2.4.1"

Also make sure that all modulefiles written in TCL start with the string #%Module

do we need to copy w3nco over as well?
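One way to answer that before copying is to list what the copied modulefile itself asks for. The sketch below assumes a Lua modulefile that names its dependencies through `load(...)`, `depends_on(...)`, or `prereq(...)` calls with double-quoted arguments; the function name `list_module_deps` and the sample file content are made up for illustration and are not the real prod-util modulefile.

```shell
#!/usr/bin/env bash
# Sketch: list the modules that a Lua modulefile pulls in.
set -euo pipefail

list_module_deps() {
  # Print the quoted argument of a load(...), depends_on(...) or prereq(...) call.
  # Only one call per line is reported, which is enough for one-call-per-line files.
  sed -n -E 's/.*(load|depends_on|prereq)\("([^"]+)"\).*/\2/p' "$1" | sort -u
}

# Made-up modulefile content standing in for prod-util/1.2.2.lua.
demo=$(mktemp -d)
cat > "$demo/1.2.2.lua" <<'EOF'
-- placeholder modulefile for illustration only
depends_on("w3nco/2.4.1")
prepend_path("PATH", "/hypothetical/prefix/bin")
EOF
list_module_deps "$demo/1.2.2.lua"
```

Every name printed would need its own modulefile in the target tree, otherwise the load fails in the way shown above.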

AlexanderRichert-NOAA commented 1 year ago

Probably so, that's odd I didn't run into that on Acorn. In any case, please check the permissions, as I can't access the new prod-util modules.

ulmononian commented 1 year ago

> Probably so, that's odd I didn't run into that on Acorn. In any case, please check the permissions, as I can't access the new prod-util modules.

i changed all the permissions, so please go ahead and try again.

AlexanderRichert-NOAA commented 1 year ago

The intel one still needs permissions updated. The gcc one loads fine and looks good.

AlexanderRichert-NOAA commented 1 year ago

Ok orion looks good for both compilers.

AlexanderRichert-NOAA commented 1 year ago

Thanks @ulmononian for taking care of both systems. @aerorahul can you test and confirm that it works on Orion and Hera?

aerorahul commented 1 year ago

Thanks @AlexanderRichert-NOAA and @ulmononian I'll test on Orion.

JessicaMeixner-NOAA commented 1 year ago

I haven't had a chance to test on orion yet, but I did run into this same issue on hera yesterday, and it does not look like there is a prod_util on hera yet for spack-stack-1.4.1/envs/ufs-pio-2.5.10 (or I'm looking for the wrong module, but I don't think it's there). @aerorahul did your tests work on orion? Should I move my testing there?

aerorahul commented 1 year ago

> I haven't had a chance to test on orion yet, but I did run into this same issue on hera yesterday, and it does not look like there is a prod_util on hera yet for spack-stack-1.4.1/envs/ufs-pio-2.5.10 (or I'm looking for the wrong module, but I don't think it's there). @aerorahul did your tests work on orion? Should I move my testing there?

@JessicaMeixner-NOAA I have not been able to test the updates from @AlexanderRichert-NOAA on Orion yet (Orion was down yesterday). I will do so today/tomorrow when I can find time.

JessicaMeixner-NOAA commented 1 year ago

I tested this on orion - and after loading the ufs-weather-model modules with spack-stack 1.4.1 I could not load prod_util

On hera, I tried to move to the ufs-weather-model with 1.5 since that PR is no longer a draft, and there was no prod_util there either.

I could find a prod-util, but I don't know what that is or if that's the same thing and on WCOSS2 I know we have prod_util. Thoughts @AlexanderRichert-NOAA or others? I can move my global-workflow PR to use the PR of ufs-weather-model with 1.5 since we'll hopefully soon be there anyways, but I still have the no prod_util issue. Note this is a blocker for us being able to effectively test some GEFS things and prepare for HR3.

climbfuji commented 1 year ago

The module is called prod-util, same as the package name, if I remember correctly.

JessicaMeixner-NOAA commented 1 year ago

Okay - I've now seen prod-util on hera and orion with spack 1.4. I did re-confirm the package is with an underscore on WCOSS2 though.

AlexanderRichert-NOAA commented 1 year ago

Yeah, I think it got changed under Spack because of their naming conventions...

JessicaMeixner-NOAA commented 1 year ago

Any chance to change it back to be consistent with WCOSS2? That seems like that could cause some issues.

AlexanderRichert-NOAA commented 1 year ago

One solution is to put in an if statement for wcoss2, then take it out once NCO moves to spack/spack-stack. How does that sound? I think the other solution is to copy the 'prod-util' directory to 'prod_util' in our existing deployments, then update our module file config to rename it for future deployments. It'd be nice to avoid that, partly because I don't want developers to start using the hyphenated version then wonder where it went in the next release, but it's one possibility.
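A variant of the if-statement idea, for downstream scripts that have to run on both kinds of system, is to try the two spellings in order instead of testing for the host. This is only a sketch of that alternative: the function name `load_first_available` is hypothetical, and it assumes the installed module tool supports `module is-avail`, which is worth confirming on each system before relying on it.

```shell
#!/usr/bin/env bash
# Sketch: load whichever spelling of a module name this system provides.

load_first_available() {
  local name
  for name in "$@"; do
    # Assumption: "module is-avail NAME" returns success when NAME can be loaded.
    if module is-avail "$name" 2>/dev/null; then
      module load "$name"
      return
    fi
  done
  echo "none of these modules are available: $*" >&2
  return 1
}

# Only attempt the load where a module command is defined in this shell.
if command -v module >/dev/null 2>&1; then
  load_first_available prod_util prod-util
fi
```

The drawback is the one raised in the comment above: a fallback like this hides the naming difference instead of removing it, so it would be a stopgap at most.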

climbfuji commented 1 year ago

It's possible that spack doesn't allow an underscore in the package name, but in this case it's no problem at all to just name the module file differently so that there will be no impact on downstream repositories.

climbfuji commented 1 year ago

in configs/common/modules.yaml:

      projections:
        prod-util: 'prod_util/{version}'

AlexanderRichert-NOAA commented 1 year ago

I'm good with that. For the existing releases, we can start a checklist of systems in the issue description and copy prod-util to prod_util on each one.

climbfuji commented 1 year ago

@AlexanderRichert-NOAA We could try something different, locally first on a sandbox: go to the existing environment, add the above projection, remove the old modulefiles (EDIT: rm -fr envs/unified-env/install/modulefiles), rerun spack module [lmod|tcl] refresh and spack stack setup-meta-modules. That should do.

AlexanderRichert-NOAA commented 1 year ago

I would lean toward just refreshing prod-util (spack module [lmod|tcl] refresh prod-util), but, I don't think I have write access to the relevant installations anyway, so, the details are up to you and/or @ulmononian :)

AlexanderRichert-NOAA commented 1 year ago

Though I'll test it on Acorn

climbfuji commented 1 year ago

Don’t forget the virtual packages that depend on the module for prod_util. You need to refresh these as well.

AlexanderRichert-NOAA commented 1 year ago

Ah right, forgot about those

climbfuji commented 1 year ago

I’ll test it on my mac in a bit, have the full unified-env set up already

AlexanderRichert-NOAA commented 1 year ago

Okay, worked fine on Acorn. It's only used by global-workflow-env, so if you want to save 30 seconds you can just rebuild those, then delete the existing hyphenated one (:

climbfuji commented 1 year ago

@AlexanderRichert-NOAA This works as expected. With:

--- configs/common/modules.yaml 2023-09-26 09:03:49.000000000 -0600
+++ envs/unified-env/common/modules.yaml    2023-09-28 14:24:00.000000000 -0600
@@ -16,6 +16,7 @@
         nlohmann-json: '{compiler.name}/{compiler.version}/json/{version}'
         nlohmann-json-schema-validator: '{compiler.name}/{compiler.version}/json-schema-validator/{version}'
         libjpeg-turbo: '{compiler.name}/{compiler.version}/libjpeg/{version}'
+        prod-util: '{compiler.name}/{compiler.version}/prod_util/{version}'
       exclude:
       # List of packages for which we don't need modules
       - apple-libunwind
@@ -271,6 +272,7 @@
         nlohmann-json: 'json/{version}'
         nlohmann-json-schema-validator: 'json-schema-validator/{version}'
         libjpeg-turbo: 'libjpeg/{version}'
+        prod-util: 'prod_util/{version}'
       exclude:
       # List of packages for which we don't need modules
       - apple-libunwind

do:

source setup.sh
tar -cvzf unified_env_modulefiles_backup.tar.gz envs/unified-env/install/modulefiles/
spack env activate -p envs/unified-env
rm -fr envs/unified-env/install/modulefiles
spack module lmod refresh
spack stack setup-meta-modules
spack clean -a && spack env deactivate
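After a refresh like this, a quick check that the rename took effect everywhere might look as follows. It is a sketch under assumptions: the function name `check_rename` and the temporary tree are hypothetical, and the real root would be the environment's install/modulefiles directory.

```shell
#!/usr/bin/env bash
# Sketch: confirm that a modulefiles tree contains only the new module
# directory name and none of the old one.
set -euo pipefail

check_rename() {
  local root=$1 old=$2 new=$3
  local old_dirs new_dirs
  old_dirs=$(find "$root" -type d -name "$old")
  new_dirs=$(find "$root" -type d -name "$new")
  if [ -n "$old_dirs" ]; then
    echo "stale $old directories remain:" >&2
    echo "$old_dirs" >&2
    return 1
  fi
  if [ -z "$new_dirs" ]; then
    echo "no $new directories found under $root" >&2
    return 1
  fi
  # Print every location where the new name was generated.
  echo "$new_dirs"
}

# Demo on a throwaway tree that mimics [compiler]/[compiler version]/[module].
demo=$(mktemp -d)
mkdir -p "$demo/modulefiles/intel/2022.0.2/prod_util"
check_rename "$demo/modulefiles" prod-util prod_util
```

Running it once per compiler tree, or once on the top-level modulefiles directory, covers both compilers in one pass.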

AlexanderRichert-NOAA commented 1 year ago

Cool, good call with the backup. I'll put a PR in develop with the config update.

climbfuji commented 1 year ago

I suggest the following:

I cannot do this this week - but next week is ok.

If you agree, can you start the bugfix branch etc please?

AlexanderRichert-NOAA commented 1 year ago

Do you mean check out the 1.5.1 branch in the existing 1.5.0 deployments?

climbfuji commented 1 year ago

> Do you mean check out the 1.5.1 branch in the existing 1.5.0 deployments?

Yeah ... I know, not the cleanest approach of all, but because it's such a tiny change that doesn't involve any real code changes ... what do you think?

I expect a full bug fix release later in this quarter anyway with all the input we get from UFS and JEDI.

AlexanderRichert-NOAA commented 1 year ago

I think I'd rather either make the change to the 1.5.0 branch, or just update the existing envs manually and then incorporate the Proper Fix into 1.5.1 which as I recall we're doing anyway (+develop of course).

climbfuji commented 1 year ago

Ok, fine with me. Since the 1.5.0 tag is still warm (just created a few hours ago), I might as well retag.

climbfuji commented 1 year ago

@AlexanderRichert-NOAA I created the 1.5.1 release branches. This includes the tiny doc bugfix from @ncrossette that I just merged into develop. Please add this small bug fix to 1.5.0, 1.5.1 and develop (create once, cherry-pick twice, get three times the github credits!)

climbfuji commented 1 year ago

@AlexanderRichert-NOAA Can you create a checklist with all the hosts/envs somewhere please that we can use next week to fix the module name as per instructions above? Thank you!

AlexanderRichert-NOAA commented 1 year ago

@climbfuji done