Closed InnocentSouopgui-NOAA closed 3 months ago
After the full migration to Rocky8, the spack-stack environment for CentOS7 is no longer available for builds. Everything has to be built using Rocky8 modules.
would you be able to post your steps & the error message you are receiving?
The error can be reproduced by connecting to fe3, cloning the UPP repository, and building.
For instance, when I do it, I get the error message below.
[USER@fe3 tests]$ ./compile_upp.sh
Building for machine jet_c, compiler intel
Lmod has detected the following error: These module(s) or extension(s) exist but cannot be
loaded as requested: "cmake/3.23.1", "jasper/2.0.32"
Try: "module spider cmake/3.23.1 jasper/2.0.32" to see how to load the module(s).
Executing this command requires loading "cmake/3.23.1" which failed while processing the following
module(s):
Module fullname Module Filename
--------------- ---------------
jet_c /mnt/lfs1/NESDIS/nesdis-rdo2/Innocent.Souopgui/devel/upp-rocky8/modulefiles/jet_c.lua
Executing this command requires loading "jasper/2.0.32" which failed while processing the following
module(s):
Module fullname Module Filename
--------------- ---------------
upp_common /mnt/lfs1/NESDIS/nesdis-rdo2/Innocent.Souopgui/devel/upp-rocky8/modulefiles/upp_common.lua
jet_c /mnt/lfs1/NESDIS/nesdis-rdo2/Innocent.Souopgui/devel/upp-rocky8/modulefiles/jet_c.lua
@InnocentSouopgui-NOAA i just tried the following on jet fe3:
module use /mnt/lfs1/NESDIS/nesdis-rdo2/Innocent.Souopgui/devel/upp-rocky8/modulefiles
ml jet
and everything loaded properly:
$ ml
Currently Loaded Modules:
1) intel/2022.1.2 10) zlib/1.2.13 19) parallel-netcdf/1.12.2 28) sp/2.5.0
2) stack-intel/2021.5.0 11) libpng/1.6.37 20) parallelio/2.5.10 29) w3emc/2.10.0
3) impi/2022.1.2 12) pkg-config/0.27.1 21) bacio/2.4.1 30) nemsio/2.5.4
4) stack-intel-oneapi-mpi/2021.5.1 13) hdf5/1.14.0 22) crtm-fix/2.4.0.1_emc 31) sigio/2.3.2
5) nghttp2/1.57.0 14) snappy/1.1.10 23) git-lfs/2.10.0 32) sfcio/1.4.1
6) curl/8.4.0 15) zstd/1.5.2 24) crtm/2.4.0.1 33) wrf-io/1.2.0
7) cmake/3.23.1 16) c-blosc/1.21.5 25) g2/3.4.5 34) upp_common
8) libjpeg/2.1.0 17) netcdf-c/4.9.2 26) g2tmpl/1.10.2 35) jet
9) jasper/2.0.32 18) netcdf-fortran/4.6.1 27) ip/4.3.0
can you share the clone & build steps you did?
@ulmononian Two notes:
When the compile script tests/compile_upp.sh is called, it uses the script tests/detect_machine.sh; that script loads jet_c for front ends fe[1-4] and jet for front ends fe[5-8]. jet_c is the module file that has a problem on Rocky8.
To reproduce the problem, you will need to clone the UPP repository and call tests/compile_upp.sh, or load the module jet_c.
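The host-to-module mapping described above can be sketched as a small shell function. This is only an illustration of the behavior reported in this thread: the function name `detect_jet_module` and the optional hostname argument are hypothetical, and the real logic in tests/detect_machine.sh may be structured differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not copied from tests/detect_machine.sh).
# Maps a Jet front-end hostname to the module file name that,
# according to this thread, gets loaded on that host.
detect_jet_module() {
  # Use the first argument if given, otherwise the short hostname.
  local host="${1:-$(hostname -s)}"
  case "${host}" in
    fe[1-4]) echo "jet_c" ;;   # the module file that fails on Rocky8
    fe[5-8]) echo "jet" ;;     # the module file that loads as expected
    *) echo "unknown"; return 1 ;;
  esac
}
```

Calling `detect_jet_module fe3` in this sketch selects jet_c, which matches the failing case reported on fe3.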
I now have the freshly cloned repository at /mnt/lfs1/NESDIS/nesdis-rdo2/Innocent.Souopgui/devel/upp-develop
$ module purge
$ module use /mnt/lfs1/NESDIS/nesdis-rdo2/Innocent.Souopgui/devel/upp-develop/modulefiles
$ module load jet_c
and get the error
Lmod has detected the following error: These module(s) or extension(s) exist but
cannot be loaded as requested: "cmake/3.23.1", "jasper/2.0.32"
Try: "module spider cmake/3.23.1 jasper/2.0.32" to see how to load the module(s).
Executing this command requires loading "cmake/3.23.1" which failed while processing the
following module(s):
Module fullname Module Filename
--------------- ---------------
jet_c /mnt/lfs1/NESDIS/nesdis-rdo2/Innocent.Souopgui/devel/upp-develop/modulefiles/jet_c.lua
Executing this command requires loading "jasper/2.0.32" which failed while processing the
following module(s):
Module fullname Module Filename
--------------- ---------------
upp_common /mnt/lfs1/NESDIS/nesdis-rdo2/Innocent.Souopgui/devel/upp-develop/modulefiles/upp_common.lua
jet_c /mnt/lfs1/NESDIS/nesdis-rdo2/Innocent.Souopgui/devel/upp-develop/modulefiles/jet_c.lua
while
$ module load jet
produces the expected result
@InnocentSouopgui-NOAA ok thank you. so it sounds like the detect_machine.sh script needs to be updated to use the rocky8 stack on fe1,3,4 (it should exclude fe2 since this remains a centos node for now). @FernandoAndrade-NOAA what is your take?
That is right. I already updated the detect_machine.sh script in PR #920, which is part of a bigger effort to migrate Global Workflow to Rocky8 on Jet (NOAA-EMC/global-workflow#2377). As for excluding fe2, someone commented that by next Tuesday there will be no CentOS node left on Jet, but I can't find that comment anymore. So the question remains: should fe2 be excluded or not?
@InnocentSouopgui-NOAA Given the final Jet Rocky8 transition next week, we don't need to exclude fe2. Your UPP PR #920 looks good to me.
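For illustration, once every front end runs Rocky8 and fe2 is not excluded, the host-to-module mapping could collapse to a single case. The sketch below is an assumption about what such a mapping might look like; it is not taken from PR #920, and the function name `detect_jet_module_rocky8` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical post-transition mapping (not the content of PR #920).
# Assumes all Jet front ends fe1 through fe8 run Rocky8 and can use
# the jet module file, which loads correctly on Rocky8 in this thread.
detect_jet_module_rocky8() {
  # Use the first argument if given, otherwise the short hostname.
  local host="${1:-$(hostname -s)}"
  case "${host}" in
    fe[1-8]) echo "jet" ;;
    *) echo "unknown"; return 1 ;;
  esac
}
```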
@InnocentSouopgui-NOAA @WenMeng-NOAA given this, it sounds like this issue is taken care of from the spack-stack perspective?
Yes, there is already a new installation of spack-stack for Rocky. I believe that during the transition, when some partitions and front ends ran CentOS7 and others ran Rocky8, what is now a bug was implemented to build automatically with the CentOS7 spack-stack on CentOS7 front ends and with the Rocky8 spack-stack on Rocky8 front ends. Now all front-end nodes except fe2 are running Rocky8, and fe2 is scheduled to move to Rocky8 soon.
You can have a look at PR #920, which solves this issue. It is part of a set of PRs to migrate Global Workflow to Rocky8 on Jet.
The UPP build is failing on some front-end nodes of Jet because some modules cannot be located.