NOAA-EMC / GDASApp

Global Data Assimilation System Application
GNU Lesser General Public License v2.1

Build and run GDASApp on Rocky8 Hera nodes #958

RussTreadon-NOAA closed this issue 7 months ago

RussTreadon-NOAA commented 8 months ago

Hera is transitioning to the Rocky-8 OS as scheduled below

We need to ensure GDASApp can build and run on Rocky-8 Hera nodes. This issue is opened to track this transition.

RussTreadon-NOAA commented 8 months ago

GDASApp develop at a3c3c10 does not build as is on Rocky-8 Hera nodes. Executing ./build.sh -v returns:

Hera(hfe10):/scratch1/NCEPDEV/da/Russ.Treadon/git/GDASApp/rocky8$ ./build.sh -v
Building GDASApp on hera
Lmod Warning:
-----------------------------------------------------------------------------------------------------------------------------------------
The following dependent module(s) are not currently loaded: py-numpy/1.22.3 (required by: bufr/12.0.1), python/3.10.13 (required by:
boost/1.83.0, bufr/12.0.1, fckit/0.11.0, atlas/0.35.1, py-pybind11/2.11.0)
-----------------------------------------------------------------------------------------------------------------------------------------
RussTreadon-NOAA commented 8 months ago

@CoryMartin-NOAA suggested commenting out the miniconda3 section of modulefiles/GDAS/hera.intel.lua:

load("hpc/1.2.0")
unload("python/3.10.13")
unload("py-numpy/1.22.3")
load("miniconda3/4.6.14")
load("gdasapp/1.0.0")

This was done and build.sh -v successfully ran to completion.

Execute the test_gdasapp ctests with the following result:

Hera(hfe10):/scratch1/NCEPDEV/da/Russ.Treadon/git/GDASApp/rocky8/build$ ctest -R test_gdasapp
Test project /scratch1/NCEPDEV/da/Russ.Treadon/git/GDASApp/rocky8/build
      Start 1388: test_gdasapp_util_coding_norms
 1/30 Test #1388: test_gdasapp_util_coding_norms .............   Passed    1.38 sec
      Start 1389: test_gdasapp_util_ioda_example
 2/30 Test #1389: test_gdasapp_util_ioda_example .............   Passed    0.13 sec
      Start 1390: test_gdasapp_util_prepdata
 3/30 Test #1390: test_gdasapp_util_prepdata .................   Passed    0.80 sec
      Start 1391: test_gdasapp_util_rads2ioda
 4/30 Test #1391: test_gdasapp_util_rads2ioda ................   Passed    0.13 sec
      Start 1392: test_gdasapp_util_ghrsst2ioda
 5/30 Test #1392: test_gdasapp_util_ghrsst2ioda ..............   Passed    0.13 sec
      Start 1393: test_gdasapp_util_smap2ioda
 6/30 Test #1393: test_gdasapp_util_smap2ioda ................   Passed    0.12 sec
      Start 1394: test_gdasapp_util_smos2ioda
 7/30 Test #1394: test_gdasapp_util_smos2ioda ................   Passed    0.12 sec
      Start 1395: test_gdasapp_util_viirsaod2ioda
 8/30 Test #1395: test_gdasapp_util_viirsaod2ioda ............   Passed    0.12 sec
      Start 1396: test_gdasapp_util_icecamsr2ioda
 9/30 Test #1396: test_gdasapp_util_icecamsr2ioda ............   Passed    0.12 sec
      Start 1732: test_gdasapp_check_python_norms
10/30 Test #1732: test_gdasapp_check_python_norms ............   Passed    1.80 sec
      Start 1733: test_gdasapp_check_yaml_keys
11/30 Test #1733: test_gdasapp_check_yaml_keys ...............***Failed    0.08 sec
      Start 1734: test_gdasapp_jedi_increment_to_fv3
12/30 Test #1734: test_gdasapp_jedi_increment_to_fv3 .........***Failed    0.06 sec
      Start 1735: test_gdasapp_convert_ewok_yaml
13/30 Test #1735: test_gdasapp_convert_ewok_yaml .............***Failed    0.12 sec
      Start 1736: test_gdasapp_convert_bufr_temp_dbuoy
14/30 Test #1736: test_gdasapp_convert_bufr_temp_dbuoy .......   Passed    0.22 sec
      Start 1737: test_gdasapp_convert_bufr_salt_dbuoy
15/30 Test #1737: test_gdasapp_convert_bufr_salt_dbuoy .......   Passed    0.22 sec
      Start 1738: test_gdasapp_convert_bufr_temp_mbuoyb
16/30 Test #1738: test_gdasapp_convert_bufr_temp_mbuoyb ......   Passed    0.22 sec
      Start 1739: test_gdasapp_convert_bufr_salt_mbuoyb
17/30 Test #1739: test_gdasapp_convert_bufr_salt_mbuoyb ......   Passed    0.21 sec
      Start 1740: test_gdasapp_convert_bufr_tesacprof
18/30 Test #1740: test_gdasapp_convert_bufr_tesacprof ........   Passed    0.22 sec
      Start 1741: test_gdasapp_convert_bufr_trkobprof
19/30 Test #1741: test_gdasapp_convert_bufr_trkobprof ........   Passed    0.21 sec
      Start 1742: test_gdasapp_convert_bufr_sfcships
20/30 Test #1742: test_gdasapp_convert_bufr_sfcships .........   Passed    0.21 sec
      Start 1743: test_gdasapp_convert_bufr_sfcshipsu
21/30 Test #1743: test_gdasapp_convert_bufr_sfcshipsu ........   Passed    0.21 sec
      Start 1744: test_gdasapp_soca_nsst_increment_to_mom6
22/30 Test #1744: test_gdasapp_soca_nsst_increment_to_mom6 ...***Failed    0.09 sec
      Start 1745: test_gdasapp_snow_create_ens
23/30 Test #1745: test_gdasapp_snow_create_ens ...............***Failed    2.10 sec
      Start 1746: test_gdasapp_snow_imsproc
24/30 Test #1746: test_gdasapp_snow_imsproc ..................***Failed    1.98 sec
      Start 1747: test_gdasapp_snow_apply_jediincr
25/30 Test #1747: test_gdasapp_snow_apply_jediincr ...........   Passed    7.08 sec
      Start 1748: test_gdasapp_snow_letkfoi_snowda
26/30 Test #1748: test_gdasapp_snow_letkfoi_snowda ...........***Failed    0.83 sec
      Start 1749: test_gdasapp_convert_bufr_adpsfc_snow
27/30 Test #1749: test_gdasapp_convert_bufr_adpsfc_snow ......   Passed    2.66 sec
      Start 1750: test_gdasapp_convert_bufr_adpsfc
28/30 Test #1750: test_gdasapp_convert_bufr_adpsfc ...........   Passed    3.54 sec
      Start 1751: test_gdasapp_convert_gsi_satbias
29/30 Test #1751: test_gdasapp_convert_gsi_satbias ...........***Failed    0.28 sec
      Start 1752: test_gdasapp_aero_gen_3dvar_yaml
30/30 Test #1752: test_gdasapp_aero_gen_3dvar_yaml ...........***Failed    0.11 sec

70% tests passed, 9 tests failed out of 30

Label Time Summary:
gdas-utils    =   3.06 sec*proc (9 tests)
script        =   3.06 sec*proc (9 tests)

Total Test time (real) =  25.95 sec

The following tests FAILED:
        1733 - test_gdasapp_check_yaml_keys (Failed)
        1734 - test_gdasapp_jedi_increment_to_fv3 (Failed)
        1735 - test_gdasapp_convert_ewok_yaml (Failed)
        1744 - test_gdasapp_soca_nsst_increment_to_mom6 (Failed)
        1745 - test_gdasapp_snow_create_ens (Failed)
        1746 - test_gdasapp_snow_imsproc (Failed)
        1748 - test_gdasapp_snow_letkfoi_snowda (Failed)
        1751 - test_gdasapp_convert_gsi_satbias (Failed)
        1752 - test_gdasapp_aero_gen_3dvar_yaml (Failed)
Errors while running CTest
Output from these tests are in: /scratch1/NCEPDEV/da/Russ.Treadon/git/GDASApp/rocky8/build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.

Each of these tests failed because required Python modules could not be found. Missing modules include yaml, netCDF4, xarray, and wxflow.
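Before rebuilding, one quick way to confirm which of these modules the active Python environment can see is to ask importlib for each one. This is a minimal sketch; missing_modules is a hypothetical helper, not part of GDASApp, and it only checks that a module can be located, not that it imports cleanly.

```python
import importlib.util

# Modules reported missing by the failing test_gdasapp ctests.
REQUIRED = ["yaml", "netCDF4", "xarray", "wxflow"]


def missing_modules(names):
    """Return the subset of `names` that cannot be located on sys.path."""
    missing = []
    for name in names:
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules(REQUIRED)
    if absent:
        print("Missing Python modules: " + ", ".join(absent))
    else:
        print("All required Python modules found.")
```

Running this after loading the modulefile shows whether the Python stack provided by spack-stack is the one actually on the path.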

RussTreadon-NOAA commented 8 months ago

@CoryMartin-NOAA created feature/rocky8 with updates to hera.intel.lua. This branch builds on Hera Rocky8 nodes. ctest returns the following:

Hera(hfe10):/scratch1/NCEPDEV/da/Russ.Treadon/git/GDASApp/feature_rocky8/build$ ctest -R test_gdasapp
Test project /scratch1/NCEPDEV/da/Russ.Treadon/git/GDASApp/feature_rocky8/build
      Start 1388: test_gdasapp_util_coding_norms
 1/30 Test #1388: test_gdasapp_util_coding_norms .............   Passed    1.32 sec
      Start 1389: test_gdasapp_util_ioda_example
 2/30 Test #1389: test_gdasapp_util_ioda_example .............   Passed    1.05 sec
      Start 1390: test_gdasapp_util_prepdata
 3/30 Test #1390: test_gdasapp_util_prepdata .................   Passed    0.87 sec
      Start 1391: test_gdasapp_util_rads2ioda
 4/30 Test #1391: test_gdasapp_util_rads2ioda ................   Passed    0.22 sec
      Start 1392: test_gdasapp_util_ghrsst2ioda
 5/30 Test #1392: test_gdasapp_util_ghrsst2ioda ..............   Passed    0.14 sec
      Start 1393: test_gdasapp_util_smap2ioda
 6/30 Test #1393: test_gdasapp_util_smap2ioda ................   Passed    0.14 sec
      Start 1394: test_gdasapp_util_smos2ioda
 7/30 Test #1394: test_gdasapp_util_smos2ioda ................   Passed    0.14 sec
      Start 1395: test_gdasapp_util_viirsaod2ioda
 8/30 Test #1395: test_gdasapp_util_viirsaod2ioda ............   Passed    0.13 sec
      Start 1396: test_gdasapp_util_icecamsr2ioda
 9/30 Test #1396: test_gdasapp_util_icecamsr2ioda ............   Passed    0.13 sec
      Start 1732: test_gdasapp_check_python_norms
10/30 Test #1732: test_gdasapp_check_python_norms ............   Passed    1.87 sec
      Start 1733: test_gdasapp_check_yaml_keys
11/30 Test #1733: test_gdasapp_check_yaml_keys ...............   Passed    0.16 sec
      Start 1734: test_gdasapp_jedi_increment_to_fv3
12/30 Test #1734: test_gdasapp_jedi_increment_to_fv3 .........   Passed    1.75 sec
      Start 1735: test_gdasapp_convert_ewok_yaml
13/30 Test #1735: test_gdasapp_convert_ewok_yaml .............   Passed    0.31 sec
      Start 1736: test_gdasapp_convert_bufr_temp_dbuoy
14/30 Test #1736: test_gdasapp_convert_bufr_temp_dbuoy .......   Passed    0.79 sec
      Start 1737: test_gdasapp_convert_bufr_salt_dbuoy
15/30 Test #1737: test_gdasapp_convert_bufr_salt_dbuoy .......   Passed    0.25 sec
      Start 1738: test_gdasapp_convert_bufr_temp_mbuoyb
16/30 Test #1738: test_gdasapp_convert_bufr_temp_mbuoyb ......   Passed    0.26 sec
      Start 1739: test_gdasapp_convert_bufr_salt_mbuoyb
17/30 Test #1739: test_gdasapp_convert_bufr_salt_mbuoyb ......   Passed    0.26 sec
      Start 1740: test_gdasapp_convert_bufr_tesacprof
18/30 Test #1740: test_gdasapp_convert_bufr_tesacprof ........   Passed    0.26 sec
      Start 1741: test_gdasapp_convert_bufr_trkobprof
19/30 Test #1741: test_gdasapp_convert_bufr_trkobprof ........   Passed    0.25 sec
      Start 1742: test_gdasapp_convert_bufr_sfcships
20/30 Test #1742: test_gdasapp_convert_bufr_sfcships .........   Passed    0.25 sec
      Start 1743: test_gdasapp_convert_bufr_sfcshipsu
21/30 Test #1743: test_gdasapp_convert_bufr_sfcshipsu ........   Passed    0.26 sec
      Start 1744: test_gdasapp_soca_nsst_increment_to_mom6
22/30 Test #1744: test_gdasapp_soca_nsst_increment_to_mom6 ...***Failed    4.29 sec
      Start 1745: test_gdasapp_snow_create_ens
23/30 Test #1745: test_gdasapp_snow_create_ens ...............   Passed    1.04 sec
      Start 1746: test_gdasapp_snow_imsproc
24/30 Test #1746: test_gdasapp_snow_imsproc ..................   Passed    2.62 sec
      Start 1747: test_gdasapp_snow_apply_jediincr
25/30 Test #1747: test_gdasapp_snow_apply_jediincr ...........   Passed    8.77 sec
      Start 1748: test_gdasapp_snow_letkfoi_snowda
26/30 Test #1748: test_gdasapp_snow_letkfoi_snowda ...........   Passed   17.70 sec
      Start 1749: test_gdasapp_convert_bufr_adpsfc_snow
27/30 Test #1749: test_gdasapp_convert_bufr_adpsfc_snow ......   Passed    2.78 sec
      Start 1750: test_gdasapp_convert_bufr_adpsfc
28/30 Test #1750: test_gdasapp_convert_bufr_adpsfc ...........   Passed    3.63 sec
      Start 1751: test_gdasapp_convert_gsi_satbias
29/30 Test #1751: test_gdasapp_convert_gsi_satbias ...........   Passed    1.75 sec
      Start 1752: test_gdasapp_aero_gen_3dvar_yaml
30/30 Test #1752: test_gdasapp_aero_gen_3dvar_yaml ...........   Passed    0.44 sec

97% tests passed, 1 tests failed out of 30

Label Time Summary:
gdas-utils    =   4.15 sec*proc (9 tests)
script        =   4.15 sec*proc (9 tests)

Total Test time (real) =  54.29 sec

The following tests FAILED:
        1744 - test_gdasapp_soca_nsst_increment_to_mom6 (Failed)
Errors while running CTest
Output from these tests are in: /scratch1/NCEPDEV/da/Russ.Treadon/git/GDASApp/feature_rocky8/build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.

Running ctest -VV for test_gdasapp_soca_nsst_increment_to_mom6 returns:

1744: Traceback (most recent call last):
1744:   File "/scratch1/NCEPDEV/da/Russ.Treadon/git/GDASApp/feature_rocky8/bundle/gdas/ush/socaincr2mom6.py", line 8, in <module>
1744:     import ufsda
1744:   File "/scratch1/NCEPDEV/da/Russ.Treadon/git/GDASApp/feature_rocky8/ush/ufsda/__init__.py", line 2, in <module>
1744:     from .ufs_yaml import gen_yaml, parse_config
1744:   File "/scratch1/NCEPDEV/da/Russ.Treadon/git/GDASApp/feature_rocky8/ush/ufsda/ufs_yaml.py", line 3, in <module>
1744:     from solo.yaml_file import YAMLFile
1744: ModuleNotFoundError: No module named 'solo'

GDASApp issue #960 might resolve this failure.

RussTreadon-NOAA commented 8 months ago

Merge changes in PR #961 into a working copy of feature/rocky8. Rerunning build. I'm hopeful all ctests will pass once the build completes.

RussTreadon-NOAA commented 8 months ago

Unfortunately, test_gdasapp_soca_nsst_increment_to_mom6 still fails. Here is the -VV traceback:

1744: Traceback (most recent call last):
1744:   File "/scratch1/NCEPDEV/da/Russ.Treadon/git/GDASApp/feature_rocky8/bundle/gdas/ush/socaincr2mom6.py", line 8, in <module>
1744:     import ufsda
1744:   File "/scratch1/NCEPDEV/da/Russ.Treadon/git/GDASApp/feature_rocky8/ush/ufsda/__init__.py", line 2, in <module>
1744:     from .ufs_yaml import gen_yaml, parse_config
1744:   File "/scratch1/NCEPDEV/da/Russ.Treadon/git/GDASApp/feature_rocky8/ush/ufsda/ufs_yaml.py", line 3, in <module>
1744:     from solo.yaml_file import YAMLFile
1744: ModuleNotFoundError: No module named 'solo'

A check of ush/ufsda/ufs_yaml.py finds these solo references:

from solo.yaml_file import YAMLFile
from solo.template import TemplateConstants, Template

Methods replace_vars and iter_config use Template and TemplateConstants.

What should we use to replace solo in ufs_yaml.py?

CoryMartin-NOAA commented 8 months ago

wxflow needs to replace solo for ufs_yaml. We may be able to just yank the solo-dependent code out of ufsda and be good to go? I'm not sure it is being used, but it is still getting 'imported'.

RussTreadon-NOAA commented 8 months ago

Agreed.

I yanked solo references out of

- ush/ufsda/ufs_yaml.py
- ush/ufsda/yamltools.py

There's one more solo reference, in ush/ufsda/misc_utils.py:

import solo.date

...

    # solo has a nice utility for this
    fcst_steps = solo.date.step_sequence(start, end, fcst_step)

What can we use to replace solo.date?

CoryMartin-NOAA commented 8 months ago

Not sure there is one, we would probably have to write it using things from datetime and wxflow. Is that function used anywhere @guillaumevernieres ?
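As a starting point, a datetime-only stand-in for solo.date.step_sequence might look like the sketch below. It assumes the sequence runs from start to end inclusive in fixed increments given as a timedelta; the exact semantics of the solo routine (inclusive end, accepted step formats) are not confirmed here, and step_sequence is a hypothetical name.

```python
from datetime import datetime, timedelta


def step_sequence(start, end, step):
    """Return datetimes from start to end (inclusive) in increments of step.

    Hypothetical stand-in for solo.date.step_sequence; assumes an inclusive
    end point and a positive timedelta step.
    """
    if step <= timedelta(0):
        raise ValueError("step must be a positive timedelta")
    steps = []
    current = start
    while current <= end:
        steps.append(current)
        current += step
    return steps


if __name__ == "__main__":
    start = datetime(2024, 3, 1, 0)
    end = datetime(2024, 3, 1, 12)
    for t in step_sequence(start, end, timedelta(hours=6)):
        print(t.strftime("%Y%m%d%H"))
```

Taking the step as a timedelta keeps the helper independent of any duration-string format; a caller holding a string such as "PT6H" would need to convert it first.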

RussTreadon-NOAA commented 8 months ago

Yanking solo references out of ush/ufsda/ufs_yaml.py and ush/ufsda/yamltools.py won't work. The removed methods are used elsewhere in these Python scripts. We need to (a) replace the functionality with non-solo counterparts or (b) remove ufs_yaml.py and yamltools.py. Before we can consider (b), we need to ensure nothing in these Python scripts is used elsewhere.

The solo dependency in misc_utils.py is in method calc_fcst_steps. grep did not find any other instances of calc_fcst_steps in scripts, test, ush, or utils apart from its use in misc_utils.py. I don't think we use calc_fcst_steps in g-w or elsewhere, but I didn't check or ask.
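The search described above (looking for calc_fcst_steps under scripts, test, ush, and utils) can also be scripted. This is a minimal sketch, roughly equivalent to a recursive grep listing matching files; find_references is a hypothetical helper, and it treats every file as UTF-8 text, skipping anything it cannot read or decode.

```python
import os


def find_references(symbol, roots):
    """Return sorted paths of files under `roots` whose text contains `symbol`."""
    hits = []
    for root in roots:
        if not os.path.isdir(root):
            continue  # skip directories absent from this checkout
        for dirpath, _dirnames, filenames in os.walk(root):
            for filename in filenames:
                path = os.path.join(dirpath, filename)
                try:
                    with open(path, encoding="utf-8") as handle:
                        if symbol in handle.read():
                            hits.append(path)
                except (UnicodeDecodeError, OSError):
                    continue  # binary or unreadable file
    return sorted(hits)


if __name__ == "__main__":
    for path in find_references("calc_fcst_steps", ["scripts", "test", "ush", "utils"]):
        print(path)
```

Run from the top of a GDASApp clone, an output listing only ush/ufsda/misc_utils.py would match the grep result reported above.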

RussTreadon-NOAA commented 8 months ago

FYI, when I comment out the solo.date line in misc_utils.py, along with my previous yanks in ufs_yaml.py and yamltools.py, all test_gdasapp ctests pass on Rocky8 nodes using branch feature/rocky8.

RussTreadon-NOAA commented 8 months ago

g-w issue #2329 is the umbrella issue for updating various apps to Rocky8 on Hera. Cross-referencing with this GDASApp issue for your information, @DavidHuber-NOAA

RussTreadon-NOAA commented 8 months ago

g-w wxflow contains YAMLFile, TemplateConstants, and Template, so use wxflow to replace solo in ush/ufsda/ufs_yaml.py and ush/ufsda/yamltools.py.

ush/ufsda/misc_utils.py uses solo in method calc_fcst_steps. A grep of GDASApp did not find any occurrences of calc_fcst_steps apart from its definition in misc_utils.py. A check of g-w found calc_fcst_steps referenced only in sorc/gdas.cd/ush/ufsda/misc_utils.py, which is the same GDASApp file. Given this, remove calc_fcst_steps from misc_utils.py.

Changes committed at edd37b6.

RussTreadon-NOAA commented 8 months ago

Execute build.sh for feature/rocky8 at c76cf7f. Run GDASApp ctests. All 30 tests pass.

The changes in feature/rocky8 are ready for a PR.

While a PR can be opened, reviewed, and approved, we need to decide when to merge into develop. Hera will not be fully Rocky 8 until 4/2. If we merge feature/rocky8 into develop before 4/2, GDASApp will build and run on Rocky 8 Hera nodes but will not run as is on CentOS 7 nodes, because the path to spack-stack differs between the two operating systems.
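One way a script could tell the two operating systems apart during the transition is to read /etc/os-release. The sketch below only illustrates that idea: detect_hera_os and parse_os_release are hypothetical helpers, and they return a label rather than a spack-stack path because the actual paths are not given in this thread.

```python
def parse_os_release(text):
    """Parse /etc/os-release style KEY=VALUE lines into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        info[key] = value.strip().strip('"').strip("'")
    return info


def detect_hera_os(text):
    """Return 'rocky8', 'centos7', or 'unknown' from os-release text."""
    info = parse_os_release(text)
    distro = info.get("ID", "").lower()
    major = info.get("VERSION_ID", "").split(".")[0]
    if distro == "rocky" and major == "8":
        return "rocky8"
    if distro == "centos" and major == "7":
        return "centos7"
    return "unknown"


if __name__ == "__main__":
    try:
        with open("/etc/os-release", encoding="utf-8") as handle:
            print(detect_hera_os(handle.read()))
    except OSError:
        print("unknown")
```

The label could then select which spack-stack installation to use; how GDASApp actually handles this lives in its modulefiles, not in this sketch.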

CoryMartin-NOAA commented 8 months ago

@RussTreadon-NOAA what about pulling the non-modulefile changes (removing solo, etc.) out into a PR, and then a draft PR of the modulefile change? I think we also need to do the same on Orion "soon".

RussTreadon-NOAA commented 8 months ago

Sure, two PRs works. The changes to

- ush/ufsda/misc_utils.py
- ush/ufsda/ufs_yaml.py
- ush/ufsda/yamltools.py

might belong to issue #964.

This would leave feature/rocky8 with only the Hera Rocky8 changes in modulefiles/GDAS/hera.intel.lua.

RussTreadon-NOAA commented 8 months ago

Decided to go ahead and open a quick PR to get the script changes into develop. See PR #964. PR remains in draft mode until I can build and confirm all ctests pass with just these changes in place.

RussTreadon-NOAA commented 8 months ago

PR #964 is now ready for review. Once it is merged into develop, only modulefile changes remain to enable GDASApp to run on Rocky8 Hera nodes ... and later Rocky8 Orion nodes.

RussTreadon-NOAA commented 8 months ago

Merge develop at 45c2d16 into feature/rocky8 at 9533705. Rerun ctests. All 31 tests pass.

RussTreadon-NOAA commented 8 months ago

@CoryMartin-NOAA, I think we can open a draft PR to merge feature/rocky8 into develop. Next week (3/19), two-thirds of Hera will be running Rocky8.