NOAA-EMC / GDASApp

Global Data Assimilation System Application
GNU Lesser General Public License v2.1

Fix GW jjob tests for upcoming GW PR #2420 #1041

Closed DavidNew-NOAA closed 3 months ago

DavidNew-NOAA commented 3 months ago

This PR addresses issue #1011: the failure of the "gdasatmanlvar" jjob test caused by a change in the name of the "gdasatmanlvar" run script and, with the upcoming Global Workflow PR #2420, the impending failure of the "gdasatmanlfinal" jjob. It fixes the script name in the "gdasatmanlvar" test and adds a new test for "gdasatmanlfv3inc", which will fix the issue with "gdasatmanlfinal".

The "gdasatmanlfv3inc" and "gdasatmanlfinal" tests won't pass yet in GW develop, but they will after #2420 merges GW feature/jediinc2fv3.

This PR is just a verbatim copy of @RussTreadon-NOAA's work, taken from his comment here. I re-ran the new tests, and they also passed for me in GW feature/jediinc2fv3.

RussTreadon-NOAA commented 3 months ago

While test_gdasapp_atm_jjob_var_inc runs to completion on Orion, it fails on Hera. Below is the Hera traceback:

0: Info     :        Registry status:   0
2: corrupted size vs. prev_size
5: corrupted size vs. prev_size
0:
0: Run: Finishing gdasapp::fv3inc
0: OOPS_STATS

...

0: OOPS_STATS util::Timers::measured                              :     16002.43       1        16002.4301
0: OOPS_STATS ------------------------------------ Timing Statistics -------------------------------------
srun: error: h20c45: task 2: Aborted (core dumped)
srun: Terminating StepId=58208170.0
0: slurmstepd: error: *** STEP 58208170.0 ON h20c45 CANCELLED AT 2024-04-13T10:59:40 ***
srun: error: h20c45: tasks 0,3-4: Terminated
srun: error: h20c45: task 5: Aborted (core dumped)
srun: error: h20c45: task 1: Terminated

The printout

2: corrupted size vs. prev_size
5: corrupted size vs. prev_size

is not in the Orion job log file.

I reran the job on Hera with --mem=0 to request all memory on the Hera node. test_gdasapp_atm_jjob_var_inc still failed in the same way.

test_gdasapp_fv3jedi_fv3inc also failed on Hera despite feature/update_jjob_tests including GDASApp PR #1039. The Hera traceback for test_gdasapp_fv3jedi_fv3inc contains the following

Test     : FV3 Increment:

Test     : ----------------------------------------------------------------------------------------------------
Test     : Increment print | number of fields = 9 | cube sphere face size: C12
Test     : eastward_wind                                | Min:-4.3422836419308410e+00 Max:+1.2320940067737499e+01 RMS:+3.0957235443709130e-01
Test     : northward_wind                               | Min:-4.1090470888107049e+00 Max:+5.4552721209750796e+00 RMS:+3.1062842460180656e-01
Test     : air_temperature                              | Min:-5.2980343087781989e-01 Max:+5.1811022097894011e-01 RMS:+3.5920751813835923e-02
Test     : specific_humidity                            | Min:-2.8092260972819617e-04 Max:+2.9434075393080551e-04 RMS:+1.6532405760382759e-05
Test     : cloud_liquid_ice                             | Min:+0.0000000000000000e+00 Max:+0.0000000000000000e+00 RMS:+0.0000000000000000e+00
Test     : cloud_liquid_water                           | Min:+0.0000000000000000e+00 Max:+0.0000000000000000e+00 RMS:+0.0000000000000000e+00
Test     : ozone_mass_mixing_ratio                      | Min:+0.0000000000000000e+00 Max:+0.0000000000000000e+00 RMS:+0.0000000000000000e+00
Test     : air_pressure_thickness                       | Min:-2.9992886080290191e+00 Max:+1.5291703492039233e+00 RMS:+1.7535872214547940e-01
Test     : hydrostatic_layer_thickness                  | Min:-4.6699236754648155e-01 Max:+7.4693987323735200e-01 RMS:+3.1162055487823255e-02
Test     : ----------------------------------------------------------------------------------------------------
double free or corruption (!prev)
srun: error: h32m52: task 5: Aborted (core dumped)
srun: Terminating StepId=58208119.0
slurmstepd: error: *** STEP 58208119.0 ON h32m52 CANCELLED AT 2024-04-13T10:51:05 ***
srun: error: h32m52: tasks 0,2-4: Terminated
srun: error: h32m52: task 1: Terminated
srun: Force Terminated StepId=58208119.0

Not sure what's going on.
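For reference, "corrupted size vs. prev_size" and "double free or corruption (!prev)" are abort messages from glibc's malloc consistency checks, which usually point to a write outside an allocated buffer somewhere earlier in the run. A minimal sketch of a helper that scans a job log for these messages and reports which task printed them is shown below. The function name, the message list, and the assumption that a leading "N:" is a task number are illustrative and not part of GDASApp.

```python
import re

# Abort messages printed by glibc's malloc consistency checks. Both appear in
# the Hera logs above. The list is illustrative, not exhaustive.
HEAP_MESSAGES = (
    "corrupted size vs. prev_size",
    "double free or corruption",
)

# Optional "<task>: " prefix, as seen on the per-task lines in the log above.
TASK_PREFIX = re.compile(r"^\s*(\d+):\s*(.*)$")


def find_heap_errors(lines):
    """Return (task, message) pairs for lines containing a heap-corruption message.

    task is None when the line has no numeric task prefix.
    """
    hits = []
    for line in lines:
        match = TASK_PREFIX.match(line)
        if match:
            task, text = int(match.group(1)), match.group(2)
        else:
            task, text = None, line.strip()
        if any(message in text for message in HEAP_MESSAGES):
            hits.append((task, text))
    return hits
```

Running this over both the Hera and Orion logs would make it easy to confirm that the messages appear only on Hera and to see which tasks reported them.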

RussTreadon-NOAA commented 3 months ago

Repeat Hera test on Orion. Interestingly, all tests pass on Orion.

Test project /work2/noaa/da/rtreadon/git/global-workflow/jediinc2fv3/sorc/gdas.cd/build
      Start 1393: test_gdasapp_util_coding_norms
 1/54 Test #1393: test_gdasapp_util_coding_norms ........................   Passed    4.74 sec
      Start 1394: test_gdasapp_util_ioda_example
 2/54 Test #1394: test_gdasapp_util_ioda_example ........................   Passed    8.84 sec
      Start 1395: test_gdasapp_util_prepdata
 3/54 Test #1395: test_gdasapp_util_prepdata ............................   Passed    5.37 sec
      Start 1396: test_gdasapp_util_rads2ioda
 4/54 Test #1396: test_gdasapp_util_rads2ioda ...........................   Passed    0.53 sec
      Start 1397: test_gdasapp_util_ghrsst2ioda
 5/54 Test #1397: test_gdasapp_util_ghrsst2ioda .........................   Passed    0.19 sec
      Start 1398: test_gdasapp_util_smap2ioda
 6/54 Test #1398: test_gdasapp_util_smap2ioda ...........................   Passed    0.19 sec
      Start 1399: test_gdasapp_util_smos2ioda
 7/54 Test #1399: test_gdasapp_util_smos2ioda ...........................   Passed    0.20 sec
      Start 1400: test_gdasapp_util_viirsaod2ioda
 8/54 Test #1400: test_gdasapp_util_viirsaod2ioda .......................   Passed    0.18 sec
      Start 1401: test_gdasapp_util_icecamsr2ioda
 9/54 Test #1401: test_gdasapp_util_icecamsr2ioda .......................   Passed    0.17 sec
      Start 1739: test_gdasapp_check_python_norms
10/54 Test #1739: test_gdasapp_check_python_norms .......................   Passed    3.10 sec
      Start 1740: test_gdasapp_check_yaml_keys
11/54 Test #1740: test_gdasapp_check_yaml_keys ..........................   Passed    2.39 sec
      Start 1741: test_gdasapp_jedi_increment_to_fv3
12/54 Test #1741: test_gdasapp_jedi_increment_to_fv3 ....................   Passed   17.00 sec
      Start 1742: test_gdasapp_setup_cycled_exp
13/54 Test #1742: test_gdasapp_setup_cycled_exp .........................   Passed    3.69 sec
      Start 1743: test_gdasapp_fv3jedi_fv3inc
14/54 Test #1743: test_gdasapp_fv3jedi_fv3inc ...........................   Passed   36.63 sec
      Start 1744: test_gdasapp_convert_bufr_temp_dbuoy
15/54 Test #1744: test_gdasapp_convert_bufr_temp_dbuoy ..................   Passed    2.38 sec
      Start 1745: test_gdasapp_convert_bufr_salt_dbuoy
16/54 Test #1745: test_gdasapp_convert_bufr_salt_dbuoy ..................   Passed    0.33 sec
      Start 1746: test_gdasapp_convert_bufr_temp_mbuoyb
17/54 Test #1746: test_gdasapp_convert_bufr_temp_mbuoyb .................   Passed    0.30 sec
      Start 1747: test_gdasapp_convert_bufr_salt_mbuoyb
18/54 Test #1747: test_gdasapp_convert_bufr_salt_mbuoyb .................   Passed    0.29 sec
      Start 1748: test_gdasapp_convert_bufr_tesacprof
19/54 Test #1748: test_gdasapp_convert_bufr_tesacprof ...................   Passed    0.27 sec
      Start 1749: test_gdasapp_convert_bufr_trkobprof
20/54 Test #1749: test_gdasapp_convert_bufr_trkobprof ...................   Passed    0.28 sec
      Start 1750: test_gdasapp_convert_bufr_sfcships
21/54 Test #1750: test_gdasapp_convert_bufr_sfcships ....................   Passed    0.28 sec
      Start 1751: test_gdasapp_convert_bufr_sfcshipsu
22/54 Test #1751: test_gdasapp_convert_bufr_sfcshipsu ...................   Passed    0.30 sec
      Start 1752: test_gdasapp_soca_nsst_increment_to_mom6
23/54 Test #1752: test_gdasapp_soca_nsst_increment_to_mom6 ..............   Passed   47.27 sec
      Start 1753: test_gdasapp_soca_prep
24/54 Test #1753: test_gdasapp_soca_prep ................................   Passed    6.92 sec
      Start 1754: test_gdasapp_soca_run_clean
25/54 Test #1754: test_gdasapp_soca_run_clean ...........................   Passed    0.14 sec
      Start 1755: test_gdasapp_soca_setup_obsprep
26/54 Test #1755: test_gdasapp_soca_setup_obsprep .......................   Passed   21.09 sec
      Start 1756: test_gdasapp_soca_JGLOBAL_PREP_OCEAN_OBS
27/54 Test #1756: test_gdasapp_soca_JGLOBAL_PREP_OCEAN_OBS ..............   Passed   44.66 sec
      Start 1757: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_PREP
28/54 Test #1757: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_PREP ....   Passed   74.31 sec
      Start 1758: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_BMAT
29/54 Test #1758: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_BMAT ....   Passed   74.27 sec
      Start 1759: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_RUN
30/54 Test #1759: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_RUN .....   Passed   42.24 sec
      Start 1760: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_ECEN
31/54 Test #1760: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_ECEN ....   Passed   42.24 sec
      Start 1761: test_gdasapp_soca_copy_scratch
32/54 Test #1761: test_gdasapp_soca_copy_scratch ........................   Passed    3.30 sec
      Start 1762: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_CHKPT
33/54 Test #1762: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_CHKPT ...   Passed   42.33 sec
      Start 1763: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_POST
34/54 Test #1763: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_POST ....   Passed   10.49 sec
      Start 1764: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_VRFY
35/54 Test #1764: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_VRFY ....   Passed  140.24 sec
      Start 1765: test_gdasapp_soca_socahybridweights
36/54 Test #1765: test_gdasapp_soca_socahybridweights ...................   Passed   10.40 sec
      Start 1766: test_gdasapp_soca_incr_handler
37/54 Test #1766: test_gdasapp_soca_incr_handler ........................   Passed   10.37 sec
      Start 1767: test_gdasapp_soca_ens_handler
38/54 Test #1767: test_gdasapp_soca_ens_handler .........................   Passed   10.39 sec
      Start 1768: test_gdasapp_snow_create_ens
39/54 Test #1768: test_gdasapp_snow_create_ens ..........................   Passed    6.62 sec
      Start 1769: test_gdasapp_snow_imsproc
40/54 Test #1769: test_gdasapp_snow_imsproc .............................   Passed    5.02 sec
      Start 1770: test_gdasapp_snow_apply_jediincr
41/54 Test #1770: test_gdasapp_snow_apply_jediincr ......................   Passed    7.35 sec
      Start 1771: test_gdasapp_snow_letkfoi_snowda
42/54 Test #1771: test_gdasapp_snow_letkfoi_snowda ......................   Passed   37.91 sec
      Start 1772: test_gdasapp_convert_bufr_adpsfc_snow
43/54 Test #1772: test_gdasapp_convert_bufr_adpsfc_snow .................   Passed    3.70 sec
      Start 1773: test_gdasapp_convert_bufr_adpsfc
44/54 Test #1773: test_gdasapp_convert_bufr_adpsfc ......................   Passed    4.78 sec
      Start 1774: test_gdasapp_convert_gsi_satbias
45/54 Test #1774: test_gdasapp_convert_gsi_satbias ......................   Passed    2.13 sec
      Start 1775: test_gdasapp_setup_atm_cycled_exp
46/54 Test #1775: test_gdasapp_setup_atm_cycled_exp .....................   Passed    1.38 sec
      Start 1776: test_gdasapp_atm_jjob_var_init
47/54 Test #1776: test_gdasapp_atm_jjob_var_init ........................   Passed   45.87 sec
      Start 1777: test_gdasapp_atm_jjob_var_run
48/54 Test #1777: test_gdasapp_atm_jjob_var_run .........................   Passed  107.90 sec
      Start 1778: test_gdasapp_atm_jjob_var_inc
49/54 Test #1778: test_gdasapp_atm_jjob_var_inc .........................   Passed   74.80 sec
      Start 1779: test_gdasapp_atm_jjob_var_final
50/54 Test #1779: test_gdasapp_atm_jjob_var_final .......................   Passed   42.21 sec
      Start 1780: test_gdasapp_atm_jjob_ens_init
51/54 Test #1780: test_gdasapp_atm_jjob_ens_init ........................   Passed   45.75 sec
      Start 1781: test_gdasapp_atm_jjob_ens_run
52/54 Test #1781: test_gdasapp_atm_jjob_ens_run .........................   Passed  298.43 sec
      Start 1782: test_gdasapp_atm_jjob_ens_final
53/54 Test #1782: test_gdasapp_atm_jjob_ens_final .......................   Passed   42.22 sec
      Start 1783: test_gdasapp_aero_gen_3dvar_yaml
54/54 Test #1783: test_gdasapp_aero_gen_3dvar_yaml ......................   Passed    0.40 sec

100% tests passed, 0 tests failed out of 54

Label Time Summary:
gdas-utils    =  20.41 sec*proc (9 tests)
script        =  20.41 sec*proc (9 tests)

Total Test time (real) = 1396.98 sec

What's causing the Hera failure? Orion still runs CentOS 7. Hera runs Rocky 8. Both the Hera and Orion GDASApp builds use spack-stack/1.6.0.

DavidNew-NOAA commented 3 months ago

@RussTreadon-NOAA Yeah, this is really perplexing. I'm going to take a closer look on Monday.

RussTreadon-NOAA commented 3 months ago

Recompile Orion installation on Hercules. test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_ECEN and test_gdasapp_atm_jjob_var_inc fail due to job control variables not being set in g-w env/HERCULES.env. Add ocnanalecen and atmanlfv3inc to working copy of HERCULES.env. After this change all 54 test_gdasapp pass on Hercules.

Test project /work/noaa/da/rtreadon/git/global-workflow/jediinc2fv3/sorc/gdas.cd/build
      Start 1393: test_gdasapp_util_coding_norms
 1/54 Test #1393: test_gdasapp_util_coding_norms ........................   Passed    0.97 sec
      Start 1394: test_gdasapp_util_ioda_example
 2/54 Test #1394: test_gdasapp_util_ioda_example ........................   Passed    0.96 sec
      Start 1395: test_gdasapp_util_prepdata
 3/54 Test #1395: test_gdasapp_util_prepdata ............................   Passed    0.43 sec
      Start 1396: test_gdasapp_util_rads2ioda
 4/54 Test #1396: test_gdasapp_util_rads2ioda ...........................   Passed    0.15 sec
      Start 1397: test_gdasapp_util_ghrsst2ioda
 5/54 Test #1397: test_gdasapp_util_ghrsst2ioda .........................   Passed    0.10 sec
      Start 1398: test_gdasapp_util_smap2ioda
 6/54 Test #1398: test_gdasapp_util_smap2ioda ...........................   Passed    0.09 sec
      Start 1399: test_gdasapp_util_smos2ioda
 7/54 Test #1399: test_gdasapp_util_smos2ioda ...........................   Passed    0.10 sec
      Start 1400: test_gdasapp_util_viirsaod2ioda
 8/54 Test #1400: test_gdasapp_util_viirsaod2ioda .......................   Passed    0.17 sec
      Start 1401: test_gdasapp_util_icecamsr2ioda
 9/54 Test #1401: test_gdasapp_util_icecamsr2ioda .......................   Passed    0.11 sec
      Start 1739: test_gdasapp_check_python_norms
10/54 Test #1739: test_gdasapp_check_python_norms .......................   Passed    1.82 sec
      Start 1740: test_gdasapp_check_yaml_keys
11/54 Test #1740: test_gdasapp_check_yaml_keys ..........................   Passed    0.05 sec
      Start 1741: test_gdasapp_jedi_increment_to_fv3
12/54 Test #1741: test_gdasapp_jedi_increment_to_fv3 ....................   Passed    0.31 sec
      Start 1742: test_gdasapp_setup_cycled_exp
13/54 Test #1742: test_gdasapp_setup_cycled_exp .........................   Passed    0.70 sec
      Start 1743: test_gdasapp_fv3jedi_fv3inc
14/54 Test #1743: test_gdasapp_fv3jedi_fv3inc ...........................   Passed    7.08 sec
      Start 1744: test_gdasapp_convert_bufr_temp_dbuoy
15/54 Test #1744: test_gdasapp_convert_bufr_temp_dbuoy ..................   Passed    0.17 sec
      Start 1745: test_gdasapp_convert_bufr_salt_dbuoy
16/54 Test #1745: test_gdasapp_convert_bufr_salt_dbuoy ..................   Passed    0.16 sec
      Start 1746: test_gdasapp_convert_bufr_temp_mbuoyb
17/54 Test #1746: test_gdasapp_convert_bufr_temp_mbuoyb .................   Passed    0.16 sec
      Start 1747: test_gdasapp_convert_bufr_salt_mbuoyb
18/54 Test #1747: test_gdasapp_convert_bufr_salt_mbuoyb .................   Passed    0.16 sec
      Start 1748: test_gdasapp_convert_bufr_tesacprof
19/54 Test #1748: test_gdasapp_convert_bufr_tesacprof ...................   Passed    0.19 sec
      Start 1749: test_gdasapp_convert_bufr_trkobprof
20/54 Test #1749: test_gdasapp_convert_bufr_trkobprof ...................   Passed    0.16 sec
      Start 1750: test_gdasapp_convert_bufr_sfcships
21/54 Test #1750: test_gdasapp_convert_bufr_sfcships ....................   Passed    0.16 sec
      Start 1751: test_gdasapp_convert_bufr_sfcshipsu
22/54 Test #1751: test_gdasapp_convert_bufr_sfcshipsu ...................   Passed    0.18 sec
      Start 1752: test_gdasapp_soca_nsst_increment_to_mom6
23/54 Test #1752: test_gdasapp_soca_nsst_increment_to_mom6 ..............   Passed    1.07 sec
      Start 1753: test_gdasapp_soca_prep
24/54 Test #1753: test_gdasapp_soca_prep ................................   Passed    1.29 sec
      Start 1754: test_gdasapp_soca_run_clean
25/54 Test #1754: test_gdasapp_soca_run_clean ...........................   Passed    0.21 sec
      Start 1755: test_gdasapp_soca_setup_obsprep
26/54 Test #1755: test_gdasapp_soca_setup_obsprep .......................   Passed    7.23 sec
      Start 1756: test_gdasapp_soca_JGLOBAL_PREP_OCEAN_OBS
27/54 Test #1756: test_gdasapp_soca_JGLOBAL_PREP_OCEAN_OBS ..............   Passed   42.57 sec
      Start 1757: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_PREP
28/54 Test #1757: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_PREP ....   Passed   42.14 sec
      Start 1758: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_BMAT
29/54 Test #1758: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_BMAT ....   Passed   42.13 sec
      Start 1759: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_RUN
30/54 Test #1759: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_RUN .....   Passed   42.14 sec
      Start 1760: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_ECEN
31/54 Test #1760: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_ECEN ....   Passed  106.14 sec
      Start 1761: test_gdasapp_soca_copy_scratch
32/54 Test #1761: test_gdasapp_soca_copy_scratch ........................   Passed    0.32 sec
      Start 1762: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_CHKPT
33/54 Test #1762: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_CHKPT ...   Passed   42.13 sec
      Start 1763: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_POST
34/54 Test #1763: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_POST ....   Passed   42.14 sec
      Start 1764: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_VRFY
35/54 Test #1764: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_VRFY ....   Passed  170.16 sec
      Start 1765: test_gdasapp_soca_socahybridweights
36/54 Test #1765: test_gdasapp_soca_socahybridweights ...................   Passed   10.12 sec
      Start 1766: test_gdasapp_soca_incr_handler
37/54 Test #1766: test_gdasapp_soca_incr_handler ........................   Passed   10.10 sec
      Start 1767: test_gdasapp_soca_ens_handler
38/54 Test #1767: test_gdasapp_soca_ens_handler .........................   Passed   10.11 sec
      Start 1768: test_gdasapp_snow_create_ens
39/54 Test #1768: test_gdasapp_snow_create_ens ..........................   Passed    0.45 sec
      Start 1769: test_gdasapp_snow_imsproc
40/54 Test #1769: test_gdasapp_snow_imsproc .............................   Passed    1.75 sec
      Start 1770: test_gdasapp_snow_apply_jediincr
41/54 Test #1770: test_gdasapp_snow_apply_jediincr ......................   Passed    3.56 sec
      Start 1771: test_gdasapp_snow_letkfoi_snowda
42/54 Test #1771: test_gdasapp_snow_letkfoi_snowda ......................   Passed    8.31 sec
      Start 1772: test_gdasapp_convert_bufr_adpsfc_snow
43/54 Test #1772: test_gdasapp_convert_bufr_adpsfc_snow .................   Passed    2.25 sec
      Start 1773: test_gdasapp_convert_bufr_adpsfc
44/54 Test #1773: test_gdasapp_convert_bufr_adpsfc ......................   Passed    2.96 sec
      Start 1774: test_gdasapp_convert_gsi_satbias
45/54 Test #1774: test_gdasapp_convert_gsi_satbias ......................   Passed    1.08 sec
      Start 1775: test_gdasapp_setup_atm_cycled_exp
46/54 Test #1775: test_gdasapp_setup_atm_cycled_exp .....................   Passed    0.60 sec
      Start 1776: test_gdasapp_atm_jjob_var_init
47/54 Test #1776: test_gdasapp_atm_jjob_var_init ........................   Passed   44.01 sec
      Start 1777: test_gdasapp_atm_jjob_var_run
48/54 Test #1777: test_gdasapp_atm_jjob_var_run .........................   Passed  106.12 sec
      Start 1778: test_gdasapp_atm_jjob_var_inc
49/54 Test #1778: test_gdasapp_atm_jjob_var_inc .........................   Passed   42.12 sec
      Start 1779: test_gdasapp_atm_jjob_var_final
50/54 Test #1779: test_gdasapp_atm_jjob_var_final .......................   Passed   42.12 sec
      Start 1780: test_gdasapp_atm_jjob_ens_init
51/54 Test #1780: test_gdasapp_atm_jjob_ens_init ........................   Passed   43.90 sec
      Start 1781: test_gdasapp_atm_jjob_ens_run
52/54 Test #1781: test_gdasapp_atm_jjob_ens_run .........................   Passed  266.14 sec
      Start 1782: test_gdasapp_atm_jjob_ens_final
53/54 Test #1782: test_gdasapp_atm_jjob_ens_final .......................   Passed   74.13 sec
      Start 1783: test_gdasapp_aero_gen_3dvar_yaml
54/54 Test #1783: test_gdasapp_aero_gen_3dvar_yaml ......................   Passed    0.32 sec

100% tests passed, 0 tests failed out of 54

Label Time Summary:
gdas-utils    =   3.09 sec*proc (9 tests)
script        =   3.09 sec*proc (9 tests)

Total Test time (real) = 1224.53 sec
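To compare runs like the Orion and Hercules listings above test by test, the ctest result lines can be parsed into a dictionary. The sketch below is one way to do that. The regular expression is written against the line layout shown in these listings, and the function names are hypothetical.

```python
import re

# Matches result lines such as
# "49/54 Test #1778: test_gdasapp_atm_jjob_var_inc ......   Passed   74.80 sec"
# and ignores the "Start NNNN: ..." lines.
CTEST_LINE = re.compile(
    r"^\s*\d+/\d+\s+Test\s+#(?P<number>\d+):\s+(?P<name>\S+)\s+\.+\s*"
    r"(?P<status>.*?)\s+(?P<seconds>\d+(?:\.\d+)?)\s+sec\s*$"
)


def parse_ctest_line(line):
    """Return (name, status, seconds) for a ctest result line, else None."""
    match = CTEST_LINE.match(line)
    if match is None:
        return None
    return match["name"], match["status"], float(match["seconds"])


def parse_ctest_log(lines):
    """Map each test name to its (status, seconds) pair."""
    results = {}
    for line in lines:
        parsed = parse_ctest_line(line)
        if parsed is not None:
            name, status, seconds = parsed
            results[name] = (status, seconds)
    return results
```

With one dictionary per machine, differences in status or large differences in run time can be listed with a simple loop over the shared test names.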

Hercules, like Hera, runs Rocky, specifically Rocky Linux 9.1 (Blue Onyx). Recall that modulefiles/GDAS/hera.intel.lua differs from the Orion and Hercules modulefiles in that it does not load all the same modules. Also, the Hera modulefile does not load a Python virtual environment. Does this provide any clues as to why we experience ctest failures on Hera?

DavidNew-NOAA commented 3 months ago

@RussTreadon-NOAA OK, I figured it out. In the increment converter, I was indexing the height dimension of an array past the size of that dimension. I naively copied some Vader code that uses Atlas fieldsets, but that code was for a variable on half-levels, which has to be indexed up to nLevels + 1, whereas ordinary grid-centered variables should be indexed up to nLevels. My latest commit fixes that, and now test_gdasapp_fv3jedi_fv3inc passes on Hera.
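The level-count mismatch described above can be illustrated with a small sketch. This is not the code in utils/fv3jedi/fv3jedi_fv3inc.h, which is C++. The function names here are hypothetical, and plain Python lists stand in for the Atlas fields. In C++, reading or writing one element past the end of a column is undefined behavior and may only show up later as a heap-corruption abort, which would be consistent with the failure appearing on Hera but not on Orion. The sketch makes the expected count explicit and checks it up front.

```python
def level_count(n_levels, on_half_levels):
    """Number of vertical entries: n_levels + 1 on half-levels, else n_levels."""
    return n_levels + 1 if on_half_levels else n_levels


def copy_column(column, n_levels, on_half_levels):
    """Copy one vertical column after checking its size against the expected count.

    Raises ValueError instead of silently indexing past the end of the column.
    """
    expected = level_count(n_levels, on_half_levels)
    if len(column) != expected:
        raise ValueError(f"expected {expected} levels, got {len(column)}")
    return [column[k] for k in range(expected)]
```

Treating a grid-centered field with n_levels entries as if it were on half-levels fails the size check here, whereas the equivalent unchecked loop would touch index n_levels, one past the last valid entry.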

RussTreadon-NOAA commented 3 months ago

Thank you @DavidNew-NOAA for troubleshooting over the weekend. This is above and beyond effort.

I recompiled GDASApp with the updated utils/fv3jedi/fv3jedi_fv3inc.h. All 54 tests still pass on Hercules and Orion. 53 out of 54 tests pass on Hera. The only Hera failure is

98% tests passed, 1 tests failed out of 54

Label Time Summary:
gdas-utils    =   4.80 sec*proc (9 tests)
script        =   4.80 sec*proc (9 tests)

Total Test time (real) = 1017.26 sec

The following tests FAILED:
        1763 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_VRFY (Failed)

A check of JGDAS_GLOBAL_OCEAN_ANALYSIS_VRFY.out shows the failure to be due to

+ slurm_script[52]: set +u
+ slurm_script[53]: conda activate eva
/var/spool/slurmd/job58232992/slurm_script: line 53: conda: command not found
+ slurm_script[1]: postamble slurm_script 1713126685 127

This failure is not related to this PR.
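The trailing 127 on the postamble line appears to be the script's exit status, and 127 is the status a POSIX shell returns when a command is not found, which matches the "conda: command not found" message. A minimal sketch of a pre-flight check for required commands follows. The function name is hypothetical. Note that shutil.which only finds executables on PATH, so it would not detect a conda that is provided solely as a shell function.

```python
import shutil


def missing_commands(names):
    """Return the commands from names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]
```

For example, calling missing_commands(["conda"]) in the job environment returns ["conda"] when no conda executable is on PATH and an empty list otherwise.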

RussTreadon-NOAA commented 3 months ago

FYI @guillaumevernieres and @AndrewEichmann-NOAA:

test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_VRFY fails on Hera. JGDAS_GLOBAL_OCEAN_ANALYSIS_VRFY.out contains

+ slurm_script[52]: set +u
+ slurm_script[53]: conda activate eva
/var/spool/slurmd/job58232992/slurm_script: line 53: conda: command not found
+ slurm_script[1]: postamble slurm_script 1713126685 127