RussTreadon-NOAA closed this 4 months ago.
Ran ctests on Cactus with the following results:
Test project /lfs/h2/emc/da/noscrub/emc.da/git/global-workflow/wcoss2/sorc/gdas.cd/build
Start 1337: test_gdasapp_util_coding_norms
1/47 Test #1337: test_gdasapp_util_coding_norms ........................ Passed 3.49 sec
Start 1338: test_gdasapp_util_ioda_example
2/47 Test #1338: test_gdasapp_util_ioda_example ........................ Passed 0.25 sec
Start 1339: test_gdasapp_util_prepdata
3/47 Test #1339: test_gdasapp_util_prepdata ............................ Passed 0.81 sec
Start 1340: test_gdasapp_util_rads2ioda
4/47 Test #1340: test_gdasapp_util_rads2ioda ........................... Passed 0.14 sec
Start 1341: test_gdasapp_util_ghrsst2ioda
5/47 Test #1341: test_gdasapp_util_ghrsst2ioda ......................... Passed 0.13 sec
Start 1342: test_gdasapp_util_smap2ioda
6/47 Test #1342: test_gdasapp_util_smap2ioda ........................... Passed 0.12 sec
Start 1343: test_gdasapp_util_smos2ioda
7/47 Test #1343: test_gdasapp_util_smos2ioda ........................... Passed 0.15 sec
Start 1344: test_gdasapp_util_viirsaod2ioda
8/47 Test #1344: test_gdasapp_util_viirsaod2ioda ....................... Passed 0.13 sec
Start 1345: test_gdasapp_util_icecamsr2ioda
9/47 Test #1345: test_gdasapp_util_icecamsr2ioda ....................... Passed 0.12 sec
Start 1682: test_gdasapp_check_python_norms
10/47 Test #1682: test_gdasapp_check_python_norms ....................... Passed 5.83 sec
Start 1683: test_gdasapp_check_yaml_keys
11/47 Test #1683: test_gdasapp_check_yaml_keys .......................... Passed 0.24 sec
Start 1684: test_gdasapp_jedi_increment_to_fv3
12/47 Test #1684: test_gdasapp_jedi_increment_to_fv3 .................... Passed 0.68 sec
Start 1685: test_gdasapp_setup_cycled_exp
13/47 Test #1685: test_gdasapp_setup_cycled_exp ......................... Passed 1.88 sec
Start 1686: test_gdasapp_fv3jedi_fv3inc
Could not find executable srun
Looked in the following places:
srun
srun
Release/srun
Release/srun
Debug/srun
Debug/srun
MinSizeRel/srun
MinSizeRel/srun
RelWithDebInfo/srun
RelWithDebInfo/srun
Deployment/srun
Deployment/srun
Development/srun
Development/srun
Unable to find executable: srun
14/47 Test #1686: test_gdasapp_fv3jedi_fv3inc ...........................***Not Run 0.00 sec
Start 1687: test_gdasapp_soca_nsst_increment_to_mom6
15/47 Test #1687: test_gdasapp_soca_nsst_increment_to_mom6 ..............***Failed 1.64 sec
Start 1688: test_gdasapp_soca_prep
16/47 Test #1688: test_gdasapp_soca_prep ................................ Passed 3.11 sec
Start 1689: test_gdasapp_soca_run_clean
17/47 Test #1689: test_gdasapp_soca_run_clean ........................... Passed 0.02 sec
Start 1690: test_gdasapp_soca_setup_obsprep
18/47 Test #1690: test_gdasapp_soca_setup_obsprep ....................... Passed 13.11 sec
Start 1691: test_gdasapp_soca_JGLOBAL_PREP_OCEAN_OBS
19/47 Test #1691: test_gdasapp_soca_JGLOBAL_PREP_OCEAN_OBS ..............***Failed 1.58 sec
Start 1692: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_PREP
20/47 Test #1692: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_PREP ....***Failed 0.20 sec
Start 1693: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_BMAT
21/47 Test #1693: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_BMAT ....***Failed 0.21 sec
Start 1694: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_RUN
22/47 Test #1694: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_RUN .....***Failed 0.23 sec
Start 1695: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_ECEN
23/47 Test #1695: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_ECEN ....***Failed 0.21 sec
Start 1696: test_gdasapp_soca_copy_scratch
24/47 Test #1696: test_gdasapp_soca_copy_scratch ........................***Failed 0.03 sec
Start 1697: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_CHKPT
25/47 Test #1697: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_CHKPT ...***Failed 0.20 sec
Start 1698: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_POST
26/47 Test #1698: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_POST ....***Failed 0.20 sec
Start 1699: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_VRFY
27/47 Test #1699: test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_VRFY ....***Failed 0.24 sec
Start 1700: test_gdasapp_soca_socahybridweights
28/47 Test #1700: test_gdasapp_soca_socahybridweights ...................***Failed 0.17 sec
Start 1701: test_gdasapp_soca_incr_handler
29/47 Test #1701: test_gdasapp_soca_incr_handler ........................***Failed 0.17 sec
Start 1702: test_gdasapp_soca_ens_handler
30/47 Test #1702: test_gdasapp_soca_ens_handler .........................***Failed 0.17 sec
Start 1703: test_gdasapp_snow_create_ens
31/47 Test #1703: test_gdasapp_snow_create_ens .......................... Passed 0.83 sec
Start 1704: test_gdasapp_snow_imsproc
32/47 Test #1704: test_gdasapp_snow_imsproc ............................. Passed 3.05 sec
Start 1705: test_gdasapp_snow_apply_jediincr
33/47 Test #1705: test_gdasapp_snow_apply_jediincr ......................***Failed 0.32 sec
Start 1706: test_gdasapp_snow_letkfoi_snowda
34/47 Test #1706: test_gdasapp_snow_letkfoi_snowda ......................***Failed 0.58 sec
Start 1707: test_gdasapp_convert_bufr_adpsfc_snow
35/47 Test #1707: test_gdasapp_convert_bufr_adpsfc_snow ................. Passed 2.50 sec
Start 1711: test_gdasapp_convert_bufr_adpsfc
36/47 Test #1711: test_gdasapp_convert_bufr_adpsfc ...................... Passed 4.04 sec
Start 1712: test_gdasapp_convert_gsi_satbias
37/47 Test #1712: test_gdasapp_convert_gsi_satbias ...................... Passed 2.71 sec
Start 1713: test_gdasapp_setup_atm_cycled_exp
38/47 Test #1713: test_gdasapp_setup_atm_cycled_exp ..................... Passed 2.48 sec
Start 1714: test_gdasapp_atm_jjob_var_init
39/47 Test #1714: test_gdasapp_atm_jjob_var_init ........................ Passed 31.94 sec
Start 1715: test_gdasapp_atm_jjob_var_run
40/47 Test #1715: test_gdasapp_atm_jjob_var_run .........................***Failed 6.43 sec
Start 1716: test_gdasapp_atm_jjob_var_inc
41/47 Test #1716: test_gdasapp_atm_jjob_var_inc .........................***Failed 9.48 sec
Start 1717: test_gdasapp_atm_jjob_var_final
42/47 Test #1717: test_gdasapp_atm_jjob_var_final .......................***Failed 6.11 sec
Start 1718: test_gdasapp_atm_jjob_ens_init
43/47 Test #1718: test_gdasapp_atm_jjob_ens_init ........................ Passed 27.69 sec
Start 1719: test_gdasapp_atm_jjob_ens_run
44/47 Test #1719: test_gdasapp_atm_jjob_ens_run .........................***Failed 0.06 sec
Start 1720: test_gdasapp_atm_jjob_ens_inc
45/47 Test #1720: test_gdasapp_atm_jjob_ens_inc .........................***Failed 0.06 sec
Start 1721: test_gdasapp_atm_jjob_ens_final
46/47 Test #1721: test_gdasapp_atm_jjob_ens_final .......................***Failed 8.86 sec
Start 1722: test_gdasapp_aero_gen_3dvar_yaml
47/47 Test #1722: test_gdasapp_aero_gen_3dvar_yaml ......................***Failed 0.15 sec
51% tests passed, 23 tests failed out of 47
Label Time Summary:
gdas-utils = 5.35 sec*proc (9 tests)
script = 5.35 sec*proc (9 tests)
Total Test time (real) = 148.51 sec
The following tests FAILED:
1686 - test_gdasapp_fv3jedi_fv3inc (Not Run)
1687 - test_gdasapp_soca_nsst_increment_to_mom6 (Failed)
1691 - test_gdasapp_soca_JGLOBAL_PREP_OCEAN_OBS (Failed)
1692 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_PREP (Failed)
1693 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_BMAT (Failed)
1694 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_RUN (Failed)
1695 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_ECEN (Failed)
1696 - test_gdasapp_soca_copy_scratch (Failed)
1697 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_CHKPT (Failed)
1698 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_POST (Failed)
1699 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_VRFY (Failed)
1700 - test_gdasapp_soca_socahybridweights (Failed)
1701 - test_gdasapp_soca_incr_handler (Failed)
1702 - test_gdasapp_soca_ens_handler (Failed)
1705 - test_gdasapp_snow_apply_jediincr (Failed)
1706 - test_gdasapp_snow_letkfoi_snowda (Failed)
1715 - test_gdasapp_atm_jjob_var_run (Failed)
1716 - test_gdasapp_atm_jjob_var_inc (Failed)
1717 - test_gdasapp_atm_jjob_var_final (Failed)
1719 - test_gdasapp_atm_jjob_ens_run (Failed)
1720 - test_gdasapp_atm_jjob_ens_inc (Failed)
1721 - test_gdasapp_atm_jjob_ens_final (Failed)
1722 - test_gdasapp_aero_gen_3dvar_yaml (Failed)
Errors while running CTest
test_gdasapp_fv3jedi_fv3inc
As indicated by the ctest output, this test fails because srun is hardwired in test/fv3jedi/CMakeLists.txt:
add_test(NAME test_gdasapp_fv3jedi_fv3inc
COMMAND srun -n6 ${CMAKE_BINARY_DIR}/bin/fv3jedi_fv3inc.x ${PROJECT_BINARY_DIR}/test/fv3jedi/testinput/gdasapp_fv3jedi_fv3inc.yaml
WORKING_DIRECTORY ${PROJECT_BINARY_DIR}/test/fv3jedi)
WCOSS2 uses PBS, not Slurm, so srun is not available.
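A minimal sketch of choosing the MPI launcher per machine instead of hardwiring srun. The helper `mpi_launch_command` and its machine-to-launcher table are hypothetical (not part of GDASApp); the table assumes srun on the Slurm machines and mpiexec on WCOSS2. The real fix would live in CMake, so this only illustrates the selection logic.

```python
from typing import Dict, List, Tuple

# Hypothetical table: launcher executable and its task-count flag per machine.
# Assumption: Slurm machines use srun; WCOSS2 (PBS) uses mpiexec.
LAUNCHERS: Dict[str, Tuple[str, str]] = {
    "HERA": ("srun", "-n"),
    "ORION": ("srun", "-n"),
    "HERCULES": ("srun", "-n"),
    "WCOSS2": ("mpiexec", "-n"),
}


def mpi_launch_command(machine: str, ntasks: int, executable: str, *args: str) -> List[str]:
    """Return the argv that runs `executable` on `ntasks` MPI tasks on `machine`."""
    key = machine.upper()
    if key not in LAUNCHERS:
        raise ValueError(f"no MPI launcher configured for machine {machine!r}")
    launcher, ntasks_flag = LAUNCHERS[key]
    return [launcher, ntasks_flag, str(ntasks), executable, *args]


if __name__ == "__main__":
    # Prints the WCOSS2 variant of the fv3jedi_fv3inc test command.
    print(mpi_launch_command("WCOSS2", 6, "fv3jedi_fv3inc.x", "gdasapp_fv3jedi_fv3inc.yaml"))
```

In test/fv3jedi/CMakeLists.txt the same idea could likely be expressed with the MPIEXEC_EXECUTABLE and MPIEXEC_NUMPROC_FLAG variables set by CMake's FindMPI module instead of a literal srun, assuming the project already calls find_package(MPI).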
test_gdasapp_soca_nsst_increment_to_mom6
Rerunning this test with -VV shows that it fails because wxflow cannot be imported:
1687: Traceback (most recent call last):
1687: File "/lfs/h2/emc/da/noscrub/emc.da/git/global-workflow/wcoss2/sorc/gdas.cd/bundle/gdas/ush/socaincr2mom6.py", line 8, in <module>
1687: import ufsda
1687: File "/lfs/h2/emc/da/noscrub/emc.da/git/global-workflow/wcoss2/sorc/gdas.cd/ush/ufsda/__init__.py", line 2, in <module>
1687: from .ufs_yaml import gen_yaml, parse_config
1687: File "/lfs/h2/emc/da/noscrub/emc.da/git/global-workflow/wcoss2/sorc/gdas.cd/ush/ufsda/ufs_yaml.py", line 3, in <module>
1687: from wxflow import YAMLFile, TemplateConstants, Template
1687: ModuleNotFoundError: No module named 'wxflow'
Notice that hera.intel.lua includes
-- hack for wxflow
prepend_path("PYTHONPATH", "/scratch1/NCEPDEV/da/python/gdasapp/wxflow/20240307/src")
On Orion, after loading GDAS/orion.intel.lua, pip list includes
wxflow 0.1.0
The current wcoss2.intel.lua contains the hera.intel.lua wxflow hack. This won't work on WCOSS2, since the path points to Hera's /scratch1 filesystem. Do we need to install wxflow on Cactus, or is it already available? If it is available, where is it? What do you think, @CoryMartin-NOAA?
I think it gets cloned as part of the global workflow. Perhaps we can use that somehow?
test_gdasapp_snow_apply_jediincr, test_gdasapp_snow_letkfoi_snowda
Rerunning these tests with -VV shows that both jobs fail on Cactus because they execute srun. Script test/snow/apply_jedi_incr contains
# (n=6) -> this is fixed, at one task per tile (with minor code change, could run on a single proc).
srun '--export=ALL' -n 6 ${EXECDIR}/apply_incr.exe ${WORKDIR}/apply_incr.log
Script test/snow/letkfoi_snowda.sh contains
srun '--export=ALL' -n 6 ${EXECDIR}/${JEDI_EXEC} letkf_snow.yaml
These scripts need to be generalized to support MPI launchers other than srun.
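One way the snow scripts could be generalized is to read the launcher from the environment and fall back to the current srun call. This is a sketch only: the `LAUNCHER` variable and the `launch_argv` helper are hypothetical names, and since the scripts themselves are shell, an actual fix would apply the same pattern in bash.

```python
import os
import shlex
from typing import List, Mapping, Optional


def launch_argv(ntasks: int, executable: str, *args: str,
                env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build the argv for an MPI run; LAUNCHER (hypothetical) may carry extra flags."""
    env = os.environ if env is None else env
    # Default reproduces the srun call currently hardwired in the snow scripts.
    launcher = shlex.split(env.get("LAUNCHER", "srun --export=ALL"))
    return [*launcher, "-n", str(ntasks), executable, *args]


if __name__ == "__main__":
    # With no LAUNCHER override this matches apply_jedi_incr's srun line.
    print(launch_argv(6, "apply_incr.exe", "apply_incr.log", env={}))
```

On WCOSS2 one would then set LAUNCHER to the site's MPI launcher (assumed here to be mpiexec) before running the tests.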
Regarding "I think it gets cloned as part of the global workflow. Perhaps we can use that somehow?": we could try, but then we would need to move this test inside the if (WORKFLOW_TESTS) block in test/soca/CMakeLists.txt. Or could we clone wxflow with GDASApp and use relative paths?
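A rough sketch of the relative-path option. It assumes, without checking, that the global workflow checks wxflow out next to GDASApp (`sorc/wxflow` beside `sorc/gdas.cd`, consistent with the `.../global-workflow/wcoss2/sorc/gdas.cd` paths in the logs), and it also tries a hypothetical clone inside GDASApp. The function names are illustrative only.

```python
import sys
from pathlib import Path
from typing import Iterator, Optional


def candidate_wxflow_dirs(gdasapp_root: Path) -> Iterator[Path]:
    """Yield possible wxflow src/ directories, most likely first (both assumed)."""
    # Assumption 1: global-workflow clones wxflow beside GDASApp in sorc/.
    yield gdasapp_root.parent / "wxflow" / "src"
    # Assumption 2: a hypothetical wxflow clone inside GDASApp itself.
    yield gdasapp_root / "sorc" / "wxflow" / "src"


def add_wxflow_to_path(gdasapp_root: Path) -> Optional[Path]:
    """Prepend the first src/ directory containing a wxflow package to sys.path."""
    for candidate in candidate_wxflow_dirs(gdasapp_root):
        if (candidate / "wxflow").is_dir():
            sys.path.insert(0, str(candidate))
            return candidate
    return None  # Nothing found; the caller decides whether that is fatal.
```

For ctest, the directory found this way would more naturally be passed to the tests through PYTHONPATH, for example via the ENVIRONMENT test property in CMake, rather than by editing sys.path in each script.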
test_gdasapp_atm_jjob_var & test_gdasapp_atm_jjob_ens
The ATM var and ens suites of jobs fail because the submission scripts in test/atm/global-workflow do not submit the jobs via PBS. On WCOSS2, execution falls through to the else block of each jjob_*.sh script. For example, jjob_var_run.sh contains
# Execute j-job
if [[ $machine = 'HERA' ]]; then
sbatch --ntasks=6 --account=$ACCOUNT --qos=batch --time=00:10:00 --export=ALL --wait ${HOMEgfs}/jobs/JGLOBAL_ATM_ANALYSIS_VARIATIONAL
elif [[ $machine = 'ORION' || $machine = 'HERCULES' ]]; then
sbatch --ntasks=6 --account=$ACCOUNT --qos=batch --time=00:10:00 --export=ALL --wait ${HOMEgfs}/jobs/JGLOBAL_ATM_ANALYSIS_VARIATIONAL
else
${HOMEgfs}/jobs/JGLOBAL_ATM_ANALYSIS_VARIATIONAL
fi
An elif [[ $machine = 'WCOSS2' ]]; then block needs to be added to each script.
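The shell branches above can be mirrored in Python to show what a WCOSS2 branch might look like. This is a hypothetical sketch: `submit_command` is not in GDASApp, and the queue name (`dev`) and `select` resource request are assumptions about WCOSS2, not values from this PR. It relies on standard PBS Pro qsub options: `-V` to export the environment and `-W block=true` to wait for the job, analogous to `sbatch --export=ALL --wait`.

```python
from typing import List


def submit_command(machine: str, jjob: str, account: str, ntasks: int = 6,
                   walltime: str = "00:10:00", queue: str = "dev") -> List[str]:
    """Return the command used to run `jjob` on `machine` (hypothetical helper)."""
    if machine in ("HERA", "ORION", "HERCULES"):
        # Same options as the existing sbatch lines in jjob_var_run.sh.
        return ["sbatch", f"--ntasks={ntasks}", f"--account={account}",
                "--qos=batch", f"--time={walltime}", "--export=ALL",
                "--wait", jjob]
    if machine == "WCOSS2":
        # PBS Pro: -V exports the environment, -W block=true waits for completion.
        # Queue and select values are assumptions, not taken from this PR.
        return ["qsub", "-A", account, "-q", queue,
                "-l", f"walltime={walltime}",
                "-l", f"select=1:ncpus={ntasks}:mpiprocs={ntasks}",
                "-V", "-W", "block=true", jjob]
    # Any other machine: run the j-job directly, as the current else block does.
    return [jjob]
```

The returned list could be passed to `subprocess.run(cmd, check=True)`; in jjob_var_run.sh itself, the equivalent would be a qsub line inside the new elif block.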
test_gdasapp_aero_gen_3dvar_yaml
Rerunning with -VV shows that this job fails because wxflow cannot be found:
1722: Test command: /lfs/h2/emc/da/noscrub/emc.da/git/global-workflow/wcoss2/sorc/gdas.cd/bundle/gdas/test/aero/genyaml_3dvar.sh "/lfs/h2/emc/da/noscrub/emc.da/git/global-workflow/wcoss2/sorc/gdas.cd/build/gdas" "/lfs/h2/emc/da/noscrub/emc.da/git/global-workflow/wcoss2/sorc/gdas.cd/bundle/gdas" "WORKING" "DIRECTORY" "/lfs/h2/emc/da/noscrub/emc.da/git/global-workflow/wcoss2/sorc/gdas.cd/build/gdas/test/testrun/"
1722: Test timeout computed to be: 1500
1722: Traceback (most recent call last):
1722: File "<stdin>", line 1, in <module>
1722: ModuleNotFoundError: No module named 'wxflow'
test_gdasapp_soca_socahybridweights, test_gdasapp_soca_incr_handler, test_gdasapp_soca_ens_handler
Rerunning each test with -VV shows that each one fails when trying to execute sbatch. For example, test_gdasapp_soca_ens_handler produces:
1702: Test command: /lfs/h2/emc/da/noscrub/emc.da/git/global-workflow/wcoss2/sorc/gdas.cd/bundle/gdas/ush/soca/run_jjobs.py "-y" "/lfs/h2/emc/da/noscrub/emc.da/git/global-workflow/wcoss2/sorc/gdas.cd/build/gdas/test/soca/gw/testrun/run_gdas_apps_ens_handler.yaml" "--skip" "--ctest" "True"
1702: Environment variables:
1702: PYTHONPATH=/lfs/h2/emc/da/noscrub/emc.da/git/global-workflow/wcoss2/sorc/gdas.cd/build/gdas/../lib/python3.8:/apps/ops/prod/nco/core/prod_util.v2.0.14/ush:/apps/prod/python-modules/3.8.6/intel/19.1.3.304/lib/python3.8/site-packages
1702: Test timeout computed to be: 1500
1702: {'machine': 'wcoss2', 'ctest command': {'executable': '/lfs/h2/emc/da/noscrub/emc.da/git/global-workflow/wcoss2/sorc/gdas.cd/build/gdas/../bin/gdas_ens_handler.x', 'yaml input': '/lfs/h2/emc/da/noscrub/emc.da/git/global-workflow/wcoss2/sorc/gdas.cd/bundle/gdas/test/soca/testinput/ens_handler.yaml'}, 'job options': {'account': 'da-cpu', 'qos': 'batch', 'output': 'ens_handler.out', 'nodes': 1, 'ntasks': 1, 'partition': None, 'time': '00:05:00'}}
1702: running sbatch --wait run_jjobs.sh ...
1702: Traceback (most recent call last):
1702: File "/lfs/h2/emc/da/noscrub/emc.da/git/global-workflow/wcoss2/sorc/gdas.cd/bundle/gdas/ush/soca/run_jjobs.py", line 309, in <module>
1702: main()
1702: File "/lfs/h2/emc/da/noscrub/emc.da/git/global-workflow/wcoss2/sorc/gdas.cd/bundle/gdas/ush/soca/run_jjobs.py", line 305, in main
1702: run_card.execute(submit=True)
1702: File "/lfs/h2/emc/da/noscrub/emc.da/git/global-workflow/wcoss2/sorc/gdas.cd/bundle/gdas/ush/soca/run_jjobs.py", line 260, in execute
1702: subprocess.check_output(["sbatch", "--wait", self.name])
1702: File "/apps/spack/python/3.8.6/intel/19.1.3.304/pjn2nzkjvqgmjw4hmyz43v5x4jbxjzpk/lib/python3.8/subprocess.py", line 411, in check_output
1702: return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
1702: File "/apps/spack/python/3.8.6/intel/19.1.3.304/pjn2nzkjvqgmjw4hmyz43v5x4jbxjzpk/lib/python3.8/subprocess.py", line 489, in run
1702: with Popen(*popenargs, **kwargs) as process:
1702: File "/apps/spack/python/3.8.6/intel/19.1.3.304/pjn2nzkjvqgmjw4hmyz43v5x4jbxjzpk/lib/python3.8/subprocess.py", line 854, in __init__
1702: self._execute_child(args, executable, preexec_fn, close_fds,
1702: File "/apps/spack/python/3.8.6/intel/19.1.3.304/pjn2nzkjvqgmjw4hmyz43v5x4jbxjzpk/lib/python3.8/subprocess.py", line 1702, in _execute_child
1702: raise child_exception_type(errno_num, err_msg, err_filename)
1702: FileNotFoundError: [Errno 2] No such file or directory: 'sbatch'
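A possible sketch of removing the hardwired `sbatch` call from `run_jjobs.py`: detect which scheduler is on PATH and build a blocking submit command. The function names are hypothetical and the PBS options are an assumption. Because run_jjobs.py appears to build its job header from Slurm-style options (`qos`, `partition`), the generated script would presumably also need `#PBS` directives on WCOSS2. The `which` parameter exists only so the selection can be tested without a scheduler installed.

```python
import shutil
import subprocess
from typing import Callable, List, Optional


def batch_submit_argv(script: str,
                      which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return a blocking submit command for `script` (Slurm or PBS Pro)."""
    if which("sbatch"):
        return ["sbatch", "--wait", script]
    if which("qsub"):
        # PBS Pro counterpart of sbatch --wait: block until the job finishes.
        return ["qsub", "-W", "block=true", script]
    raise RuntimeError("neither sbatch nor qsub found on PATH")


def submit(script: str) -> bytes:
    """Hypothetical drop-in for subprocess.check_output(["sbatch", "--wait", name])."""
    return subprocess.check_output(batch_submit_argv(script))
```

Failing with an explicit RuntimeError would also give a clearer message than the FileNotFoundError seen above.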
test_gdasapp_soca_JGLOBAL_PREP_OCEAN_OBS
Rerunning with -VV shows that the job fails when attempting to execute sbatch:
1691: machine is wcoss2
1691: gPDY: 20180415
1691: gcyc: 06
1691: assim_freq: 6
1691: RUN: gdas
1691: running sbatch --wait run_jjobs.sh ...
1691: Traceback (most recent call last):
1691: File "/lfs/h2/emc/da/noscrub/emc.da/git/global-workflow/wcoss2/sorc/gdas.cd/bundle/gdas/ush/soca/run_jjobs.py", line 309, in <module>
1691: main()
1691: File "/lfs/h2/emc/da/noscrub/emc.da/git/global-workflow/wcoss2/sorc/gdas.cd/bundle/gdas/ush/soca/run_jjobs.py", line 305, in main
1691: run_card.execute(submit=True)
1691: File "/lfs/h2/emc/da/noscrub/emc.da/git/global-workflow/wcoss2/sorc/gdas.cd/bundle/gdas/ush/soca/run_jjobs.py", line 260, in execute
1691: subprocess.check_output(["sbatch", "--wait", self.name])
1691: File "/apps/spack/python/3.8.6/intel/19.1.3.304/pjn2nzkjvqgmjw4hmyz43v5x4jbxjzpk/lib/python3.8/subprocess.py", line 411, in check_output
1691: return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
1691: File "/apps/spack/python/3.8.6/intel/19.1.3.304/pjn2nzkjvqgmjw4hmyz43v5x4jbxjzpk/lib/python3.8/subprocess.py", line 489, in run
1691: with Popen(*popenargs, **kwargs) as process:
1691: File "/apps/spack/python/3.8.6/intel/19.1.3.304/pjn2nzkjvqgmjw4hmyz43v5x4jbxjzpk/lib/python3.8/subprocess.py", line 854, in __init__
1691: self._execute_child(args, executable, preexec_fn, close_fds,
1691: File "/apps/spack/python/3.8.6/intel/19.1.3.304/pjn2nzkjvqgmjw4hmyz43v5x4jbxjzpk/lib/python3.8/subprocess.py", line 1702, in _execute_child
1691: raise child_exception_type(errno_num, err_msg, err_filename)
1691: FileNotFoundError: [Errno 2] No such file or directory: 'sbatch'
The other test_gdasapp_soca_JGDAS_* ctests may fail for the same reason. It is also possible that each successive job requires the previous job to have passed; if so, one failure causes every remaining job in the chain to fail.
I propose revising the scope of this PR to building and running only a subset of the ctests. New issue(s) and PR(s) can be opened to get the failing tests running on WCOSS2.
@CoryMartin-NOAA, modulefiles/EVA/wcoss2.lua has been added. If you see problems, let me know and I'll fix 'em.
Thank you, @CoryMartin-NOAA. Merging into develop.
This PR includes changes which allow test_gdasapp ctests to run on WCOSS2.
Resolves #1111