Open RussTreadon-NOAA opened 1 week ago
GDASApp ctests and g-w CI testing
Install RussTreadon-NOAA:feature/rename_atm
at 39e719d
on Dogwood, Hera, Hercules, and Orion. Run GDASApp ctests and g-w C96C48_ufs_hybatmDA.
Note: local modifications were made to C96C48_ufs_hybatmDA to enable CI on Hera, Hercules, and Orion.
Dogwood (WCOSS2)
Not all GDASApp ctests are functional on WCOSS2 due to SLURM assumptions. This is a known issue and will be addressed by future GDASApp issue(s) and PR(s).
54% tests passed, 21 tests failed out of 46
Label Time Summary:
gdas-utils = 6.73 sec*proc (9 tests)
script = 6.73 sec*proc (9 tests)
Total Test time (real) = 172.20 sec
The following tests FAILED:
1751 - test_gdasapp_fv3jedi_fv3inc (Not Run)
1756 - test_gdasapp_soca_JGLOBAL_PREP_OCEAN_OBS (Failed)
1757 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_PREP (Failed)
1758 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_BMAT (Failed)
1759 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_RUN (Failed)
1760 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_ECEN (Failed)
1761 - test_gdasapp_soca_copy_scratch (Failed)
1762 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_CHKPT (Failed)
1763 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_POST (Failed)
1764 - test_gdasapp_soca_socahybridweights (Failed)
1765 - test_gdasapp_soca_incr_handler (Failed)
1766 - test_gdasapp_soca_ens_handler (Failed)
1769 - test_gdasapp_snow_apply_jediincr (Failed)
1770 - test_gdasapp_snow_letkfoi_snowda (Failed)
1776 - test_gdasapp_atm_jjob_var_run (Failed)
1777 - test_gdasapp_atm_jjob_var_inc (Failed)
1778 - test_gdasapp_atm_jjob_var_final (Failed)
1780 - test_gdasapp_atm_jjob_ens_run (Failed)
1781 - test_gdasapp_atm_jjob_ens_inc (Failed)
1782 - test_gdasapp_atm_jjob_ens_final (Failed)
1783 - test_gdasapp_aero_gen_3dvar_yaml (Failed)
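As a rough cross-check of the failure list above, the sketch below tallies failed ctests by component. It assumes each line has the form `<number> - test_gdasapp_<component>_<name> (<status>)`, as in the list; the function name `tally_failures` and the regular expression are illustrative and are not part of GDASApp or CTest.

```python
import re
from collections import Counter

# Matches lines such as "1751 - test_gdasapp_fv3jedi_fv3inc (Not Run)".
FAILED_LINE = re.compile(r"^\s*(\d+) - test_gdasapp_([a-z0-9]+)_\S+ \((.+)\)\s*$")


def tally_failures(ctest_lines):
    """Count failed ctests per component (soca, atm, snow, ...)."""
    counts = Counter()
    for line in ctest_lines:
        match = FAILED_LINE.match(line)
        if match:
            # group(2) is the component name that follows "test_gdasapp_"
            counts[match.group(2)] += 1
    return counts


# A few lines copied from the list above; other lines are ignored.
sample = [
    "1751 - test_gdasapp_fv3jedi_fv3inc (Not Run)",
    "1756 - test_gdasapp_soca_JGLOBAL_PREP_OCEAN_OBS (Failed)",
    "1769 - test_gdasapp_snow_apply_jediincr (Failed)",
    "1776 - test_gdasapp_atm_jjob_var_run (Failed)",
]
print(dict(tally_failures(sample)))
```

Counting the full list by hand gives 11 soca, 6 atm, 2 snow, 1 fv3jedi, and 1 aero failure, which sums to the 21 reported by ctest.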
All jobs for g-w C96C48_ufs_hybatmDA CI successfully run to completion
russ.treadon@dlogin09:/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/pratm> rocotostat -d pratm.db -w pratm.xml -c all -s
CYCLE STATE ACTIVATED DEACTIVATED
202402231800 Done Jun 20 2024 10:25:07 Jun 20 2024 10:40:12
202402240000 Done Jun 20 2024 10:25:07 Jun 20 2024 12:42:23
Hera
48 out of 48 GDASApp ctests pass
Test project /scratch1/NCEPDEV/da/role.jedipara/git/global-workflow/rename_atm/sorc/gdas.cd/build
Start 1488: test_gdasapp_util_coding_norms
1/48 Test #1488: test_gdasapp_util_coding_norms ........................ Passed 2.47 sec
...
Start 1869: test_gdasapp_aero_gen_3dvar_yaml
48/48 Test #1869: test_gdasapp_aero_gen_3dvar_yaml ...................... Passed 0.81 sec
100% tests passed, 0 tests failed out of 48
Label Time Summary:
gdas-utils = 8.08 sec*proc (11 tests)
script = 8.08 sec*proc (11 tests)
Total Test time (real) = 2298.85 sec
All jobs for g-w C96C48_ufs_hybatmDA CI successfully run to completion
Hera(hfe08):/scratch1/NCEPDEV/stmp2/role.jedipara/EXPDIR/pratm$ rocotostat -d pratm.db -w pratm.xml -c all -s
CYCLE STATE ACTIVATED DEACTIVATED
202402231800 Done Jun 20 2024 09:50:11 Jun 20 2024 10:10:13
202402240000 Done Jun 20 2024 09:50:11 Jun 20 2024 12:50:12
Hercules
Initially 47 out of 48 tests passed. test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_ECEN failed because APRUN_OCNANALECEN was not defined. Examine g-w env/HERCULES.env. Find that the ocnanalecen section found in other machine env files was missing from HERCULES.env. Add an ocnanalecen section to HERCULES.env
@@ -135,6 +135,16 @@ case ${step} in
[[ ${NTHREADS_OCNANAL} -gt ${nth_max} ]] && export NTHREADS_OCNANAL=${nth_max}
export APRUN_OCNANAL="${launcher} -n ${npe_ocnanalrun} --cpus-per-task=${NTHREADS_OCNANAL}"
;;
+"ocnanalecen")
+
+ export APRUNCFP="${launcher} -n \$ncmd ${mpmd_opt}"
+
+ nth_max=$((npe_node_max / npe_node_ocnanalecen))
+
+ export NTHREADS_OCNANALECEN=${nth_ocnanalecen:-${nth_max}}
+ [[ ${NTHREADS_OCNANALECEN} -gt ${nth_max} ]] && export NTHREADS_OCNANALECEN=${nth_max}
+ export APRUN_OCNANALECEN="${launcher} -n ${npe_ocnanalecen} --cpus-per-task=${NTHREADS_OCNANALECEN}"
+;;
"ocnanalchkpt")
export APRUNCFP="${launcher} -n \$ncmd ${mpmd_opt}"
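The thread-count logic in the added ocnanalecen section can be read as "use the requested thread count, but never more than the node allows". The sketch below restates that logic in Python for clarity only. The function name and the numeric values are hypothetical, and the launcher string `srun` is an assumption; the real values come from the g-w configuration.

```python
def ocnanalecen_launch(launcher, npe, npe_node_max, npe_node, nth_requested=None):
    """Mirror the shell logic: cap the thread count, then build the launch string."""
    # nth_max=$((npe_node_max / npe_node_ocnanalecen)) is integer division in bash
    nth_max = npe_node_max // npe_node
    # ${nth_ocnanalecen:-${nth_max}} falls back to nth_max when unset or empty
    nthreads = nth_requested if nth_requested else nth_max
    # [[ NTHREADS -gt nth_max ]] && NTHREADS=nth_max
    nthreads = min(nthreads, nth_max)
    return nthreads, f"{launcher} -n {npe} --cpus-per-task={nthreads}"


# Hypothetical values, chosen only to show the cap taking effect.
print(ocnanalecen_launch("srun", npe=16, npe_node_max=80, npe_node=16, nth_requested=8))
```

With these made-up numbers the node allows 80 // 16 = 5 threads per task, so the request for 8 is reduced to 5.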
Rerun ctests. This time 48 out of 48 GDASApp ctests pass
Test project /work/noaa/da/rtreadon/git/global-workflow/rename_atm/sorc/gdas.cd/build
Start 1489: test_gdasapp_util_coding_norms
1/48 Test #1489: test_gdasapp_util_coding_norms ........................ Passed 1.76 sec
...
Start 1870: test_gdasapp_aero_gen_3dvar_yaml
48/48 Test #1870: test_gdasapp_aero_gen_3dvar_yaml ...................... Passed 0.38 sec
100% tests passed, 0 tests failed out of 48
Label Time Summary:
gdas-utils = 14.93 sec*proc (11 tests)
script = 14.93 sec*proc (11 tests)
Total Test time (real) = 1637.73 sec
All jobs for g-w C96C48_ufs_hybatmDA CI successfully run to completion
hercules-login-3:/work/noaa/stmp/rtreadon/EXPDIR/pratm$ rocotostat -d pratm.db -w pratm.xml -c all -s
CYCLE STATE ACTIVATED DEACTIVATED
202402231800 Done Jun 20 2024 10:00:03 Jun 20 2024 10:15:02
202402240000 Done Jun 20 2024 10:00:03 Jun 20 2024 12:05:03
Orion
Add changes needed to compile GDASApp on Orion following Rocky 9 upgrade (see GDASApp PR #1180). Also found it necessary to update g-w workflow/hosts.py and ush/detect_machine.sh (see g-w issue #2695). Updated working copies of these scripts accordingly. After this 48 out of 48 GDASApp ctests pass
Test project /work2/noaa/da/rtreadon/git/global-workflow/rename_atm/sorc/gdas.cd/build
Start 1489: test_gdasapp_util_coding_norms
1/48 Test #1489: test_gdasapp_util_coding_norms ........................ Passed 6.23 sec
...
Start 1870: test_gdasapp_aero_gen_3dvar_yaml
48/48 Test #1870: test_gdasapp_aero_gen_3dvar_yaml ...................... Passed 7.02 sec
100% tests passed, 0 tests failed out of 48
Label Time Summary:
gdas-utils = 39.30 sec*proc (11 tests)
script = 39.30 sec*proc (11 tests)
Total Test time (real) = 1715.61 sec
No attempt was made to run g-w C96C48_ufs_hybatmDA CI because g-w has not yet been updated to run on Orion following the Rocky 9 upgrade (see g-w issue #2694)
GDASApp and CI testing identified three issues with g-w files
- env/HERCULES.env - missing ocnanalecen section
- ush/detect_machine.sh - path based check unable to distinguish between Hercules and Orion
- workflow/hosts.py - path based check unable to distinguish between Hercules and Orion
RussTreadon-NOAA:feature/rename_atm contains updates to these three files to address the above stated issues.
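One way to picture the path-check problem: if the directory being tested is visible on both machines, a path test alone gives the same answer on both, so some other signal is needed. The sketch below uses the login-node hostname as that signal. This is an assumption for illustration only and is not the change made in detect_machine.sh or hosts.py. The Hercules pattern is modeled on the `hercules-login-3` prompt shown earlier in this thread; the Orion pattern is a guess by analogy.

```python
import re

# Hypothetical hostname patterns; only "hercules-login-3" appears in this thread.
HOST_PATTERNS = {
    "hercules": re.compile(r"^hercules-login-\d+", re.IGNORECASE),
    "orion": re.compile(r"^orion-login-\d+", re.IGNORECASE),
}


def detect_machine(hostname):
    """Return a machine name from a hostname, or None if nothing matches."""
    for machine, pattern in HOST_PATTERNS.items():
        if pattern.match(hostname):
            return machine
    return None


print(detect_machine("hercules-login-3"))
```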
@CoryMartin-NOAA , @DavidNew-NOAA , @guillaumevernieres : this PR is ready for review. The PR
HERCULES.env
If any of you have time, your review would be appreciated.
You don't want this to be merged in dev/gdasapp, @RussTreadon-NOAA ?
@guillaumevernieres : I thought @danholdaway 's schematic had us
dev/gdasapp
into a developer branchdevelop
develop
Once the g-w PR is closed, we rebase dev/gdasapp. RussTreadon-NOAA:feature/rename_atm followed this path.
If we want to merge RussTreadon-NOAA:feature/rename_atm into dev/gdasapp, the draft PR for doing so is #2702.
PR #2702 contains 112 modified files. This PR, #2700, contains 5 modified files.
@RussTreadon-NOAA Since #2654 also updates GDASApp hashes, would you be willing to work w/ @DavidNew-NOAA and merge the changes from this PR into #2654? This will expedite testing and merge.
@DavidNew-NOAA If @RussTreadon-NOAA agrees, would you be willing to merge #2700 into #2654 and do a test to confirm the changes are compatible?
Thanks!
@aerorahul Sure, but #2654 doesn't update any GDASApp hashes
@aerorahul and @DavidNew-NOAA : given your comments, I am cloning DavidNew-NOAA:feature/stage_from_yaml on Hera and will run C96C48_ufs_hybatmDA CI. It's good to ensure PR #2654 works as intended before combining PR #2654 and #2700.
As documented in PR #2654, C96C48_ufs_hybatmDA CI is not working with DavidNew-NOAA:feature/stage_from_yaml. I will work with @DavidNew-NOAA to figure out what's going on. If we can't get CI to work by tomorrow afternoon, I recommend moving forward with this PR, #2700, as is.
With the merger of PR #2654 into this PR, PR #2654 may be closed.
As noted in PR #2654, the changes below must be committed to GDASApp in order to fully exercise the capability added by PR #2654.
- modulefiles/GDAS/gaea.intel.lua - remove commented out wxflow hack
- modulefiles/GDAS/hera.intel.lua - remove wxflow hack
- modulefiles/GDAS/noaacloud.intel.lua - remove commented out wxflow hack
- test/aero/genyaml_3dvar.sh - add wxflow to PYTHONPATH
@DavidNew-NOAA and @CoryMartin-NOAA : This PR is ready for final (I hope!) review.
This PR now includes the changes in g-w PR #2654. It also updates the gdas.cd to the current (as of 6/21/2024) head of GDASApp develop (4c58b1e).
Thank you @CoryMartin-NOAA for quickly reviewing GDASApp PRs to move this along.
@RussTreadon-NOAA Would you be open to adding description from #2654 into this PR? I am happy to update it.
Thank you @aerorahul for your note. Yes, I would appreciate your updating the description of this PR with relevant content from PR #2654.
CI Update on Wcoss2 at 06/21/24 05:33:15 PM
============================================
Cloning and Building global-workflow PR: 2700
with PID: 250979 on host: dlogin08
Automated global-workflow Testing Results:
Machine: Wcoss2
Start: Fri Jun 21 17:37:44 UTC 2024 on dlogin08
---------------------------------------------------
Build: Completed at 06/21/24 06:14:55 PM
Case setup: Completed for experiment C48_ATM_7cc86d95
Case setup: Skipped for experiment C48mx500_3DVarAOWCDA_7cc86d95
Case setup: Skipped for experiment C48_S2SWA_gefs_7cc86d95
Case setup: Completed for experiment C48_S2SW_7cc86d95
Case setup: Completed for experiment C96_atm3DVar_extended_7cc86d95
Case setup: Skipped for experiment C96_atm3DVar_7cc86d95
Case setup: Skipped for experiment C96_atmaerosnowDA_7cc86d95
Case setup: Completed for experiment C96C48_hybatmDA_7cc86d95
Case setup: Completed for experiment C96C48_ufs_hybatmDA_7cc86d95
Experiment C48mx500_3DVarAOWCDA FAILED on Hera with error logs:
/scratch1/NCEPDEV/global/CI/2700/RUNTESTS/COMROOT/C48mx500_3DVarAOWCDA_7cc86d95/logs/2021032418/gdasprepoceanobs.log
Follow link here to view the contents of the above file(s): (link)
Experiment C48mx500_3DVarAOWCDA FAILED on Hera in
/scratch1/NCEPDEV/global/CI/2700/RUNTESTS/C48mx500_3DVarAOWCDA_7cc86d95
GDASApp issue #1192 has been opened to address this. The SOCA jobs are not finding wxflow.
Experiment C96_atmaerosnowDA FAILED on Hera with error logs:
/scratch1/NCEPDEV/global/CI/2700/RUNTESTS/COMROOT/C96_atmaerosnowDA_7cc86d95/logs/2021122018/gdasprepsnowobs.log
Follow link here to view the contents of the above file(s): (link)
Experiment C96_atmaerosnowDA FAILED on Hera in
/scratch1/NCEPDEV/global/CI/2700/RUNTESTS/C96_atmaerosnowDA_7cc86d95
"AttributeError: 'SnowAnalysis' object has no attribute 'runtime_config'"
Experiment C48_ATM_7cc86d95 SUCCESS on Wcoss2 at 06/21/24 07:24:12 PM
Experiment C48_S2SW_7cc86d95 SUCCESS on Wcoss2 at 06/21/24 07:44:14 PM
This needs to be updated, I think: https://github.com/NOAA-EMC/global-workflow/blob/f43a86276aaef91efa28faadc71a3cf50e749efe/scripts/exglobal_prep_snow_obs.py#L24
I do not see where or how runtime_config.cyc gets associated with the SnowAnalysis object.
That should become task_config.cyc.
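The AttributeError quoted above is what Python raises when code still reads an attribute under its old name after a rename. A minimal sketch, assuming a stand-in class (`SnowAnalysisSketch` is hypothetical and is not the real pygfs or wxflow class), shows the old name failing and the new name working:

```python
from types import SimpleNamespace


class SnowAnalysisSketch:
    """Stand-in class: it stores its settings under task_config only."""

    def __init__(self, config):
        self.task_config = SimpleNamespace(**config)


task = SnowAnalysisSketch({"cyc": 18})
print(task.task_config.cyc)  # the renamed attribute works

try:
    task.runtime_config.cyc  # the old attribute name no longer exists
except AttributeError as err:
    print(err)
```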
Experiment C96C48_hybatmDA_7cc86d95 SUCCESS on Wcoss2 at 06/21/24 08:24:25 PM
Experiment C96C48_ufs_hybatmDA_7cc86d95 SUCCESS on Wcoss2 at 06/21/24 08:28:20 PM
GDASApp issue #1192 has been opened to address this. The SOCA jobs are not finding wxflow.
It's a bit more complicated than just wxflow. Many soca scripts in gdas.cd/ush/soca use runtime_config. This is no longer valid. It should be task_config.
Hera(hfe06):/scratch1/NCEPDEV/da/role.jedipara/git/global-workflow/pr2700/sorc/gdas.cd/ush$ grep -r runtime_config .
./soca/prep_ocean_obs.py: PDY = self.runtime_config['PDY']
./soca/prep_ocean_obs.py: cyc = self.runtime_config['cyc']
./soca/prep_ocean_obs.py: self.runtime_config['cdate'] = cdate
./soca/prep_ocean_obs.py: cdate = self.runtime_config['cdate']
./soca/prep_ocean_obs.py: RUN = self.runtime_config.RUN
./soca/prep_ocean_obs.py: cyc = self.runtime_config['cyc']
./soca/prep_ocean_obs.py: ocean_mask_dest = os.path.join(self.runtime_config.DATA, 'RECCAP2_region_masks_all_v20221025.nc')
./soca/prep_ocean_obs.py: self.runtime_config,
./soca/prep_ocean_obs.py: chdir(self.runtime_config.DATA)
./soca/prep_ocean_obs.py: RUN = self.runtime_config.RUN
./soca/prep_ocean_obs.py: cyc = self.runtime_config.cyc
Binary file ./soca/__pycache__/prep_ocean_obs.cpython-310.pyc matches
Binary file ./soca/__pycache__/prep_ocean_obs_utils.cpython-310.pyc matches
Binary file ./soca/__pycache__/marine_recenter.cpython-310.pyc matches
./soca/marine_recenter.py: PDY = self.runtime_config['PDY']
./soca/marine_recenter.py: cyc = self.runtime_config['cyc']
./soca/marine_recenter.py: DATA = self.runtime_config.DATA
./soca/marine_recenter.py: self.runtime_config['gcyc'] = gdate.strftime("%H")
./soca/marine_recenter.py: self.runtime_config['gPDY'] = datetime(gdate.year,
./soca/marine_recenter.py: 'dump': self.runtime_config.RUN,
./soca/marine_recenter.py: RUN = self.runtime_config.RUN
./soca/marine_recenter.py: gcyc = self.runtime_config.gcyc
./soca/marine_recenter.py: bkg_utils.stage_ic(self.config.bkg_dir, self.runtime_config.DATA, gcyc)
./soca/marine_recenter.py: gPDYstr = self.runtime_config.gPDY.strftime("%Y%m%d")
./soca/marine_recenter.py: chdir(self.runtime_config.DATA)
./soca/marine_recenter.py: RUN = self.runtime_config.RUN
./soca/marine_recenter.py: cyc = self.runtime_config.cyc
./soca/marine_recenter.py: PDYstr = self.runtime_config.PDY.strftime("%Y%m%d")
./soca/prep_ocean_obs_utils.py:def obs_fetch(config, runtime_config, obsprep_space, cycles):
./soca/prep_ocean_obs_utils.py: RUN = runtime_config.RUN
./soca/prep_ocean_obs_utils.py: PDY = runtime_config.PDY
./soca/prep_ocean_obs_utils.py: cyc = runtime_config.cyc
Hera(hfe06):/scratch1/NCEPDEV/da/role.jedipara/git/global-workflow/pr2700/sorc/gdas.cd/ush$ cd ..
Hera(hfe06):/scratch1/NCEPDEV/da/role.jedipara/git/global-workflow/pr2700/sorc/gdas.cd$ grep -r runtime_config ush/
ush/soca/prep_ocean_obs.py: PDY = self.runtime_config['PDY']
ush/soca/prep_ocean_obs.py: cyc = self.runtime_config['cyc']
ush/soca/prep_ocean_obs.py: self.runtime_config['cdate'] = cdate
ush/soca/prep_ocean_obs.py: cdate = self.runtime_config['cdate']
ush/soca/prep_ocean_obs.py: RUN = self.runtime_config.RUN
ush/soca/prep_ocean_obs.py: cyc = self.runtime_config['cyc']
ush/soca/prep_ocean_obs.py: ocean_mask_dest = os.path.join(self.runtime_config.DATA, 'RECCAP2_region_masks_all_v20221025.nc')
ush/soca/prep_ocean_obs.py: self.runtime_config,
ush/soca/prep_ocean_obs.py: chdir(self.runtime_config.DATA)
ush/soca/prep_ocean_obs.py: RUN = self.runtime_config.RUN
ush/soca/prep_ocean_obs.py: cyc = self.runtime_config.cyc
Binary file ush/soca/__pycache__/prep_ocean_obs.cpython-310.pyc matches
Binary file ush/soca/__pycache__/prep_ocean_obs_utils.cpython-310.pyc matches
Binary file ush/soca/__pycache__/marine_recenter.cpython-310.pyc matches
ush/soca/marine_recenter.py: PDY = self.runtime_config['PDY']
ush/soca/marine_recenter.py: cyc = self.runtime_config['cyc']
ush/soca/marine_recenter.py: DATA = self.runtime_config.DATA
ush/soca/marine_recenter.py: self.runtime_config['gcyc'] = gdate.strftime("%H")
ush/soca/marine_recenter.py: self.runtime_config['gPDY'] = datetime(gdate.year,
ush/soca/marine_recenter.py: 'dump': self.runtime_config.RUN,
ush/soca/marine_recenter.py: RUN = self.runtime_config.RUN
ush/soca/marine_recenter.py: gcyc = self.runtime_config.gcyc
ush/soca/marine_recenter.py: bkg_utils.stage_ic(self.config.bkg_dir, self.runtime_config.DATA, gcyc)
ush/soca/marine_recenter.py: gPDYstr = self.runtime_config.gPDY.strftime("%Y%m%d")
ush/soca/marine_recenter.py: chdir(self.runtime_config.DATA)
ush/soca/marine_recenter.py: RUN = self.runtime_config.RUN
ush/soca/marine_recenter.py: cyc = self.runtime_config.cyc
ush/soca/marine_recenter.py: PDYstr = self.runtime_config.PDY.strftime("%Y%m%d")
ush/soca/prep_ocean_obs_utils.py:def obs_fetch(config, runtime_config, obsprep_space, cycles):
ush/soca/prep_ocean_obs_utils.py: RUN = runtime_config.RUN
ush/soca/prep_ocean_obs_utils.py: PDY = runtime_config.PDY
ush/soca/prep_ocean_obs_utils.py: cyc = runtime_config.cyc
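The grep output above also reports matches in compiled `__pycache__` files, which are noise for this purpose. A small sketch of the same search restricted to `.py` sources follows. The function name `find_references` is hypothetical; the default search string and the `ush` directory are taken from the commands above.

```python
import os


def find_references(root, name="runtime_config"):
    """Yield (path, line_number, text) for each .py line under root containing name."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip compiled caches, which show up as "Binary file ... matches" in grep.
        dirnames[:] = [d for d in dirnames if d != "__pycache__"]
        for filename in sorted(filenames):
            if not filename.endswith(".py"):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, text in enumerate(handle, start=1):
                    if name in text:
                        yield path, number, text.strip()


if __name__ == "__main__":
    # Run from the gdas.cd directory, as in the second grep above.
    for path, number, text in find_references("ush"):
        print(f"{path}:{number}: {text}")
```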
CI Passed Hercules at
Built and ran in directory /work2/noaa/stmp/CI/HERCULES/2700
Experiment C96_atm3DVar_extended_7cc86d95 SUCCESS on Wcoss2 at 06/22/24 03:56:30 AM
All CI Test Cases Passed on Wcoss2:
Experiment C48_ATM_7cc86d95 *** SUCCESS *** at 06/21/24 07:24:12 PM
Experiment C48_S2SW_7cc86d95 *** SUCCESS *** at 06/21/24 07:44:14 PM
Experiment C96C48_hybatmDA_7cc86d95 *** SUCCESS *** at 06/21/24 08:24:25 PM
Experiment C96C48_ufs_hybatmDA_7cc86d95 *** SUCCESS *** at 06/21/24 08:28:20 PM
Experiment C96_atm3DVar_extended_7cc86d95 *** SUCCESS *** at 06/22/24 03:56:30 AM
Updates to two files were committed to RussTreadon-NOAA:feature/rename_atm at 86631234.
These changes, along with changes documented in GDASApp PR #1195, restore all GDASApp ctests to Passed state. These changes also get the failed g-w CI for C96_atmaerosnowDA and C48mx500_3DVarAOWCDA past the previously failed jobs.
NOTE: g-w CI should not be rerun until GDASApp PR #1195 is merged into GDASApp develop and the sorc/gdas.cd hash is updated in RussTreadon-NOAA:feature/rename_atm.
All jobs from g-w C48mx500_3DVarAOWCDA CI successfully ran to completion on Hera.
Hera(hfe06):/scratch1/NCEPDEV/stmp2/role.jedipara/EXPDIR/pr2700_wcda$ rocotostat -d pr2700_wcda.db -w pr2700_wcda.xml -c all -s
CYCLE STATE ACTIVATED DEACTIVATED
202103241200 Done Jun 22 2024 16:38:05 Jun 22 2024 17:50:15
202103241800 Done Jun 22 2024 16:38:05 Jun 22 2024 19:45:14
All jobs from g-w C96_atmaerosnowDA CI successfully ran to completion on Hera
Hera(hfe06):/scratch1/NCEPDEV/stmp2/role.jedipara/EXPDIR/pr2700_aero$ rocotostat -d pr2700_aero.db -w pr2700_aero.xml -c all -s
CYCLE STATE ACTIVATED DEACTIVATED
202112201200 Done Jun 22 2024 17:47:53 Jun 22 2024 18:15:11
202112201800 Done Jun 22 2024 17:47:53 Jun 22 2024 19:45:12
202112210000 Done Jun 22 2024 17:47:53 Jun 22 2024 21:35:10
All jobs from C96C48_ufs_hybatmDA CI successfully ran to completion on Hera
Hera(hfe09):/scratch1/NCEPDEV/stmp2/role.jedipara/EXPDIR/pr2700_ufsda$ rocotostat -d pr2700_ufsda.db -w pr2700_ufsda.xml -c all -s
CYCLE STATE ACTIVATED DEACTIVATED
202402231800 Done Jun 22 2024 21:15:22 Jun 22 2024 21:40:23
202402240000 Done Jun 22 2024 21:15:22 Jun 23 2024 02:05:12
All jobs from C96C48_hybatmDA CI successfully ran to completion on Hera
Hera(hfe09):/scratch1/NCEPDEV/stmp2/role.jedipara/EXPDIR/pr2700_gsida$ rocotostat -d pr2700_gsida.db -w pr2700_gsida.xml -c all -s
CYCLE STATE ACTIVATED DEACTIVATED
202112201800 Done Jun 22 2024 21:15:24 Jun 22 2024 21:40:25
202112210000 Done Jun 22 2024 21:15:24 Jun 23 2024 00:15:16
202112210600 Done Jun 22 2024 21:15:24 Jun 23 2024 01:00:23
@WalterKolczynski-NOAA , the gdas.cd hash was updated at 19f35e9. As reported above, wcda, aerosnow, ufsda, and gsida CI successfully run on Hera.
However, you may opt to pause triggering new g-w CI until the team decides whether or not to include additional wxflow clean up in this PR.
@aerorahul, @WalterKolczynski-NOAA @CoryMartin-NOAA , & @DavidNew-NOAA , the changes in this PR may be reviewed.
I do not plan on making any more changes to this PR apart from updating the gdas.cd hash once GDASApp PR #1197 is approved and merged into develop.
The gdas.cd hash has been updated. Absent change request(s) from reviewers, this PR is ready for final CI testing.
Experiment C48mx500_3DVarAOWCDA FAILED on Hera in
/scratch1/NCEPDEV/global/CI/2700/RUNTESTS/C48mx500_3DVarAOWCDA_d1d88a6d
/scratch1/NCEPDEV/global/glopara/dump/gdas.20210324/18/atmos/gdas.t18z.updated.status.tm00.bufr_d does not exist
Terry.McGuinness (hfe03) C48mx500_3DVarAOWCDA_d1d88a6d $ rocotocheck -w C48mx500_3DVarAOWCDA_d1d88a6d.xml -d C48mx500_3DVarAOWCDA_d1d88a6d.db -c 202103241800 -t gdasprep
Task: gdasprep
account: nems
command: /scratch1/NCEPDEV/global/CI/2700/gfs/jobs/rocoto/prep.sh
cores: 4
cycledefs: gdas
final: false
jobname: C48mx500_3DVarAOWCDA_d1d88a6d_gdasprep_18
join: /scratch1/NCEPDEV/global/CI/2700/RUNTESTS/COMROOT/C48mx500_3DVarAOWCDA_d1d88a6d/logs/2021032418/gdasprep.log
maxtries: 2
memory: 40GB
name: gdasprep
nodes: 2:ppn=2:tpp=1
partition: hera
queue: batch
throttle: 9999999
walltime: 00:30:00
environment
CDATE ==> 2021032418
CDUMP ==> gdas
COMROOT ==> /scratch1/NCEPDEV/global/CI/2700/RUNTESTS/COMROOT
DATAROOT ==> /scratch1/NCEPDEV/stmp2/Terry.McGuinness/RUNDIRS/C48mx500_3DVarAOWCDA_d1d88a6d
EXPDIR ==> /scratch1/NCEPDEV/global/CI/2700/RUNTESTS/EXPDIR/C48mx500_3DVarAOWCDA_d1d88a6d
HOMEgfs ==> /scratch1/NCEPDEV/global/CI/2700/gfs
NET ==> gfs
PDY ==> 20210324
RUN ==> gdas
RUN_ENVIR ==> emc
cyc ==> 18
dependencies
AND is not satisfied
SOME is satisfied
gdasatmos_prod_f000 of cycle 202103241200 is SUCCEEDED
gdasatmos_prod_f003 of cycle 202103241200 is SUCCEEDED
gdasatmos_prod_f006 of cycle 202103241200 is SUCCEEDED
gdasatmos_prod_f009 of cycle 202103241200 is SUCCEEDED
/scratch1/NCEPDEV/global/CI/2700/RUNTESTS/COMROOT/C48mx500_3DVarAOWCDA_d1d88a6d/gdas.20210324/12//model_data/atmos/history/gdas.t12z.atmf009.nc is available
/scratch1/NCEPDEV/global/glopara/dump/gdas.20210324/18/atmos/gdas.t18z.updated.status.tm00.bufr_d does not exist
Cycle: 202103241800
Valid for this task: YES
State: active
Activated: 2024-06-24 16:03:06 UTC
Completed: -
Expired: -
Job: This task has not been submitted for this cycle
Task can not be submitted because:
Dependencies are not satisfied
/scratch1/NCEPDEV/global/glopara/dump/gdas.20210324 was removed as part of routine GDA disk management.
@KateFriedman-NOAA , can this dump directory be restored on Hera to allow g-w C48mx500_3DVarAOWCDA CI to run?
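Before launching an experiment, the unmet dependency reported by rocotocheck above could be checked by hand. The sketch below builds the status-file path and tests whether it exists. The path layout is inferred from the single path in the rocotocheck output, and the function name `dump_status_file` is hypothetical.

```python
import os
import posixpath


def dump_status_file(dmpdir, run, pdy, cyc):
    """Build the path of the dump status file the prep task waits on."""
    hour = f"{cyc:02d}"
    return posixpath.join(
        dmpdir, f"{run}.{pdy}", hour, "atmos",
        f"{run}.t{hour}z.updated.status.tm00.bufr_d",
    )


path = dump_status_file("/scratch1/NCEPDEV/global/glopara/dump", "gdas", "20210324", 18)
print(path, "exists" if os.path.exists(path) else "does not exist")
```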
The 20240224 00Z gdas and gfs atmanlupp jobs died on WCOSS2 (Dogwood) with the error message
+ JGLOBAL_ATMOS_UPP[22]: /lfs/h2/emc/da/noscrub/russ.treadon/git/global-workflow/rename_atm/scripts/exglobal_atmos_upp.py
Traceback (most recent call last):
File "/lfs/h2/emc/da/noscrub/russ.treadon/git/global-workflow/rename_atm/scripts/exglobal_atmos_upp.py", line 6, in <module>
from pygfs.task.upp import UPP
ModuleNotFoundError: No module named 'pygfs'
Examination of jobs/rocoto/upp.sh shows that load_fv3gfs_modules.sh is NOT executed on WCOSS2.
# Source FV3GFS workflow modules
#. "${HOMEgfs}/ush/load_fv3gfs_modules.sh"
#status=$?
#if (( status != 0 )); then exit "${status}"; fi
# Temporarily load modules from UPP on WCOSS2
source "${HOMEgfs}/ush/detect_machine.sh"
if [[ "${MACHINE_ID}" = "wcoss2" ]]; then
set +x
source "${HOMEgfs}/ush/module-setup.sh"
module use "${HOMEgfs}/sorc/ufs_model.fd/FV3/upp/modulefiles"
module load "${MACHINE_ID}"
module load prod_util
module load cray-pals
module load cfp
module load libjpeg
module load grib_util/1.2.3
module load wgrib2/2.0.8
export WGRIB2=wgrib2
module load python/3.8.6
module load crtm/2.4.0 # TODO: This is only needed when UPP_RUN=goes. Is there a better way to handle this?
set_trace
else
. "${HOMEgfs}/ush/load_fv3gfs_modules.sh"
status=$?
if (( status != 0 )); then exit "${status}"; fi
fi
Given this, add the following to the WCOSS2 section of upp.sh
@@ -29,6 +29,12 @@ if [[ "${MACHINE_ID}" = "wcoss2" ]]; then
module load python/3.8.6
module load crtm/2.4.0 # TODO: This is only needed when UPP_RUN=goes. Is there a better way to handle this?
set_trace
+
+ # Add wxflow to PYTHONPATH
+ wxflowPATH="${HOMEgfs}/ush/python"
+ PYTHONPATH="${PYTHONPATH:+${PYTHONPATH}:}${HOMEgfs}/ush:${wxflowPATH}"
+ export PYTHONPATH
+
else
. "${HOMEgfs}/ush/load_fv3gfs_modules.sh"
status=$?
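The `${PYTHONPATH:+${PYTHONPATH}:}` form in the change above prepends the existing value and a colon only when PYTHONPATH is already set and non-empty, which avoids a stray leading colon. The sketch below restates that construction in Python for illustration. The function name and the example paths are hypothetical.

```python
def build_pythonpath(existing, homegfs):
    """Mirror ${PYTHONPATH:+${PYTHONPATH}:}${HOMEgfs}/ush:${wxflowPATH}."""
    wxflow_path = f"{homegfs}/ush/python"
    # ${VAR:+...} expands only when VAR is set and non-empty, so an empty
    # PYTHONPATH does not leave a stray leading colon.
    prefix = f"{existing}:" if existing else ""
    return f"{prefix}{homegfs}/ush:{wxflow_path}"


# Placeholder paths, not real installation directories.
print(build_pythonpath("", "/path/to/global-workflow"))
print(build_pythonpath("/opt/site-packages", "/path/to/global-workflow"))
```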
With this change in place the gdas and gfs atmanlupp jobs ran to completion on WCOSS2.
Change committed to RussTreadon-NOAA:feature/rename_atm at 8fc02e2.
@RussTreadon-NOAA Sorry about that! The dump data for 20210324 has been filled back in on Hera. I have made a note to not remove this date in future age-offs now. Let me know if you have any issues with this dump data.
Experiment C96_atmaerosnowDA FAILED on Hera with error logs:
/scratch1/NCEPDEV/global/CI/2700/RUNTESTS/COMROOT/C96_atmaerosnowDA_d1d88a6d/logs/2021122100/gfsatmos_prod_f054.log
/scratch1/NCEPDEV/global/CI/2700/RUNTESTS/COMROOT/C96_atmaerosnowDA_d1d88a6d/logs/2021122100/gfsatmos_prod_f057.log
/scratch1/NCEPDEV/global/CI/2700/RUNTESTS/COMROOT/C96_atmaerosnowDA_d1d88a6d/logs/2021122100/gfsatmos_prod_f060.log
/scratch1/NCEPDEV/global/CI/2700/RUNTESTS/COMROOT/C96_atmaerosnowDA_d1d88a6d/logs/2021122100/gfsatmos_prod_f063.log
Follow link here to view the contents of the above file(s): (link)
Experiment C96_atmaerosnowDA FAILED on Hera in
/scratch1/NCEPDEV/global/CI/2700/RUNTESTS/C96_atmaerosnowDA_d1d88a6d
@WalterKolczynski-NOAA . Each of the cited log files contains a disk quota exceeded message
/scratch1/NCEPDEV/global/CI/2700/RUNTESTS/COMROOT/C96_atmaerosnowDA_d1d88a6d/logs/2021122100/gfsatmos_prod_f054.log.0:cat: write error: Disk quota exceeded
/scratch1/NCEPDEV/global/CI/2700/RUNTESTS/COMROOT/C96_atmaerosnowDA_d1d88a6d/logs/2021122100/gfsatmos_prod_f057.log.0:cat: write error: Disk quota exceeded
/scratch1/NCEPDEV/global/CI/2700/RUNTESTS/COMROOT/C96_atmaerosnowDA_d1d88a6d/logs/2021122100/gfsatmos_prod_f060.log.0:cat: write error: Disk quota exceeded
/scratch1/NCEPDEV/global/CI/2700/RUNTESTS/COMROOT/C96_atmaerosnowDA_d1d88a6d/logs/2021122100/gfsatmos_prod_f063.log.0:cat: write error: Disk quota exceeded
@WalterKolczynski-NOAA : g-w C48mx500_3DVarAOWCDA CI successfully ran to completion on Hera during the morning of 6/25/2024 using the role.jedipara account
Hera(hfe02):/scratch1/NCEPDEV/stmp2/role.jedipara/EXPDIR/pr2700_wcda$ rocotostat -d pr2700_wcda.db -w pr2700_wcda.xml -c all -s
CYCLE STATE ACTIVATED DEACTIVATED
202103241200 Done Jun 25 2024 09:45:12 Jun 25 2024 10:00:34
202103241800 Done Jun 25 2024 09:45:12 Jun 25 2024 10:46:18
Description
This PR updates the gdas.cd hash to bring in new JCB conventions. Resolves #2699
From #2654
This PR will move much of the staging code that takes place in the Python initialization subroutines of the variational and ensemble DA jobs into Jinja2-templated YAML files to be passed into the wxflow file handler. Much of the staging has already been done this way; this PR simply expands that strategy.
The old Python routines that were doing this staging are now removed. This is part of a broader refactoring of the pygfs tasking.
wxflow PR #30 is a companion to this PR.