Closed by RussTreadon-NOAA 4 weeks ago
NOTE: This PR requires the `develop` `sorc/gdas.cd` submodule hash. This PR will be marked Ready for review once these tasks are completed.
@RussTreadon-NOAA As part of this PR, please also enable the CI tests that use the new JEDI-based GDAS that are currently disabled on wcoss. You can do this by removing `wcoss2` from the `skip_ci_on_hosts` section in the `ci/cases/*/*.yaml` case files.
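For illustration, here is a minimal sketch of dropping the `wcoss2` entry from a case file. The `case.yaml` demo file below is hypothetical and merely stands in for one of the real `ci/cases/pr/*.yaml` files; only the `skip_ci_on_hosts` key name comes from the discussion.

```shell
# Demo file standing in for a ci/cases/pr/*.yaml case file
cat > case.yaml <<'EOF'
experiment:
  system: gfs
skip_ci_on_hosts:
  - wcoss2
  - hera
EOF

# Delete the "- wcoss2" list entry (GNU sed; leaves other hosts alone)
sed -i '/^[[:space:]]*- wcoss2$/d' case.yaml
grep -n 'wcoss' case.yaml || echo "wcoss2 entry removed"
```

In the real repo the same `sed` expression would be applied to each case file listed below instead of the demo file.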
@WalterKolczynski-NOAA , I can remove `wcoss2` from `ci/cases/pr/C96C48_ufs_hybatmDA.yaml` since I tested this on Cactus. It works.
Are you asking that this PR remove the following occurrences of `wcoss2` in CI yamls?
ci/cases/pr/C48mx500_3DVarAOWCDA.yaml: - wcoss2
ci/cases/pr/C96C48_ufs_hybatmDA.yaml: - wcoss2
ci/cases/pr/C96_atmaerosnowDA.yaml: - wcoss2
ci/cases/pr/C96_atm3DVar.yaml: - wcoss2
ci/cases/pr/C48_S2SWA_gefs.yaml: - wcoss2
I do not plan on testing anything other than `C96C48_ufs_hybatmDA.yaml`.
Added note: This PR will remain in draft mode until NCO installs `bufr/12.0.1` in production. Once this is done, `wcoss2.intel.lua` in GDASApp PR #1122 will be updated to use the official production installation of `bufr/12.0.1`. After GDASApp PR #1122 is closed, the `sorc/gdas.cd` hash in this PR will be updated.
Not all of them. The GEFS test has to remain off until the bash CI system supports dual builds (GFS and GEFS use different UFS executables because of the wave grid option). I'm also not sure why the C96_atm3DVar test isn't on already; will check.
The other three should work as soon as gdas.cd can be built, AFAIK. If they don't work out-of-the-box, we can get you help or defer those.
I'll keep it simple at first and only activate `C96C48_ufs_hybatmDA` on wcoss2.
Oh, the C96_atm3DVar test is disabled because we run the extended version instead.
Built `RussTreadon-NOAA:feature/wcoss2_ufsda` at `10a2bc5d` on Cactus. Ran JEDI ATM CI. The 20240224/00 gfs and gdas cycles run to completion. The 20240224/00 enkf cycle fails in the final job because member analysis increment files are not found in the expected format.
`gdas.cd @ 95218e7` includes changes related to g-w PR #2592. This g-w PR adds a new enkf analysis increment job. `gdas.cd @ 95218e7` assumes member increments are created by this new g-w job.
g-w PR #2592 must be merged into `develop` and `RussTreadon-NOAA:feature/wcoss2_ufsda` updated in order for the enkf cycle to successfully complete.
Installed `RussTreadon-NOAA:feature/wcoss2_ufsda` at `7f7093f` on Cactus. Ran JEDI ATM CI (C96C48_ufs_hybatmDA). All jobs from the gfs, gdas, and enkfgdas cycles successfully ran to completion.
russ.treadon@clogin04:/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/prtest> rocotostat -d prtest.db -w prtest.xml -c all -s
CYCLE STATE ACTIVATED DEACTIVATED
202402231800 Done May 29 2024 00:25:41 May 29 2024 00:40:16
202402240000 Done May 29 2024 00:25:41 May 29 2024 02:45:11
@DavidHuber-NOAA and @CatherineThomas-NOAA , if either of you has time, would you review the changes in this PR?
This PR allows GDASApp to be built and run on WCOSS2. This capability is required for GFS v17.
While not impacted by this PR, I also ran GSI-based ATM CI (C96C48_hybatmDA) on Cactus. All jobs successfully ran to completion.
russ.treadon@clogin04:/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/prtest_gsi> rocotostat -d prtest_gsi.db -w prtest_gsi.xml -c all -s
CYCLE STATE ACTIVATED DEACTIVATED
202112201800 Done May 29 2024 09:35:34 May 29 2024 09:50:13
202112210000 Done May 29 2024 09:35:34 May 29 2024 11:30:16
202112210600 Done May 29 2024 09:35:34 May 29 2024 11:30:16
Does this need to be updated with that GSI utils hash before testing?
I did not plan on updating the hash for `sorc/gsi_utils.fd` in order to keep this PR focused on its stated purpose: enable UFSDA to build and run on WCOSS2. The current `sorc/gsi_utils.fd` hash works for GSI and JEDI based DA.
GSI-utils PR #44 removed the use of `/apps/ops/para/libs` and updated the version for a few modules. JEDI and GSI based CI tests demonstrated that this PR did not alter cycled results. It does, however, bring the package into better compliance with NCO implementation standards (e.g., do not build apps with non-production modules).
Given this plus your question, @WalterKolczynski-NOAA, I'll go ahead and update the `sorc/gsi_utils.fd` hash in this PR to `d940406`.
Thank you @CatherineThomas-NOAA
CI Update on Wcoss2 at 05/30/24 02:52:08 PM
============================================
Cloning and Building global-workflow PR: 2620
with PID: 81085 on host: clogin01
Automated global-workflow Testing Results:
Machine: Wcoss2
Start: Thu May 30 15:00:16 UTC 2024 on clogin01
---------------------------------------------------
Build: Completed at 05/30/24 03:12:05 PM
Case setup: Completed for experiment C48_ATM_d6f6ae0c
Case setup: Skipped for experiment C48mx500_3DVarAOWCDA_d6f6ae0c
Case setup: Skipped for experiment C48_S2SWA_gefs_d6f6ae0c
Case setup: Completed for experiment C48_S2SW_d6f6ae0c
Case setup: Completed for experiment C96_atm3DVar_extended_d6f6ae0c
Case setup: Skipped for experiment C96_atm3DVar_d6f6ae0c
Case setup: Skipped for experiment C96_atmaerosnowDA_d6f6ae0c
Case setup: Completed for experiment C96C48_hybatmDA_d6f6ae0c
Case setup: Completed for experiment C96C48_ufs_hybatmDA_d6f6ae0c
Experiment C96C48_ufs_hybatmDA_d6f6ae0c FAIL on Wcoss2 at 05/30/24 03:48:40 PM
Error logs:
/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2620/RUNTESTS/COMROOT/C96C48_ufs_hybatmDA_d6f6ae0c/logs/2024022400/gdasprepatmiodaobs.log
/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2620/RUNTESTS/COMROOT/C96C48_ufs_hybatmDA_d6f6ae0c/logs/2024022400/gfsprepatmiodaobs.log
Follow link here to view the contents of the above file(s): (link)
Experiment C48_S2SW FAILED on Hera with error logs:
/scratch1/NCEPDEV/global/CI/2620/RUNTESTS/COMROOT/C48_S2SW_d6f6ae0c/logs/2021032312/gfswaveinit.log
Follow link here to view the contents of the above file(s): (link)
Experiment C96_atmaerosnowDA FAILED on Hera in
/scratch1/NCEPDEV/global/CI/2620/RUNTESTS/C96_atmaerosnowDA_d6f6ae0c
Experiment C96_atm3DVar FAILED on Hera in
/scratch1/NCEPDEV/global/CI/2620/RUNTESTS/C96_atm3DVar_d6f6ae0c
Experiment C48mx500_3DVarAOWCDA FAILED on Hera in
/scratch1/NCEPDEV/global/CI/2620/RUNTESTS/C48mx500_3DVarAOWCDA_d6f6ae0c
Experiment C96C48_hybatmDA FAILED on Hera in
/scratch1/NCEPDEV/global/CI/2620/RUNTESTS/C96C48_hybatmDA_d6f6ae0c
Experiment C48_S2SW FAILED on Hera in
/scratch1/NCEPDEV/global/CI/2620/RUNTESTS/C48_S2SW_d6f6ae0c
Experiment C48_ATM FAILED on Hera with error logs:
/scratch1/NCEPDEV/global/CI/2620/RUNTESTS/COMROOT/C48_ATM_d6f6ae0c/logs/2021032312/gfsfcst.log
Follow link here to view the contents of the above file(s): (link)
Experiment C48_S2SWA_gefs FAILED on Hera in
/scratch1/NCEPDEV/global/CI/2620/RUNTESTS/C48_S2SWA_gefs_d6f6ae0c
Experiment C48_ATM FAILED on Hera in
/scratch1/NCEPDEV/global/CI/2620/RUNTESTS/C48_ATM_d6f6ae0c
Experiment C48_S2SWA_gefs FAILED on Hercules with error logs:
/work2/noaa/stmp/CI/HERCULES/2620/RUNTESTS/COMROOT/C48_S2SWA_gefs_d6f6ae0c/logs/2021032312/atmos_prod_mem002_f066.log
Follow link here to view the contents of the above file(s): (link)
Experiment C48_S2SWA_gefs FAILED on Hercules in
/work2/noaa/stmp/CI/HERCULES/2620/RUNTESTS/C48_S2SWA_gefs_d6f6ae0c
/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2620/RUNTESTS/COMROOT/C96C48_ufs_hybatmDA_d6f6ae0c/logs/2024022400/gdasprepatmiodaobs.log contains the following error message:
/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2620/global-workflow/jobs/JGLOBAL_ATM_PREP_IODA_OBS: line 21: /lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2620/global-workflow/ush/run_bufr2ioda.py: No such file or directory
`run_bufr2ioda.py` exists in `sorc/gdas.cd/ush/ioda/bufr2ioda/`:
russ.treadon@clogin03:/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2620/global-workflow/sorc/gdas.cd/ush/ioda/bufr2ioda> ls -l run_bufr2ioda.py
-rwxr-xr-x 1 terry.mcguinness global 4682 May 30 14:54 run_bufr2ioda.py
Script `sorc/link_workflow.sh` should link this script to `ush/` via
#------------------------------
#--add GDASApp files
#------------------------------
if [[ -d "${HOMEgfs}/sorc/gdas.cd/build" ]]; then
cd "${HOMEgfs}/ush" || exit 1
${LINK_OR_COPY} "${HOMEgfs}/sorc/gdas.cd/ush/soca" .
${LINK_OR_COPY} "${HOMEgfs}/sorc/gdas.cd/ush/ufsda" .
${LINK_OR_COPY} "${HOMEgfs}/sorc/gdas.cd/ush/jediinc2fv3.py" .
${LINK_OR_COPY} "${HOMEgfs}/sorc/gdas.cd/ush/ioda/bufr2ioda/gen_bufr2ioda_json.py" .
${LINK_OR_COPY} "${HOMEgfs}/sorc/gdas.cd/ush/ioda/bufr2ioda/gen_bufr2ioda_yaml.py" .
${LINK_OR_COPY} "${HOMEgfs}/sorc/gdas.cd/ush/ioda/bufr2ioda/run_bufr2ioda.py" .
${LINK_OR_COPY} "${HOMEgfs}/sorc/gdas.cd/build/bin/imsfv3_scf2ioda.py" .
fi
Note that the creation of links is dependent upon the existence of `gdas.cd/build`. A check of `/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2620/global-workflow/sorc/gdas.cd` does not show `build` being present. A check of `sorc/logs` does not include `build_gdas.log`.
Was `sorc/build_all.sh` executed with the `-u` option to build GDASApp?
Note: Some of the JEDI components in GDASApp execute `git-lfs` during the build process. While I think the g-w `build_gdas.sh` will run to completion without `git-lfs`, the log file may contain errors.
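The two artifacts checked above (the `gdas.cd/build` directory and `sorc/logs/build_gdas.log`) can be tested in one place. The `check_gdas_build` helper and the `demo` scratch tree below are hypothetical, purely to sketch the diagnostic; the two paths are the ones named in the comment.

```shell
# GDASApp is considered built only if both sorc/gdas.cd/build and
# sorc/logs/build_gdas.log exist under the workflow clone.
check_gdas_build() {
  local homegfs=$1
  [ -d "${homegfs}/sorc/gdas.cd/build" ]       || { echo "missing gdas.cd/build"; return 1; }
  [ -f "${homegfs}/sorc/logs/build_gdas.log" ] || { echo "missing build_gdas.log"; return 1; }
  echo "GDASApp build artifacts present"
}

# Demo against a scratch tree
mkdir -p demo/sorc/gdas.cd demo/sorc/logs
check_gdas_build demo || true            # reports the missing build directory
mkdir -p demo/sorc/gdas.cd/build
touch demo/sorc/logs/build_gdas.log
check_gdas_build demo                    # now reports success
```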
According to `account_params`, all Hera stmp filesets are over quota:
Hera(hfe03):/scratch1/NCEPDEV/da/role.jedipara$ date
Thu May 30 16:33:31 UTC 2024
Hera(hfe03):/scratch1/NCEPDEV/da/role.jedipara$ account_params |grep stmp
Project: stmp
Directory: /scratch1/NCEPDEV/stmp DiskInUse=726315 GB, Quota=700000 GB, Files=36307512, FileQUota=140000000
Directory: /scratch2/NCEPDEV/stmp DiskInUse=710260 GB, Quota=700000 GB, Files=34504518, FileQUota=140000000
Directory: /scratch2/NCEPDEV/stmp1 DiskInUse=710260 GB, Quota=700000 GB, Files=34504518, FileQUota=140000000
Directory: /scratch1/NCEPDEV/stmp2 DiskInUse=726315 GB, Quota=700000 GB, Files=36307514, FileQUota=140000000
Directory: /scratch2/NCEPDEV/stmp3 DiskInUse=710260 GB, Quota=700000 GB, Files=34504518, FileQUota=140000000
Directory: /scratch1/NCEPDEV/stmp4 DiskInUse=726315 GB, Quota=700000 GB, Files=36307514, FileQUota=140000000
Do the Hera CI tests use any `stmp` directories?
@RussTreadon-NOAA CI uses stmp only in that it is where RUNDIRS is defined for experiments, and I just made sure ours was clean. I have STMP set to `/scratch1/NCEPDEV/stmp2` for CI on Hera.
CI Passed on Orion.
Built and ran in directory `/work2/noaa/stmp/CI/ORION/2620`
`ci/scripts/clone-build_ci.sh` needs to be updated to build with UFSDA on WCOSS2. Then I think @TerrenceMcGuinness-NOAA will need to update the copy we use on Cactus to drive CI.
I have no idea what this Hercules failure is.
Look at `/work2/noaa/stmp/CI/HERCULES/2620/RUNTESTS/COMROOT/C48_S2SWA_gefs_d6f6ae0c/logs/2021032312/atmos_prod_mem002_f066.log`. The error message is
+ exglobal_atmos_products.sh[113]: (( iproc == nproc ))
+ exglobal_atmos_products.sh[118]: wgrib2 tmpfile_f066 -for 1:0 -grib tmpfile_f066_1
*** FATAL ERROR: parse_loop: end < start 1:0 ***
+ exglobal_atmos_products.sh[1]: postamble exglobal_atmos_products.sh 1717085522 8
Can someone from the GEFS team help troubleshoot?
Spot checked GSI-based DA job log files in `/work2/noaa/stmp/CI/HERCULES/2620/RUNTESTS/COMROOT/C96C48_hybatmDA_d6f6ae0c/logs`. All the log files I checked finished with error code 0. Seems this test was successful.
@WalterKolczynski-NOAA @RussTreadon-NOAA It looks like the 66-hour master GRIB2 file from member 2 was truncated. Below are the sizes of the master GRIB2 files:
9.2M /work2/noaa/stmp/CI/HERCULES/2620/RUNTESTS/COMROOT/C48_S2SWA_gefs_d6f6ae0c/gefs.20210323/12/mem002/model_data/atmos/master/gefs.t12z.master.grb2f000
9.5M /work2/noaa/stmp/CI/HERCULES/2620/RUNTESTS/COMROOT/C48_S2SWA_gefs_d6f6ae0c/gefs.20210323/12/mem002/model_data/atmos/master/gefs.t12z.master.grb2f006
9.5M /work2/noaa/stmp/CI/HERCULES/2620/RUNTESTS/COMROOT/C48_S2SWA_gefs_d6f6ae0c/gefs.20210323/12/mem002/model_data/atmos/master/gefs.t12z.master.grb2f012
9.4M /work2/noaa/stmp/CI/HERCULES/2620/RUNTESTS/COMROOT/C48_S2SWA_gefs_d6f6ae0c/gefs.20210323/12/mem002/model_data/atmos/master/gefs.t12z.master.grb2f018
9.4M /work2/noaa/stmp/CI/HERCULES/2620/RUNTESTS/COMROOT/C48_S2SWA_gefs_d6f6ae0c/gefs.20210323/12/mem002/model_data/atmos/master/gefs.t12z.master.grb2f024
9.4M /work2/noaa/stmp/CI/HERCULES/2620/RUNTESTS/COMROOT/C48_S2SWA_gefs_d6f6ae0c/gefs.20210323/12/mem002/model_data/atmos/master/gefs.t12z.master.grb2f030
9.5M /work2/noaa/stmp/CI/HERCULES/2620/RUNTESTS/COMROOT/C48_S2SWA_gefs_d6f6ae0c/gefs.20210323/12/mem002/model_data/atmos/master/gefs.t12z.master.grb2f036
9.5M /work2/noaa/stmp/CI/HERCULES/2620/RUNTESTS/COMROOT/C48_S2SWA_gefs_d6f6ae0c/gefs.20210323/12/mem002/model_data/atmos/master/gefs.t12z.master.grb2f042
9.5M /work2/noaa/stmp/CI/HERCULES/2620/RUNTESTS/COMROOT/C48_S2SWA_gefs_d6f6ae0c/gefs.20210323/12/mem002/model_data/atmos/master/gefs.t12z.master.grb2f048
9.4M /work2/noaa/stmp/CI/HERCULES/2620/RUNTESTS/COMROOT/C48_S2SWA_gefs_d6f6ae0c/gefs.20210323/12/mem002/model_data/atmos/master/gefs.t12z.master.grb2f054
9.5M /work2/noaa/stmp/CI/HERCULES/2620/RUNTESTS/COMROOT/C48_S2SWA_gefs_d6f6ae0c/gefs.20210323/12/mem002/model_data/atmos/master/gefs.t12z.master.grb2f060
3.2M /work2/noaa/stmp/CI/HERCULES/2620/RUNTESTS/COMROOT/C48_S2SWA_gefs_d6f6ae0c/gefs.20210323/12/mem002/model_data/atmos/master/gefs.t12z.master.grb2f066
9.4M /work2/noaa/stmp/CI/HERCULES/2620/RUNTESTS/COMROOT/C48_S2SWA_gefs_d6f6ae0c/gefs.20210323/12/mem002/model_data/atmos/master/gefs.t12z.master.grb2f072
9.4M /work2/noaa/stmp/CI/HERCULES/2620/RUNTESTS/COMROOT/C48_S2SWA_gefs_d6f6ae0c/gefs.20210323/12/mem002/model_data/atmos/master/gefs.t12z.master.grb2f078
9.5M /work2/noaa/stmp/CI/HERCULES/2620/RUNTESTS/COMROOT/C48_S2SWA_gefs_d6f6ae0c/gefs.20210323/12/mem002/model_data/atmos/master/gefs.t12z.master.grb2f084
9.4M /work2/noaa/stmp/CI/HERCULES/2620/RUNTESTS/COMROOT/C48_S2SWA_gefs_d6f6ae0c/gefs.20210323/12/mem002/model_data/atmos/master/gefs.t12z.master.grb2f090
9.4M /work2/noaa/stmp/CI/HERCULES/2620/RUNTESTS/COMROOT/C48_S2SWA_gefs_d6f6ae0c/gefs.20210323/12/mem002/model_data/atmos/master/gefs.t12z.master.grb2f096
9.4M /work2/noaa/stmp/CI/HERCULES/2620/RUNTESTS/COMROOT/C48_S2SWA_gefs_d6f6ae0c/gefs.20210323/12/mem002/model_data/atmos/master/gefs.t12z.master.grb2f102
9.5M /work2/noaa/stmp/CI/HERCULES/2620/RUNTESTS/COMROOT/C48_S2SWA_gefs_d6f6ae0c/gefs.20210323/12/mem002/model_data/atmos/master/gefs.t12z.master.grb2f108
9.5M /work2/noaa/stmp/CI/HERCULES/2620/RUNTESTS/COMROOT/C48_S2SWA_gefs_d6f6ae0c/gefs.20210323/12/mem002/model_data/atmos/master/gefs.t12z.master.grb2f114
9.5M /work2/noaa/stmp/CI/HERCULES/2620/RUNTESTS/COMROOT/C48_S2SWA_gefs_d6f6ae0c/gefs.20210323/12/mem002/model_data/atmos/master/gefs.t12z.master.grb2f120
Note that the 66-hour forecast is 3.2MB while all others are ~9.5MB in size. Looking through the forecast log `/work2/noaa/stmp/CI/HERCULES/2620/RUNTESTS/COMROOT/C48_S2SWA_gefs_d6f6ae0c/logs/2021032312/fcst_mem002.log`, I do not see any suspicious messages surrounding the creation of this file.
The node that the forecast ran on (hercules-02-18) reported more failed jobs yesterday than surrounding nodes (16 failed on 02-18 vs an average of 4.5 on 4 randomly selected nodes on 02-), suggesting it may be a node issue. I'll report this to RDHPCS and see if they notice/noticed any issues with that node.
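The size spot check above can be scripted. This is only a sketch in pure shell: the `master_demo` directory and the byte counts are made up for illustration (mirroring the ~9.5M vs 3.2M pattern in the listing), and the "half the reference size" threshold is an arbitrary choice, not anything from the workflow.

```shell
# Create dummy "master" files: one truncated, the rest full size
mkdir -p master_demo
for f in f000 f006 f066 f072; do
  case $f in f066) n=3200 ;; *) n=9500 ;; esac
  head -c "$n" /dev/zero > "master_demo/gefs.t12z.master.grb2$f"
done

# Flag any file smaller than half the size of the f000 reference file
ref=$(wc -c < master_demo/gefs.t12z.master.grb2f000)
for f in master_demo/*; do
  sz=$(wc -c < "$f")
  if [ "$sz" -lt $((ref / 2)) ]; then
    echo "suspect: $f ($sz bytes)"
  fi
done
```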
Just want to note that I believe the Hercules failure here happened before stmp filled, so I think David is on the right track with a failing node.
Thank you @DavidHuber-NOAA for digging into the cause of the Hercules failure. Just curious: How did you know that more jobs than average failed on hercules-02-18? Is this information in a log file or obtained from a command we run?
@DavidHuber-NOAA . OK. I see in your RDHPCS email
sacct -a --start 053024 --end 053124 -o "JobID,JobName%60,State,NodeList%60" -N "hercules-02-18" | grep FAIL
That's quite a mouthful. I didn't know about this combination of options with this command.
@WalterKolczynski-NOAA , @RussTreadon-NOAA , I believe this
ci/cases/pr/C48mx500_3DVarAOWCDA.yaml: - wcoss2
will not work until the bufr library is updated.
@guillaumevernieres , which version of `bufr` does `ci/cases/pr/C48mx500_3DVarAOWCDA.yaml` require?
The `gdas.cd` hash used in this PR loads `bufr/12.0.1` on WCOSS2 - see `modulefiles/GDAS/wcoss2.intel.lua`.
Ha! Never mind, I missed the memo again, @RussTreadon-NOAA.
@WalterKolczynski-NOAA : Are we waiting for other g-w PRs to pass CI and be merged into `develop`, or is there something I need to do with this PR to move it forward?
@RussTreadon-NOAA @WalterKolczynski-NOAA Since Renn asked that we try again, I'd suggest that we wait to see how the archiving CI test goes. If it passes on Hercules, then I would suggest opening this one up again.
CI Update on Wcoss2 at 06/01/24 05:12:41 AM
============================================
Cloning and Building global-workflow PR: 2620
with PID: 34445 on host: clogin01
Automated global-workflow Testing Results:
Machine: Wcoss2
Start: Sat Jun 1 05:21:08 UTC 2024 on clogin01
---------------------------------------------------
Build: Completed at 06/01/24 05:32:43 AM
Case setup: Completed for experiment C48_ATM_5bc05547
Case setup: Skipped for experiment C48mx500_3DVarAOWCDA_5bc05547
Case setup: Skipped for experiment C48_S2SWA_gefs_5bc05547
Case setup: Completed for experiment C48_S2SW_5bc05547
Case setup: Completed for experiment C96_atm3DVar_extended_5bc05547
Case setup: Skipped for experiment C96_atm3DVar_5bc05547
Case setup: Skipped for experiment C96_atmaerosnowDA_5bc05547
Case setup: Completed for experiment C96C48_hybatmDA_5bc05547
Case setup: Completed for experiment C96C48_ufs_hybatmDA_5bc05547
Experiment C96C48_ufs_hybatmDA_5bc05547 FAIL on Wcoss2 at 06/01/24 06:18:28 AM
Error logs:
/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2620/RUNTESTS/COMROOT/C96C48_ufs_hybatmDA_5bc05547/logs/2024022400/gfsprepatmiodaobs.log
Follow link here to view the contents of the above file(s): (link)
@TerrenceMcGuinness-NOAA and @WalterKolczynski-NOAA
A check of `gfsprepatmiodaobs.log` shows the same error as before. Script `ush/run_bufr2ioda.py` cannot be found. This file resides in `sorc/gdas.cd`. `link_workflow.sh` links it to `ush/` when `gdas.cd/build` is present.
#------------------------------
#--add GDASApp files
#------------------------------
if [[ -d "${HOMEgfs}/sorc/gdas.cd/build" ]]; then
cd "${HOMEgfs}/ush" || exit 1
${LINK_OR_COPY} "${HOMEgfs}/sorc/gdas.cd/ush/soca" .
${LINK_OR_COPY} "${HOMEgfs}/sorc/gdas.cd/ush/ufsda" .
${LINK_OR_COPY} "${HOMEgfs}/sorc/gdas.cd/ush/jediinc2fv3.py" .
${LINK_OR_COPY} "${HOMEgfs}/sorc/gdas.cd/ush/ioda/bufr2ioda/gen_bufr2ioda_json.py" .
${LINK_OR_COPY} "${HOMEgfs}/sorc/gdas.cd/ush/ioda/bufr2ioda/gen_bufr2ioda_yaml.py" .
${LINK_OR_COPY} "${HOMEgfs}/sorc/gdas.cd/ush/ioda/bufr2ioda/run_bufr2ioda.py" .
${LINK_OR_COPY} "${HOMEgfs}/sorc/gdas.cd/build/bin/imsfv3_scf2ioda.py" .
fi
A check of `sorc/gdas.cd` shows that the GDASApp was not built:
russ.treadon@clogin01:/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2620/global-workflow/sorc/gdas.cd> ls
build.sh bundle ci CMakeLists.txt LICENSE mains modulefiles parm prototypes README.md scripts sorc test ush utils
A check of `sorc/logs` shows that `build_gdas.log` is not present:
russ.treadon@clogin01:/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2620/global-workflow/sorc/logs> ls
build_gfs_utils.log build_gsi_monitor.log build_ufs.log build_upp.log
build_gsi_enkf.log build_gsi_utils.log build_ufs_utils.log build_ww3prepost.log
Before we run C96C48_ufs_hybatmDA, we must include the `-u` option when executing `sorc/build_all.sh`. Given that we are also exercising GSI-based DA, our build command should be `build_all.sh -g -u`. I like to add `-v` for verbose output, but doing so is optional.
A `grep "build_all.sh"` in `/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2620/global-workflow` finds `build_all.sh` commands in `./ci/cases/yamls/build.yaml`:
builds:
- gefs: './build_all.sh -kw'
- gfs: './build_all.sh -kgu'
The `-u` option is present for gfs builds. I also see `system: gfs` specified in `C96C48_ufs_hybatmDA.yaml`:
russ.treadon@clogin01:/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2620/global-workflow/ci/cases/pr> grep system -r .
./C48mx500_3DVarAOWCDA.yaml: system: gfs
./C48_ATM.yaml: system: gfs
./C48_S2SW.yaml: system: gfs
./C96_atm3DVar_extended.yaml: system: gfs
./C96_atmaerosnowDA.yaml: system: gfs
./C96_atm3DVar.yaml: system: gfs
./C48_S2SWA_gefs.yaml: system: gefs
./C96C48_ufs_hybatmDA.yaml: system: gfs
./C96C48_hybatmDA.yaml: system: gfs
Thus, I cannot explain why GDASApp was not built on Cactus.
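To make the expected behavior concrete, here is a toy sketch of the `system:` → build-command mapping described above. The demo files stand in for `ci/cases/yamls/build.yaml` and a case file, and the grep/sed extraction is purely illustrative; the real selection logic lives in the CI scripts.

```shell
# Demo stand-ins for ci/cases/yamls/build.yaml and a case yaml
cat > build.yaml <<'EOF'
builds:
  - gefs: './build_all.sh -kw'
  - gfs: './build_all.sh -kgu'
EOF
cat > case.yaml <<'EOF'
system: gfs
EOF

# Read the case's system, then pull the quoted build command for it
system=$(awk '/^system:/ {print $2}' case.yaml)
build_cmd=$(grep -- "- ${system}:" build.yaml | sed "s/.*'\(.*\)'.*/\1/")
echo "system=${system} build=${build_cmd}"
```

With these inputs the selected command is `./build_all.sh -kgu`, i.e. the `-u` (GDASApp) build should have been triggered for a `system: gfs` case.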
CI Passed on Hercules.
Built and ran in directory `/work2/noaa/stmp/CI/HERCULES/2620`
@TerrenceMcGuinness-NOAA and @WalterKolczynski-NOAA
`git status` in `/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2620/global-workflow/sorc` returns
russ.treadon@clogin01:/lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2620/global-workflow/sorc> git status .
Error cleaning LFS object: open /lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2620/global-workflow/.git/modules/sorc/gdas.cd/modules/sorc/crtm/lfs/tmp/576068607: no such file or directory
error: external filter 'git-lfs filter-process' failed
fatal: test/testinput/single_profile.nc4: clean filter 'lfs' failed
fatal: 'git status --porcelain=2' failed in submodule sorc/crtm
fatal: 'git status --porcelain=2' failed in submodule sorc/gdas.cd
The GDASApp clone requires `git-lfs` for JEDI components. Do the above errors result in automated CI abandoning the GDASApp build?
We have `git-lfs/2.11.0` on WCOSS2. Loading it requires that one of the following `gcc` compilers be loaded:
gcc/10.3.0
gcc/11.2.0
gcc/12.1.0
My `~russ.treadon/.bashrc` contains
module load gcc/12.1.0 # gcc is required to load git-lfs
module load git-lfs/2.11.0
Also, my `~russ.treadon/.gitconfig` contains
[filter "lfs"]
clean = git-lfs clean -- %f
smudge = git-lfs smudge -- %f
process = git-lfs filter-process
required = true
Does the account under which automated CI runs on Cactus include the above? If not, should we add the above?
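For what it's worth, I believe the `[filter "lfs"]` entries above are the same ones `git lfs install` writes to `~/.gitconfig`, so the CI account could script the setup rather than edit the file by hand. A sketch (using a standalone `demo.gitconfig` here so nothing touches a real `~/.gitconfig`):

```shell
# Write the four filter.lfs.* settings into a standalone config file
git config --file demo.gitconfig filter.lfs.clean   "git-lfs clean -- %f"
git config --file demo.gitconfig filter.lfs.smudge  "git-lfs smudge -- %f"
git config --file demo.gitconfig filter.lfs.process "git-lfs filter-process"
git config --file demo.gitconfig filter.lfs.required true

# Show the resulting [filter "lfs"] section
cat demo.gitconfig
```

Dropping `--file demo.gitconfig` in favor of `--global` would reproduce the `.gitconfig` stanza quoted above.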
Description
This PR enables ufsda (`sorc/gdas.cd`) to be built and run on WCOSS2.
Resolves #2602
Resolves #2579
Type of change
Change characteristics
How has this been tested?
Clone, build, and run C96C48_ufs_hybatmDA CI on WCOSS2 (Cactus)
Checklist