NOAA-EMC / global-workflow

Global Superstructure/Workflow supporting the Global Forecast System (GFS)
https://global-workflow.readthedocs.io/en/latest
GNU Lesser General Public License v3.0

Update archive job to use COMIN/COMOUT #2668

HenryRWinterbottom closed 2 weeks ago

HenryRWinterbottom commented 3 weeks ago

Description

NCO has requested that each COM variable specify whether it is an input or an output. This completes that process for the global-workflow archive task.

Refs #2451


HenryRWinterbottom commented 3 weeks ago

Thank you for the feedback, @DavidHuber-NOAA. I remain uncertain about the convention here. I'd like to wait on @WalterKolczynski-NOAA's feedback before I address this suggestion. If it is COMIN rather than COMOUT, it is a trivial fix. But, I just want to be sure that it complies with what NCO needs/wants here.

WalterKolczynski-NOAA commented 3 weeks ago

> Thank you for the feedback, @DavidHuber-NOAA. I remain uncertain about the convention here. I'd like to wait on @WalterKolczynski-NOAA's feedback before I address this suggestion. If it is COMIN rather than COMOUT, it is a trivial fix. But, I just want to be sure that it complies with what NCO needs/wants here.

If you are reading pre-existing files, it is COMIN.

If you are writing files, it is COMOUT.

If it is doing both, talk to me.
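A minimal sketch of the rule above, written as a small Python helper. The function name `com_prefix` and its parameters are hypothetical and are not part of global-workflow; the mixed read/write case is deliberately left as an error because no general rule is given for it.

```python
def com_prefix(reads_existing: bool, writes_new: bool) -> str:
    """Pick the COM prefix for a directory from how a job uses it.

    Hypothetical helper for illustration only; not part of global-workflow.
    """
    if reads_existing and writes_new:
        # No general rule is stated for this case; it needs a case-by-case decision.
        raise ValueError("directory is both read and written; ask before naming it")
    if reads_existing:
        return "COMIN"
    if writes_new:
        return "COMOUT"
    raise ValueError("directory is neither read nor written by this job")


# A job that only reads pre-existing files.
print(com_prefix(reads_existing=True, writes_new=False))  # COMIN
```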

HenryRWinterbottom commented 3 weeks ago

Thank you, @WalterKolczynski-NOAA. So, in this case, since we are only reading the files such that they are being written to the archive, COMIN is the attribute required as noted by @DavidHuber-NOAA. Is that correct?

DavidHuber-NOAA commented 3 weeks ago

@WalterKolczynski-NOAA Thanks for the clarification. There is one COM directory that is read and written to and that is COM_ATMOS_TRACK. It probably makes a difference that this job is not run by NCO. What do you think we should call it?

@HenryWinterbottom-NOAA Besides that, everything should be COMIN.

WalterKolczynski-NOAA commented 3 weeks ago

> @WalterKolczynski-NOAA Thanks for the clarification. There is one COM directory that is read and written to and that is COM_ATMOS_TRACK. It probably makes a difference that this job is not run by NCO. What do you think we should call it?
>
> @HenryWinterbottom-NOAA Besides that, everything should be COMIN.

The track directory is an odd duck all around right now; we're going to have to do something with it. For now, define both a COMIN_ATMOS_TRACK and a COMOUT_ATMOS_TRACK.
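One way to picture the interim arrangement for the track directory is sketched below in Python. It assumes that both names point at the same directory, which is not stated explicitly here; the helper name `define_track_dirs` and the example path are hypothetical.

```python
import os


def define_track_dirs(track_dir, env=None):
    """Define both track variables in a mapping (defaults to os.environ).

    Hypothetical helper for illustration only; not part of global-workflow.
    """
    target = os.environ if env is None else env
    # Assumption: the read and write names refer to one and the same directory.
    target["COMIN_ATMOS_TRACK"] = track_dir
    target["COMOUT_ATMOS_TRACK"] = track_dir
    return target


# Use a plain dict here so the real process environment is left untouched.
env = define_track_dirs("/path/to/track", env={})
print(sorted(env))  # ['COMIN_ATMOS_TRACK', 'COMOUT_ATMOS_TRACK']
```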

WalterKolczynski-NOAA commented 3 weeks ago

> Thank you, @WalterKolczynski-NOAA. So, in this case, since we are only reading the files such that they are being written to the archive, COMIN is the attribute required as noted by @DavidHuber-NOAA. Is that correct?

Correct

aerorahul commented 2 weeks ago

@HenryWinterbottom-NOAA Can you please resolve the conflict in this PR? It should be relatively easy. Please let me know if you need assistance.

Also, we cannot run the tests on WCOSS2 yet, as the libraries have not been synced on Dogwood since the production switch yesterday. There is a WCOSS2 helpdesk ticket open for it.

HenryRWinterbottom commented 2 weeks ago

Done.

HenryRWinterbottom commented 2 weeks ago

@aerorahul The C48_S2SW test completed without any issues related to the recent updates.

emcbot commented 2 weeks ago

Experiment C48mx500_3DVarAOWCDA FAILED on Hera with error logs:

/scratch1/NCEPDEV/global/CI/2668/RUNTESTS/COMROOT/C48mx500_3DVarAOWCDA_3381ec6b/logs/2021032418/gdasarch.log


emcbot commented 2 weeks ago

Experiment C48mx500_3DVarAOWCDA FAILED on Hera in /scratch1/NCEPDEV/global/CI/2668/RUNTESTS/C48mx500_3DVarAOWCDA_3381ec6b

emcbot commented 2 weeks ago

Experiment C96C48_hybatmDA FAILED on Hera with error logs:

/scratch1/NCEPDEV/global/CI/2668/RUNTESTS/COMROOT/C96C48_hybatmDA_3381ec6b/logs/2021122100/enkfgdasearc00.log
/scratch1/NCEPDEV/global/CI/2668/RUNTESTS/COMROOT/C96C48_hybatmDA_3381ec6b/logs/2021122100/enkfgdasearc01.log


emcbot commented 2 weeks ago

Experiment C96C48_hybatmDA FAILED on Hera in /scratch1/NCEPDEV/global/CI/2668/RUNTESTS/C96C48_hybatmDA_3381ec6b

DavidHuber-NOAA commented 2 weeks ago

@HenryWinterbottom-NOAA jobs/JGDAS_ENKF_ARCHIVE and scripts/exgdas_enkf_earc.py also need to be updated. I think all of the COM variables in the following lines should be updated to COMIN:

https://github.com/HenryWinterbottom-NOAA/global-workflow/blob/dacf10ba5d93f3db518f7759c89c4f35989b7d5c/jobs/JGDAS_ENKF_ARCHIVE#L13-L15

https://github.com/HenryWinterbottom-NOAA/global-workflow/blob/dacf10ba5d93f3db518f7759c89c4f35989b7d5c/scripts/exgdas_enkf_earc.py#L39

This is the cause of the C96C48_hybatmDA CI failure.
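A quick way to spot leftovers like these is to scan a script for COM names that carry no direction yet. The Python sketch below is only an illustration: the helper `find_unqualified_com_vars` and the sample line are hypothetical, and the pattern would also flag any other name that begins with `COM_`, so the results still need a manual look.

```python
import re

# Matches names such as COM_ATMOS_TRACK, but not COMIN_... or COMOUT_...
UNQUALIFIED_COM = re.compile(r"\bCOM_[A-Z0-9_]+\b")


def find_unqualified_com_vars(text: str) -> list[str]:
    """Return the sorted, de-duplicated COM_* names found in the text.

    Hypothetical helper for illustration only; not part of global-workflow.
    """
    return sorted(set(UNQUALIFIED_COM.findall(text)))


# Made-up sample line mixing an old-style name with an updated one.
sample = 'tar -cf out.tar "${COM_ATMOS_ANALYSIS}" "${COMIN_ATMOS_HISTORY}"'
print(find_unqualified_com_vars(sample))  # ['COM_ATMOS_ANALYSIS']
```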

HenryRWinterbottom commented 2 weeks ago

@aerorahul This is ready for CI/CD again.

Thank you, @DavidHuber-NOAA.

DavidHuber-NOAA commented 2 weeks ago

@HenryWinterbottom-NOAA I think the ENKF scripts still need to be updated. I couldn't put in suggestions and commit them since they weren't touched previously in your work.

HenryRWinterbottom commented 2 weeks ago

@DavidHuber-NOAA Thank you. I missed those before I sent the last comment. Should be ready now.

aerorahul commented 2 weeks ago

@HenryWinterbottom-NOAA The templates do not have COMIN names.

HenryRWinterbottom commented 2 weeks ago

> @HenryWinterbottom-NOAA The templates do not have COMIN names.

Thank you. Fixed.

DavidHuber-NOAA commented 2 weeks ago

Starting CI on WCOSS2.

emcbot commented 2 weeks ago

CI Update on Wcoss2 at 06/17/24 03:33:12 PM
============================================
Cloning and Building global-workflow PR: 2668
with PID: 127217 on host: dlogin08

emcbot commented 2 weeks ago

Failed on cloning and building global-workflow PR: 2668. CI on Wcoss2 failed to build on Mon Jun 17 15:33:26 UTC 2024 for repo git@github.com:NOAA-EMC/global-workflow.git

DavidHuber-NOAA commented 2 weeks ago

@TerrenceMcGuinness-NOAA WCOSS2 failed to clone the branch. Does the controller on Dogwood need to be updated/installed?

emcbot commented 2 weeks ago

@DavidHuber-NOAA WCOSS uses Bash; there is no controller, only cron. Looking into why this failed now ...

terry.mcguinness (dlogin04) sorc (develop) $ ./build_all.sh -guk
Creating logs folder
Creating /lfs/h2/emc/global/noscrub/globalworkflow.ci/GFS_CI_ROOT/PR/2668/global-workflow/exec folder
Resetting modules to system default. Reseting $MODULEPATH back to system default. All extra directories will be removed from $MODULEPATH.
Building gsi_enkf, ufs, gfs_utils, gdas, ww3prepost, ufs_utils, gsi_utils, gsi_monitor, upp
Starting build_gsi_enkf.sh
Starting build_ufs.sh
Starting build_gfs_utils.sh
Starting build_gdas.sh
build_gsi_enkf.sh failed!  Exiting!
Check logs/build_gsi_enkf.log for details.
terry.mcguinness (dlogin04) sorc (develop) $ 

Oops, not sure why this happened; still looking ...

terry.mcguinness (dlogin04) logs (develop) $ cat build_gsi_enkf.log 
+ OPTIND=1
+ getopts :j:dv option
+ case "${option}" in
+ BUILD_JOBS=8
+ getopts :j:dv option
+ shift 2
+ BUILD_TYPE=Release
+ BUILD_VERBOSE=NO
+ BUILD_JOBS=8
+ GSI_MODE=GFS
+ ENKF_MODE=GFS
+ REGRESSION_TESTS=NO
+ ./gsi_enkf.fd/ush/build.sh
./build_gsi_enkf.sh: line 22: ./gsi_enkf.fd/ush/build.sh: No such file or directory

Somehow the submodule checkout failed ... Not sure what happened; so far everything checks out ... (pun intended)

terry.mcguinness (dlogin04) global-workflow (develop) $ "${GH}" pr checkout "${PR}" --repo "${REPO_URL}" --recurse-submodules
remote: Enumerating objects: 537, done.
remote: Counting objects: 100% (291/291), done.
remote: Compressing objects: 100% (61/61), done.
remote: Total 211 (delta 169), reused 179 (delta 142), pack-reused 0
Receiving objects: 100% (211/211), 32.69 KiB | 3.63 MiB/s, done.
Resolving deltas: 100% (169/169), completed with 43 local objects.
From github.com:NOAA-EMC/global-workflow
 * [new ref]           refs/pull/2668/head -> feature/gwdev_issue_2451.002
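The failure in the build log above came from a build script that was missing inside a submodule directory. A check along the following lines could catch that before the build starts. This is only a sketch: the helper `missing_build_scripts` is hypothetical and is not something the CI scripts are known to contain.

```python
from pathlib import Path


def missing_build_scripts(sorc_dir, relative_scripts):
    """Return the expected scripts that do not exist under sorc_dir.

    Hypothetical helper for illustration only; not part of global-workflow.
    """
    root = Path(sorc_dir)
    return [rel for rel in relative_scripts if not (root / rel).is_file()]


# The relative path is taken from the error message in the build log;
# an empty result would mean every listed script is present.
problems = missing_build_scripts(".", ["gsi_enkf.fd/ush/build.sh"])
if problems:
    print("Submodule checkout looks incomplete; missing:", ", ".join(problems))
```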

Resetting the label and sending the cron-based Bash driver scripts to stdout.

emcbot commented 2 weeks ago

CI Update on Wcoss2 at 06/17/24 05:48:40 PM
============================================
Cloning and Building global-workflow PR: 2668
with PID: 74284 on host: dlogin08

emcbot commented 2 weeks ago

Automated global-workflow Testing Results:


Machine: Wcoss2
Start: Mon Jun 17 17:52:48 UTC 2024 on dlogin08
---------------------------------------------------
Build: Completed at 06/17/24 06:40:40 PM
Case setup: Completed for experiment C48_ATM_c9396e24
Case setup: Skipped for experiment C48mx500_3DVarAOWCDA_c9396e24
Case setup: Skipped for experiment C48_S2SWA_gefs_c9396e24
Case setup: Completed for experiment C48_S2SW_c9396e24
Case setup: Completed for experiment C96_atm3DVar_extended_c9396e24
Case setup: Skipped for experiment C96_atm3DVar_c9396e24
Case setup: Skipped for experiment C96_atmaerosnowDA_c9396e24
Case setup: Completed for experiment C96C48_hybatmDA_c9396e24
Case setup: Completed for experiment C96C48_ufs_hybatmDA_c9396e24

emcbot commented 2 weeks ago

Experiment C48_ATM_c9396e24 SUCCESS on Wcoss2 at 06/17/24 07:52:10 PM

emcbot commented 2 weeks ago

Experiment C48_S2SW_c9396e24 SUCCESS on Wcoss2 at 06/17/24 08:24:15 PM

emcbot commented 2 weeks ago

Experiment C96C48_hybatmDA_c9396e24 SUCCESS on Wcoss2 at 06/17/24 09:12:17 PM

emcbot commented 2 weeks ago

Experiment C96C48_ufs_hybatmDA_c9396e24 SUCCESS on Wcoss2 at 06/17/24 09:16:12 PM

emcbot commented 2 weeks ago

Experiment C96_atm3DVar_extended_c9396e24 SUCCESS on Wcoss2 at 06/18/24 02:52:37 AM

emcbot commented 2 weeks ago

All CI Test Cases Passed on Wcoss2:


Experiment C48_ATM_c9396e24 *** SUCCESS *** at 06/17/24 07:52:10 PM
Experiment C48_S2SW_c9396e24 *** SUCCESS *** at 06/17/24 08:24:15 PM
Experiment C96C48_hybatmDA_c9396e24 *** SUCCESS *** at 06/17/24 09:12:17 PM
Experiment C96C48_ufs_hybatmDA_c9396e24 *** SUCCESS *** at 06/17/24 09:16:12 PM
Experiment C96_atm3DVar_extended_c9396e24 *** SUCCESS *** at 06/18/24 02:52:37 AM

emcbot commented 2 weeks ago

CI Passed Hercules at
Built and ran in directory /work2/noaa/stmp/CI/HERCULES/2668

DavidHuber-NOAA commented 2 weeks ago

All tests passed on Hera. Marking as such.