NOAA-EMC / global-workflow

Global Superstructure/Workflow supporting the Global Forecast System (GFS)
https://global-workflow.readthedocs.io/en/latest
GNU Lesser General Public License v3.0

Update gdas.cd and gsi_utils hashes #2641

Closed RussTreadon-NOAA closed 2 weeks ago

RussTreadon-NOAA commented 1 month ago

Description

This PR updates the sorc/gdas.cd and sorc/gsi_utils hashes. The updated hashes bring in bug fixes, new UFS DA functionality, and a Gaea build for gsi_utils.

Resolves #2640

Type of change

Change characteristics

How has this been tested?

Checklist

RussTreadon-NOAA commented 1 month ago

NOTE

This PR will remain in draft mode until the following PRs are merged into their respective develop

These PRs must be acted upon in sequence

The PR will be marked Ready for review once the above tasks are completed.

RussTreadon-NOAA commented 1 month ago

Orion test

Install RussTreadon-NOAA:feature/update_gdasapp at 215cd32d. Enable and run C96C48_ufs_hybatmDA. All jobs successfully run to completion.

Orion-login-4:/work2/noaa/stmp/rtreadon/EXPDIR/prsub$ rocotostat -d prsub.db -w prsub.xml -c all -s
   CYCLE         STATE           ACTIVATED              DEACTIVATED
202402231800        Done    May 30 2024 12:13:17    May 30 2024 12:30:28
202402240000        Done    May 30 2024 12:13:17    May 30 2024 14:30:27
RussTreadon-NOAA commented 1 month ago

Hera test

Install RussTreadon-NOAA:feature/update_gdasapp at 215cd32. Enable and run C96C48_ufs_hybatmDA. All jobs except one successfully run to completion. The single failure was

202402240000          enkfgdasearc00                    61083915                DEAD                   1         2          21.0

enkfgdasearc00.log contained the error message

  File "/scratch1/NCEPDEV/da/Russ.Treadon/git/global-workflow/update_gdasapp/ush/python/pygfs/task/archive.py", line 199, in _create_fileset
    raise FileNotFoundError(f"FATAL ERROR: Required file, directory, or glob {item} not found!")
FileNotFoundError: FATAL ERROR: Required file, directory, or glob logs/2024022400/enkfgdasatmensanlrun.log not found!

The atmensanlrun log file is no longer generated. With the merger of PR #2592 into g-w develop, job atmensanlrun was replaced by atmensanlletkf. A new job, atmensanlfv3inc, was added in the same PR. See workflow/rocoto/gfs_tasks.py from PR #2592.

We need to update parm/archive/enkf.yaml.j2

-        {% set steps = ["atmensanlinit", "atmensanlrun", "atmensanlfinal"] %}
+        {% set steps = ["atmensanlinit", "atmensanlletkf", "atmensanlfv3inc", "atmensanlfinal"] %}

After this change was made in the Hera working copy of feature/update_gdasapp, job enkfgdasearc00 was rerun and successfully completed. All jobs executed by C96C48_ufs_hybatmDA successfully ran to completion.

/scratch1/NCEPDEV/stmp2/Russ.Treadon/EXPDIR/prtest$ rocotostat -d prtest.db -w prtest.xml -c all -s
   CYCLE         STATE           ACTIVATED              DEACTIVATED
202402231800        Done    May 31 2024 13:19:32    May 31 2024 13:40:15
202402240000        Done    May 31 2024 13:19:32    May 31 2024 15:43:10
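
The archive failure above comes down to a mismatch between the step names listed in parm/archive/enkf.yaml.j2 and the log files the workflow actually writes. A minimal sketch of that check follows, assuming the log naming seen in the error message (logs/<cycle>/enkfgdas<step>.log). The names `rotdir`, `expected_logs`, and `missing_logs` are hypothetical, and this is not how pygfs/task/archive.py is implemented.

```python
import tempfile
from pathlib import Path

# Step names taken from the corrected enkf.yaml.j2 line in the diff above.
STEPS = ["atmensanlinit", "atmensanlletkf", "atmensanlfv3inc", "atmensanlfinal"]


def expected_logs(cycle, steps, prefix="enkfgdas"):
    """Build relative log paths such as logs/2024022400/enkfgdasatmensanlinit.log."""
    return [Path("logs") / cycle / f"{prefix}{step}.log" for step in steps]


def missing_logs(rotdir, cycle, steps):
    """Return the expected log paths that do not exist under rotdir (hypothetical root)."""
    return [p for p in expected_logs(cycle, steps) if not (rotdir / p).is_file()]


# Demo in a throwaway directory: create every log except the fv3inc one.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for log in expected_logs("2024022400", STEPS):
        if "fv3inc" not in log.name:
            (root / log).parent.mkdir(parents=True, exist_ok=True)
            (root / log).touch()
    for path in missing_logs(root, "2024022400", STEPS):
        print("missing:", path.as_posix())
```

Running a check like this before the archive job would flag a stale step name (for example atmensanlrun) without waiting for enkfgdasearc00 to fail.
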
RussTreadon-NOAA commented 3 weeks ago

Orion and Hera tests

Install RussTreadon-NOAA:feature/update_gdasapp at 1dc1a11 on Orion and Hera. Two sets of tests run on each machine

  1. Run test_gdasapp ctests. All 47 ctests pass on both machines.

Orion

100% tests passed, 0 tests failed out of 47

Label Time Summary:
gdas-utils    =  12.48 sec*proc (9 tests)
script        =  12.48 sec*proc (9 tests)

Total Test time (real) = 3679.77 sec

Hera

100% tests passed, 0 tests failed out of 47

Label Time Summary:
gdas-utils    =   9.32 sec*proc (9 tests)
script        =   9.32 sec*proc (9 tests)

Total Test time (real) = 1745.66 sec
  2. Enable and run g-w CI for C96C48_ufs_hybatmDA. All jobs successfully run to completion on both machines

Orion

Orion-login-4:/work2/noaa/stmp/rtreadon/EXPDIR/pr2641$ rocotostat -d pr2641.db -w pr2641.xml -c all -s
   CYCLE         STATE           ACTIVATED              DEACTIVATED
202402231800        Done    Jun 06 2024 01:33:40    Jun 06 2024 01:50:23
202402240000        Done    Jun 06 2024 01:33:40    Jun 06 2024 04:25:19

Hera

Hera(hfe07):/scratch1/NCEPDEV/stmp2/Russ.Treadon/EXPDIR/pr2641$ rocotostat -d pr2641.db -w pr2641.xml -c all -s
   CYCLE         STATE           ACTIVATED              DEACTIVATED
202402231800        Done    Jun 06 2024 01:39:38    Jun 06 2024 04:25:12
202402240000        Done    Jun 06 2024 01:39:38    Jun 06 2024 07:30:14
RussTreadon-NOAA commented 3 weeks ago

Hercules and Cactus tests

Install RussTreadon-NOAA:feature/update_gdasapp at 1ca534c on Hercules and Cactus. Enable and run g-w CI for C96C48_ufs_hybatmDA. All jobs successfully run to completion on both machines

Hercules

(gdasapp) hercules-login-3:/work2/noaa/stmp/rtreadon/EXPDIR/pr2641_hercules$ rocotostat -d pr2641_hercules.db -w pr2641_hercules.xml -c all -s
   CYCLE         STATE           ACTIVATED              DEACTIVATED
202402231800        Done    Jun 06 2024 16:15:10    Jun 06 2024 16:30:04
202402240000        Done    Jun 06 2024 16:15:10    Jun 06 2024 18:20:03

Cactus

russ.treadon@clogin07:/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/pr2641> rocotostat -d pr2641.db -w pr2641.xml -c all -s
   CYCLE         STATE           ACTIVATED              DEACTIVATED     
202402231800        Done    Jun 06 2024 16:12:12    Jun 06 2024 16:25:15
202402240000        Done    Jun 06 2024 16:12:12    Jun 06 2024 18:20:12
emcbot commented 3 weeks ago

Experiment C48mx500_3DVarAOWCDA FAILED on Hera with error logs:

/scratch1/NCEPDEV/global/CI/2641/RUNTESTS/COMROOT/C48mx500_3DVarAOWCDA_1ca534c7/logs/2021032418/gdasocnanalprep.log

Follow link here to view the contents of the above file(s): (link)

emcbot commented 3 weeks ago

Experiment C48mx500_3DVarAOWCDA FAILED on Hera in /scratch1/NCEPDEV/global/CI/2641/RUNTESTS/C48mx500_3DVarAOWCDA_1ca534c7

RussTreadon-NOAA commented 3 weeks ago

The log file found at (link) has an "unable to copy" error on line 1016 of the log file.

ls /scratch2/NCEPDEV/ocean/Guillaume.Vernieres/data/static/72x35x25/soca/rossrad.nc on Hera confirms that the file does not exist. Below is ls -l of the directory in question

Hera(hfe10):~$ ls -l /scratch2/NCEPDEV/ocean/Guillaume.Vernieres/data/static/72x35x25/soca/
total 92
drwxr-sr-x 2 Guillaume.Vernieres ocean  4096 Jul 23  2022 INPUT
-rw-r--r-x 1 Guillaume.Vernieres ocean 34295 Nov  2  2022 MOM_input
-rw-r--r-x 1 Guillaume.Vernieres ocean 39619 Jul 23  2022 MOM_input_bkp
lrwxrwxrwx 1 Guillaume.Vernieres ocean    50 Apr  4 20:21 RECCAP2_region_masks_all_v20221025.nc -> ../../common/RECCAP2_region_masks_all_v20221025.nc
drwxr-sr-x 5 Guillaume.Vernieres ocean  4096 Jun  5  2023 bkgerr
-rw-r--r-x 1 Guillaume.Vernieres ocean    20 Jul 21  2022 diag_table
-rw-r--r-x 1 Guillaume.Vernieres ocean   199 Jul 21  2022 field_table
lrwxrwxrwx 1 Guillaume.Vernieres ocean    33 Aug  8  2023 fields_metadata.yaml -> ../../common/fields_metadata.yaml
lrwxrwxrwx 1 Guillaume.Vernieres ocean    31 Aug  8  2023 godas_sst_bgerr.nc -> ../../common/godas_sst_bgerr.nc
lrwxrwxrwx 1 Guillaume.Vernieres ocean    43 Mar 21  2023 obsop_name_map.yaml -> ../../1440x1080x75/soca/obsop_name_map.yaml
lrwxrwxrwx 1 Guillaume.Vernieres ocean    24 Aug  8  2023 rossrad.dat -> ../../common/rossrad.dat

File rossrad.dat is present. File rossrad.nc is absent.

RussTreadon-NOAA commented 3 weeks ago

Notice in gdasocnanalprep.log that

+++ config.ocnanal[12]: export SOCA_INPUT_FIX_DIR=/scratch2/NCEPDEV/ocean/Guillaume.Vernieres/data/static/72x35x25/soca

The SOCA_INPUT_FIX_DIR setting for the failed CI comes from ci/cases/gfsv17/ocnanal.yaml. This yaml sets the path to

ocnanal:
  SOCA_INPUT_FIX_DIR: /scratch2/NCEPDEV/ocean/Guillaume.Vernieres/data/static/1440x1080x75/soca

This does not seem correct. It is a machine and user specific path. Other entries in ocnanal.yaml use {{ HOMEgfs }} as the path prefix.

Examination of the HOMEgfs for the failed C48mx500_3DVarAOWCDA test found file rossrad.nc in /scratch1/NCEPDEV/global/CI/2641/gfs/sorc/gdas.cd/sorc/soca/test/Data.

I'll stop here. What's the correct resolution of the above issue? Do we modify ci/cases/gfsv17/ocnanal.yaml to point at a directory in HOMEgfs?

What do you think @guillaumevernieres ?

RussTreadon-NOAA commented 3 weeks ago

@guillaumevernieres , can you do the following

  1. copy rossrad.nc into Hera /scratch2/NCEPDEV/ocean/Guillaume.Vernieres/data/static/common
  2. soft link ../../common/rossrad.nc to rossrad.nc in the following directories
    • /scratch2/NCEPDEV/ocean/Guillaume.Vernieres/data/static/1440x1080x75/soca
    • /scratch2/NCEPDEV/ocean/Guillaume.Vernieres/data/static/360x320x75/soca
    • /scratch2/NCEPDEV/ocean/Guillaume.Vernieres/data/static/4500x3297x75/soca
    • /scratch2/NCEPDEV/ocean/Guillaume.Vernieres/data/static/72x35x25/soca

Once this is done @WalterKolczynski-NOAA can rerun CI-Hera. Hopefully it will pass after rossrad.nc is added to soca fix directories.
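
The two steps requested above (one copy in common, relative soft links from each resolution directory) can be sketched as follows. This is only an illustration run in a temporary directory that stands in for the data/static tree; the function name `link_common_file` is hypothetical and nothing here touches the real Hera paths.

```python
import os
import tempfile
from pathlib import Path

# Resolution directories named in the request above.
RESOLUTIONS = ["1440x1080x75", "360x320x75", "4500x3297x75", "72x35x25"]


def link_common_file(static_root, name, resolutions):
    """Create <res>/soca/<name> -> ../../common/<name> for each resolution."""
    created = []
    for res in resolutions:
        link = static_root / res / "soca" / name
        if link.is_symlink() or link.exists():
            continue  # leave existing files and links untouched
        # Relative target, matching the existing rossrad.dat link layout.
        os.symlink(Path("..") / ".." / "common" / name, link)
        created.append(link)
    return created


# Demo tree standing in for .../data/static (placeholder file content).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "common").mkdir()
    (root / "common" / "rossrad.nc").write_bytes(b"placeholder")
    for res in RESOLUTIONS:
        (root / res / "soca").mkdir(parents=True)
    links = link_common_file(root, "rossrad.nc", RESOLUTIONS)
    target = (root / "common" / "rossrad.nc").resolve()
    print(all(link.resolve() == target for link in links))
```

Using relative targets keeps the links valid if the whole static tree is copied to another machine, which is the same pattern the existing rossrad.dat link uses.
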

guillaumevernieres commented 3 weeks ago

Done @RussTreadon-NOAA

RussTreadon-NOAA commented 3 weeks ago

Thank you @guillaumevernieres !

@WalterKolczynski-NOAA , when you have a chance please rerun CI-Hera. The file which caused gdasocnanalprep.log to abort is now present

Hera(hfe08):~$ ls -lL /scratch2/NCEPDEV/ocean/Guillaume.Vernieres/data/static/72x35x25/soca/rossrad.nc
-rw-r--r-- 1 Guillaume.Vernieres ocean 517688 Jun  7 17:55 /scratch2/NCEPDEV/ocean/Guillaume.Vernieres/data/static/72x35x25/soca/rossrad.nc

The Hera specific path in ci/cases/gfsv17/ocnanal.yaml can be addressed in a new g-w issue and PR.

RussTreadon-NOAA commented 3 weeks ago

Hera test

Install RussTreadon-NOAA:feature/update_gdasapp @ 54ead28 and run the following tests

  1. GDASApp ctests. 46 out of 46 tests pass

100% tests passed, 0 tests failed out of 46

Label Time Summary:
gdas-utils    =   7.13 sec*proc (9 tests)
script        =   7.13 sec*proc (9 tests)

Total Test time (real) = 5321.27 sec


  2. C96C48_ufs_hybatmDA CI. All jobs successfully complete

Hera(hfe11):/scratch1/NCEPDEV/stmp2/Russ.Treadon/EXPDIR/prtest$ rocotostat -d prtest.db -w prtest.xml -c all -s
   CYCLE         STATE           ACTIVATED              DEACTIVATED
202402231800        Done    Jun 07 2024 18:57:59    Jun 07 2024 19:30:16
202402240000        Done    Jun 07 2024 18:57:59    Jun 08 2024 00:35:12


  3. C48mx500_3DVarAOWCDA CI. All jobs successfully complete except gdasatmanlprod. A check of mpmd log files in /scratch1/NCEPDEV/stmp2/Russ.Treadon/RUNDIRS/prwcda/atmos_products.915870 found the following error

21: + interp_atmos_master.sh38: for grid in "${grids[@]}"
21: + interp_atmos_master.sh39: gridopt=grid1p00
21: + interp_atmos_master.sh40: output_grids=' -new_grid latlon 0:1440:0.25 90:721:-0.25 pgb2file_anl_22_0p25 -new_grid latlon 0:720:0.5 90:361:-0.5 pgb2file_anl_22_0p50 -new_grid latlon 0:360:1.0 90:181:-1.0 pgb2file_anl_22_1p00'
21: + interp_atmos_master.sh44: wgrib2 tmpfile_anl_22 -set_grib_type same -set_bitmap 1 -set_grib_max_bits 16 -new_grid_winds earth -new_grid_interpolation bilinear -if ':(CSNOW|CRAIN|CFRZR|CICEP|ICSEV):' -new_grid_interpolation neighbor -fi -if ':(APCP|ACPCP|PRATE|CPRAT|DZDT):' -new_grid_interpolation budget -fi -if ':(APCP|ACPCP|PRATE|CPRAT):' -set_grib_max_bits 25 -fi -new_grid latlon 0:1440:0.25 90:721:-0.25 pgb2file_anl_22_0p25 -new_grid latlon 0:720:0.5 90:361:-0.5 pgb2file_anl_22_0p50 -new_grid latlon 0:360:1.0 90:181:-1.0 pgb2file_anl_22_1p00
21:
21: FATAL ERROR: add_many_bitstream: n_bits = (26)
21:
21: 1:0:d=2021032418:MSLET:mean sea level:anl:
21: 2:27834:d=2021032418:HGT:1000 mb:anl:
21: 3:54708:d=2021032418:PRES:surface:anl:
21: 4:81277:d=2021032418:HGT:surface:anl:
21: + interp_atmos_master.sh1: postamble interp_atmos_master.sh 1717851052 8

Use wgrib2 to examine tmpfile_anl_22. The 5th record is PRATE

5:97347:vt=2021032418:surface:anl:PRATE Precipitation Rate [kg/m^2/s]:
    ndata=18432:undef=4148:mean=4.25403e+16:min=-6.08487e+19:max=5.02802e+20
    grid_template=40:winds(N/S):
        Gaussian grid: (192 x 96) units 1e-06 input WE:NS output WE:SN
        number of latitudes between pole-equator=48 #points=18432
        lat 88.572166 to -88.572166
        lon 0.000000 to 358.125000 by 1.875000

The mean, min, and max values for PRATE are nonphysical.  

A check of PRATE in gdas.t00z.master.grb2anl from C96C48_ufs_hybatmDA CI finds

776:47835123:vt=2024022400:surface:anl:PRATE Precipitation Rate [kg/m^2/s]:
    ndata=73728:undef=0:mean=0.000167314:min=0:max=0.0338496
    grid_template=40:winds(N/S):
        Gaussian grid: (384 x 192) units 1e-06 input WE:NS output WE:SN
        number of latitudes between pole-equator=96 #points=73728
        lat 89.284225 to -89.284225
        lon 0.000000 to 359.062500 by 0.937500


Not sure why the C48mx500_3DVarAOWCDA CI gdas.t18z.master.grb2anl has strange values for PRATE.

Tagging @guillaumevernieres , @JessicaMeixner-NOAA , and @CatherineThomas-NOAA 
RussTreadon-NOAA commented 3 weeks ago

Reran gdasfcst and gdasatmanlupp with KEEPDATA=YES. Examined log files and run directories. It looks like PRATE in gdas.t18z.master.grb2anl comes from tprcp in /scratch1/NCEPDEV/stmp2/Russ.Treadon/COMROOT/prwcda/gdas.20210324/18//analysis/atmos/gdas.t18z.sfcanl.nc. An ncdump of tprcp from this file returns

 tprcp =
  Infinityf, 1.518987e-06, 1.518987e-06, Infinityf, -Infinityf,
    -3.376698e-05, Infinityf, -Infinityf, 1.518987e-06, 4.882991e+33,
    1.518987e-06, 1.518987e-06, 1.518987e-06, Infinityf, Infinityf,
    1.518987e-06, -Infinityf, Infinityf, 1.518987e-06, 1.518987e-06,
    1.518987e-06, 1.518987e-06, 1.518987e-06, -Infinityf, 1.242171e+26,
...
    6.212858e-08, 6.212858e-08, Infinityf, -Infinityf, 6.212858e-08,
    -Infinityf, -0.0005018776, 6.212858e-08, 6.212858e-08, 6.212858e-08,
    Infinityf, 6.212858e-08, 6.212858e-08, Infinityf, Infinityf, -Infinityf,
    -Infinityf, 6.212858e-08, 6.212858e-08, Infinityf ;

Why does the sfcanl file contain a total precipitation? What does this even mean? We don't analyze total precipitation.

Interestingly, a ncdump -v tprcp of the C96C48_ufs_hybatmDA /scratch1/NCEPDEV/stmp2/Russ.Treadon/COMROOT/prtest/gdas.20240224/00/analysis/atmos/gdas.t00z.sfcanl.nc yields physical values

 tprcp =
  1.299772e-06, 1.299772e-06, 1.299772e-06, 1.299772e-06, 1.299772e-06,
    1.299772e-06, 1.299772e-06, 1.299772e-06, 1.299772e-06, 1.299772e-06,
    1.299772e-06, 1.299772e-06, 1.299772e-06, 1.299772e-06, 1.299772e-06,
    1.299772e-06, 1.299772e-06, 1.299772e-06, 1.299772e-06, 1.299772e-06,
    1.299772e-06, 1.299772e-06, 1.299772e-06, 1.299772e-06, 1.299772e-06,
...
    1.776476e-06, 1.776476e-06, 1.776476e-06, 1.776476e-06, 1.776476e-06,
    1.776476e-06, 1.776476e-06, 1.776476e-06, 1.776476e-06, 2.057734e-06,
    2.057734e-06, 2.057734e-06, 2.057734e-06, 2.057734e-06, 2.057734e-06,
    2.057734e-06, 2.057734e-06, 2.057734e-06, 2.057734e-06 ;

I'm confused.

Is it possible that the observed behavior in C48mx500_3DVarAOWCDA is not related to the changes in this PR?
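
The contrast between the two tprcp dumps above can be quantified with a small check for non-finite, negative, and implausibly large values. The sketch below uses only the standard library and the first ten values copied from the C48mx500_3DVarAOWCDA dump. Reading the netCDF file itself is left out, and the `upper` bound is an assumed sanity limit rather than a threshold used anywhere in the workflow.

```python
import math

# First ten tprcp values copied from the C48mx500_3DVarAOWCDA sfcanl dump above.
sample = [
    float("inf"), 1.518987e-06, 1.518987e-06, float("inf"), float("-inf"),
    -3.376698e-05, float("inf"), float("-inf"), 1.518987e-06, 4.882991e+33,
]


def summarize(values, upper=1.0):
    """Count non-finite, negative, and implausibly large values.

    upper is an assumed sanity bound, not a value taken from the workflow.
    """
    finite = [v for v in values if math.isfinite(v)]
    return {
        "nonfinite": len(values) - len(finite),
        "negative": sum(1 for v in finite if v < 0),
        "too_large": sum(1 for v in finite if v > upper),
    }


print(summarize(sample))  # {'nonfinite': 5, 'negative': 1, 'too_large': 1}
```

Applied to the C96C48_ufs_hybatmDA values shown above, the same function would report zero in all three categories.
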

RussTreadon-NOAA commented 3 weeks ago

Repeat above test using g-w develop at 9caa51de. 20210324 18Z gdasatmanlprod failed in the same manner. PRATE has nonphysical values

5:97347:vt=2021032418:surface:anl:PRATE Precipitation Rate [kg/m^2/s]:
    ndata=18432:undef=4148:mean=4.25403e+16:min=-6.08487e+19:max=5.02802e+20
    grid_template=40:winds(N/S):
        Gaussian grid: (192 x 96) units 1e-06 input WE:NS output WE:SN
        number of latitudes between pole-equator=48 #points=18432
        lat 88.572166 to -88.572166
        lon 0.000000 to 358.125000 by 1.875000

The above generated via

Hera(hfe01):/scratch1/NCEPDEV/stmp2/Russ.Treadon/RUNDIRS/prwcda_dev/atmos_products.1374019$ wgrib2 -V tmpfile_anl_22 > temp.tmpfile_anl_22

The C48mx500_3DVarAOWCDA failure occurs in develop, not just this PR.
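
For comparing records like the ones above across runs, the statistics line of the wgrib2 -V output can be parsed and screened automatically. The following is a rough sketch that assumes the key=value layout shown in this thread. `parse_stats` and `looks_physical` are hypothetical helpers, and the `upper` bound for precipitation rate is an assumption chosen for illustration.

```python
import re

# Statistics lines copied from the two PRATE records shown in this thread.
BAD = "ndata=18432:undef=4148:mean=4.25403e+16:min=-6.08487e+19:max=5.02802e+20"
GOOD = "ndata=73728:undef=0:mean=0.000167314:min=0:max=0.0338496"


def parse_stats(line):
    """Turn colon-separated key=value pairs into a dict of floats."""
    return {key: float(val) for key, val in re.findall(r"(\w+)=([-+.\deE]+)", line)}


def looks_physical(stats, upper=1.0):
    """A precipitation rate should be non-negative and below an assumed bound."""
    return stats["min"] >= 0 and stats["max"] <= upper


for label, line in (("C48mx500_3DVarAOWCDA", BAD), ("C96C48_ufs_hybatmDA", GOOD)):
    stats = parse_stats(line)
    print(label, "undef =", int(stats["undef"]), "physical =", looks_physical(stats))
```

Note that the failing record also reports undef=4148 while the healthy one reports undef=0, which may be worth tracking alongside the min and max.
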

RussTreadon-NOAA commented 3 weeks ago

@CoryMartin-NOAA , @andytangborn , or @guillaumevernieres : this PR needs to be reviewed and approved. If any of you have time to review, your help would be much appreciated.

RussTreadon-NOAA commented 3 weeks ago

NOTES

  1. g-w PR #2620 brought the desired sorc/jcb hash into g-w develop. Therefore, this PR no longer needs to update sorc/jcb
  2. Gaea was added as a valid build target for gsi_utils.fd via GSI-utils PR #39. The sorc/gsi_utils.fd hash in feature/update_gdasapp is updated to bring this change into g-w develop.
RussTreadon-NOAA commented 3 weeks ago

Thank you @guillaumevernieres

RussTreadon-NOAA commented 3 weeks ago

Thank you @CoryMartin-NOAA

emcbot commented 3 weeks ago

Experiment C96_atmaerosnowDA FAILED on Hera with error logs:

/scratch1/NCEPDEV/global/CI/2641/RUNTESTS/COMROOT/C96_atmaerosnowDA_4d58f647/logs/2021122018/gdasaeroanlinit.log

Follow link here to view the contents of the above file(s): (link)

emcbot commented 3 weeks ago

Experiment C96_atmaerosnowDA FAILED on Hera in /scratch1/NCEPDEV/global/CI/2641/RUNTESTS/C96_atmaerosnowDA_4d58f647

WalterKolczynski-NOAA commented 3 weeks ago
FileNotFoundError: [Errno 2] No such file or directory: '/scratch1/NCEPDEV/global/CI/2641/RUNTESTS/COMROOT/C96_atmaerosnowDA_4d58f647/gdas.20211220/18/obs/gdas.t18z.viirs_n21.2021122018.nc4'
RussTreadon-NOAA commented 3 weeks ago

C96_atmaerosnowDA failure

GDASApp hash c5ff4e7 added parm/aero/obs/config/viirs_n21_aod.yaml.j2. This observation source was also added to parm/aero/obs/lists/gdas_aero.yaml.j2. This is the list of aerosol observation types gdasaeroanlinit expects to find.

g-w CI C96_atmaerosnowDA cycles from 20211220 12Z to 20211221 00Z. The dump files used by the 20211220 18Z analysis are copied by gdasprep from /scratch1/NCEPDEV/global/glopara/dump/gdas.20211220/18/atmos/. This directory only contains two viirs dump files

Hera(hfe06):~$ ls -l /scratch1/NCEPDEV/global/glopara/dump/gdas.20211220/18/atmos/ | grep viirs
-rw-r--r-- 1 role.glopara global      967425 Feb 16 16:13 gdas.t18z.viirs_n20.2021122018.nc4
-rw-r--r-- 1 role.glopara global      967425 Feb 16 16:13 gdas.t18z.viirs_npp.2021122018.nc4

There is no gdas.t18z.viirs_n21.2021122018.nc4 file.

gdasprep links the viirs_n20 and viirs_npp files to /scratch1/NCEPDEV/global/CI/2641/RUNTESTS/COMROOT/C96_atmaerosnowDA_4d58f647/gdas.20211220/18/obs/

Hera(hfe06):~$ ls -l /scratch1/NCEPDEV/global/CI/2641/RUNTESTS/COMROOT/C96_atmaerosnowDA_4d58f647/gdas.20211220/18/obs/ |grep viirs
lrwxrwxrwx 1 Terry.McGuinness global         95 Jun 10 20:03 gdas.t18z.viirs_n20.2021122018.nc4 -> /scratch1/NCEPDEV/global/glopara/dump/gdas.20211220/18/atmos/gdas.t18z.viirs_n20.2021122018.nc4
lrwxrwxrwx 1 Terry.McGuinness global         95 Jun 10 20:03 gdas.t18z.viirs_npp.2021122018.nc4 -> /scratch1/NCEPDEV/global/glopara/dump/gdas.20211220/18/atmos/gdas.t18z.viirs_npp.2021122018.nc4

There is no viirs_n21 file.

However, GDASApp file parm/aero/obs/lists/gdas_aero.yaml.j2 lists viirs_n21 as an observation type to process. When gdasaeroanlinit runs, the copy for the viirs_n21 file fails, an unable to copy error message is generated, and the job aborts.

I do not think viirs_n21 data is available for 20211220 18Z. This is a problem.

What do you think @ypwang19 , @andytangborn , and @CoryMartin-NOAA ?
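
The mismatch described above (an observation type in the list with no matching dump file) can be detected before gdasaeroanlinit runs. The sketch below assumes the dump file naming seen in the listings (gdas.t18z.<obtype>.2021122018.nc4). The `requested` list uses the tags as they appear in the file names, which may not match how entries are spelled in gdas_aero.yaml.j2, and both helper functions are hypothetical.

```python
def dump_name(obtype, cycle, run="gdas"):
    """Build a dump file name such as gdas.t18z.viirs_n20.2021122018.nc4."""
    hour = cycle[8:10]  # cycle is YYYYMMDDHH
    return f"{run}.t{hour}z.{obtype}.{cycle}.nc4"


def unavailable(obtypes, present, cycle):
    """Return the observation types whose dump file is not in the set of present files."""
    return [ob for ob in obtypes if dump_name(ob, cycle) not in present]


# Assumed list contents, using the tags from the dump file names above.
requested = ["viirs_n20", "viirs_npp", "viirs_n21"]

# File names copied from the ls output above.
present = {
    "gdas.t18z.viirs_n20.2021122018.nc4",
    "gdas.t18z.viirs_npp.2021122018.nc4",
}

print(unavailable(requested, present, "2021122018"))  # ['viirs_n21']
```
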

CoryMartin-NOAA commented 3 weeks ago

Sorry about that @RussTreadon-NOAA this was an oversight that would have been resolved when we switch to JCB for other components. For now, I'll comment out N21 from the list.

RussTreadon-NOAA commented 3 weeks ago

Thank you @CoryMartin-NOAA . I see GDASApp PR #1160 has been merged into GDASApp develop. Thank you @ypwang19 for your quick approval. Let me update the gdas.cd hash used in this PR and manually run CI.

RussTreadon-NOAA commented 2 weeks ago

HERA CI tests

Install RussTreadon-NOAA:feature/update_gdasapp at b54a4d8. Activate C96C48_ufs_hybatmDA. Run the following CI cases with the noted results

C96_atmaerosnowDA - all jobs successfully run

Hera(hfe08):/scratch1/NCEPDEV/stmp2/Russ.Treadon/EXPDIR/praerosnow$ rocotostat -d praerosnow.db -w praerosnow.xml -c all -s
   CYCLE         STATE           ACTIVATED              DEACTIVATED
202112201200        Done    Jun 12 2024 01:40:17    Jun 12 2024 02:00:24
202112201800        Done    Jun 12 2024 01:40:17    Jun 12 2024 03:15:20
202112210000        Done    Jun 12 2024 01:40:17    Jun 12 2024 05:00:26

C96C48_hybatmDA - all jobs successfully run

Hera(hfe08):/scratch1/NCEPDEV/stmp2/Russ.Treadon/EXPDIR/prgsida$ rocotostat -d prgsida.db -w prgsida.xml -c all -s
   CYCLE         STATE           ACTIVATED              DEACTIVATED
202112201800        Done    Jun 12 2024 01:40:13    Jun 12 2024 01:55:10
202112210000        Done    Jun 12 2024 01:40:13    Jun 12 2024 04:00:17
202112210600        Done    Jun 12 2024 01:40:13    Jun 12 2024 04:05:11

C96C48_ufs_hybatmDA - all jobs successfully run

Hera(hfe08):/scratch1/NCEPDEV/stmp2/Russ.Treadon/EXPDIR/prufsda$ rocotostat -d prufsda.db -w prufsda.xml -c all -s
   CYCLE         STATE           ACTIVATED              DEACTIVATED
202402231800        Done    Jun 12 2024 01:40:15    Jun 12 2024 01:55:12
202402240000        Done    Jun 12 2024 01:40:15    Jun 12 2024 04:05:12

C48mx500_3DVarAOWCDA.yaml - all jobs successfully run except the 20210324 18Z gdasatmanlprod. Due to the gdasatmanlprod failure, downstream jobs gdasarch and gdascleanup did not run.

Hera(hfe08):/scratch1/NCEPDEV/stmp2/Russ.Treadon/EXPDIR/praowcda$ rocotostat -d praowcda.db -w praowcda.xml -c all -s
   CYCLE         STATE           ACTIVATED              DEACTIVATED
202103241200        Done    Jun 12 2024 01:40:19    Jun 12 2024 01:55:15
202103241800      Active    Jun 12 2024 01:40:19             -

Issue #2669 has been opened to report and discuss the C48mx500_3DVarAOWCDA failure.

@WalterKolczynski-NOAA : The results above indicate that this PR is once again ready for CI testing on Hera

@WalterKolczynski-NOAA , are there any other g-w CI cases I should run? I'm willing to do whatever I can to get this PR across the finish line.
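
When several experiments are checked by hand as above, the rocotostat summary output can be screened for cycles that are not Done. The sketch below parses text in the layout shown in this thread; it does not call rocotostat itself, and `parse_summary` is a hypothetical helper.

```python
def parse_summary(text):
    """Map cycle -> state from rocotostat summary (-s) output."""
    states = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a 12-digit cycle (YYYYMMDDHHMM); skip prompt and header lines.
        if len(fields) >= 2 and len(fields[0]) == 12 and fields[0].isdigit():
            states[fields[0]] = fields[1]
    return states


# Output copied from the C48mx500_3DVarAOWCDA run above.
OUTPUT = """\
   CYCLE         STATE           ACTIVATED              DEACTIVATED
202103241200        Done    Jun 12 2024 01:40:19    Jun 12 2024 01:55:15
202103241800      Active    Jun 12 2024 01:40:19             -
"""

states = parse_summary(OUTPUT)
unfinished = [cycle for cycle, state in states.items() if state != "Done"]
print(unfinished)  # ['202103241800']
```
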

TerrenceMcGuinness-NOAA commented 2 weeks ago

@WalterKolczynski-NOAA Unforeseen git submodule checkout failures are causing CI timeouts on Hera. I will restart this CI test, which is currently being incorrectly reported as still building.

RussTreadon-NOAA commented 2 weeks ago

@TerrenceMcGuinness-NOAA : learned about the --jobs option from @guillaumevernieres.

As explained on git-scm.com the git clone command can clone submodules in parallel via the -j or --jobs option.

-j --jobs The number of submodules fetched at the same time. Defaults to the submodule.fetchJobs option.

Adding --jobs 8 to the clone could prove beneficial for g-w CI.
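
As an illustration of the suggestion above, a clone command with parallel submodule fetches could be assembled like this. How g-w CI actually invokes git is not shown in this thread, so the function name, the destination directory, and the repository URL used here are assumptions. The sketch only builds and prints the command rather than running it.

```python
import shlex


def clone_command(url, dest, jobs=8, branch=None):
    """Build a git clone command that fetches submodules in parallel."""
    cmd = ["git", "clone", "--recurse-submodules", "--jobs", str(jobs)]
    if branch:
        cmd += ["--branch", branch]
    cmd += [url, dest]
    return cmd


# Assumed URL and destination, for illustration only.
cmd = clone_command("https://github.com/NOAA-EMC/global-workflow.git", "global-workflow")
print(shlex.join(cmd))

# To execute it for real:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

The --jobs value only matters when submodules are fetched as part of the clone, which is why --recurse-submodules appears alongside it.
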

emcbot commented 2 weeks ago

CI Passed Hera at
Built and ran in directory /scratch1/NCEPDEV/global/CI/2641

emcbot commented 2 weeks ago

CI Update on Wcoss2 at 06/13/24 01:40:18 AM
============================================
Cloning and Building global-workflow PR: 2641
with PID: 96375 on host: clogin05
emcbot commented 2 weeks ago

Automated global-workflow Testing Results:


Machine: Wcoss2
Start: Thu Jun 13 01:51:19 UTC 2024 on clogin05
---------------------------------------------------
Build: Completed at 06/13/24 02:33:18 AM
Case setup: Completed for experiment C48_ATM_872f2f5f
Case setup: Skipped for experiment C48mx500_3DVarAOWCDA_872f2f5f
Case setup: Skipped for experiment C48_S2SWA_gefs_872f2f5f
Case setup: Completed for experiment C48_S2SW_872f2f5f
Case setup: Completed for experiment C96_atm3DVar_extended_872f2f5f
Case setup: Skipped for experiment C96_atm3DVar_872f2f5f
Case setup: Skipped for experiment C96_atmaerosnowDA_872f2f5f
Case setup: Completed for experiment C96C48_hybatmDA_872f2f5f
Case setup: Completed for experiment C96C48_ufs_hybatmDA_872f2f5f
emcbot commented 2 weeks ago

Experiment C48_ATM_872f2f5f SUCCESS on Wcoss2 at 06/13/24 03:42:42 AM

emcbot commented 2 weeks ago

Experiment C48_S2SW_872f2f5f SUCCESS on Wcoss2 at 06/13/24 04:00:18 AM

emcbot commented 2 weeks ago

Experiment C96C48_hybatmDA_872f2f5f SUCCESS on Wcoss2 at 06/13/24 04:48:20 AM

emcbot commented 2 weeks ago

Experiment C96C48_ufs_hybatmDA_872f2f5f SUCCESS on Wcoss2 at 06/13/24 04:57:15 AM

emcbot commented 2 weeks ago

Experiment C96_atm3DVar_extended_872f2f5f SUCCESS on Wcoss2 at 06/13/24 10:36:28 AM

emcbot commented 2 weeks ago

All CI Test Cases Passed on Wcoss2:


Experiment C48_S2SW_872f2f5f *** SUCCESS *** at 06/13/24 04:00:18 AM
Experiment C96C48_hybatmDA_872f2f5f *** SUCCESS *** at 06/13/24 04:48:20 AM
Experiment C96C48_ufs_hybatmDA_872f2f5f *** SUCCESS *** at 06/13/24 04:57:15 AM
Experiment C96_atm3DVar_extended_872f2f5f *** SUCCESS *** at 06/13/24 10:36:28 AM
RussTreadon-NOAA commented 2 weeks ago

@TerrenceMcGuinness-NOAA and @WalterKolczynski-NOAA , while I can log into Hercules, I cannot cd to /work2/noaa. The cd command hangs. I see the CI-Hercules-Ready label on this PR. Is g-w CI currently running on Hercules?

TerrenceMcGuinness-NOAA commented 2 weeks ago

@RussTreadon-NOAA Yes I noticed this too on Hercules this morning when looking into the Running CI jobs. It seems to be ok now and I killed all the running jobs. Your labeled ready job never started because we had too many in the queue. I will start this one now.

JessicaMeixner-NOAA commented 2 weeks ago

@RussTreadon-NOAA I had the same issue and submitted a ticket. They rebooted the nodes and things seem to be better now.