NOAA-EMC / global-workflow

Global Superstructure/Workflow supporting the Global Forecast System (GFS)
https://global-workflow.readthedocs.io/en/latest
GNU Lesser General Public License v3.0

Fix and simplify online archiving #2687

Closed DavidHuber-NOAA closed 1 week ago

DavidHuber-NOAA commented 2 weeks ago

Description

This fixes the online archiving portion of the *arch and *earc00 jobs.

The approach previously taken created FileHandler dictionaries at varying levels within the resulting yaml, which was not properly parsed by exglobal_archive.py. This approach creates a single FileHandler dictionary and is much less complicated overall.
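
As a rough illustration of the shape problem only (this is not the PR's implementation; the `mkdir`/`copy` key names, the helper name, and the sample paths are assumptions made for the example), the sketch below gathers file-handling lists scattered at different depths of a parsed YAML document into one flat dictionary. A single structure like this can be consumed without the reader needing to know the nesting:

```python
def merge_file_handler_sets(node):
    """Collect every 'mkdir' and 'copy' list found at any depth into one dict."""
    merged = {"mkdir": [], "copy": []}

    def walk(item):
        if isinstance(item, dict):
            for key, value in item.items():
                if key in merged and isinstance(value, list):
                    # A file-handling list: keep its entries, do not descend.
                    merged[key].extend(value)
                else:
                    walk(value)
        elif isinstance(item, list):
            for element in item:
                walk(element)

    walk(node)
    return merged


# Hypothetical nested result: lists appear at two different levels.
nested = {
    "gdas": {
        "mkdir": ["/archive/demo"],
        "analysis": {"copy": [["/com/gdas.anl", "/archive/demo/gdas.anl"]]},
    },
    "copy": [["/com/gfs.stat", "/archive/demo/gfs.stat"]],
}

flat = merge_file_handler_sets(nested)
```

Here `flat` holds one `mkdir` list and one `copy` list, regardless of where the entries sat in `nested`.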

Resolves #2673 #2647

Type of change

Change characteristics

How has this been tested?

C96C48_hybatmDA test on Hera

Checklist

DavidHuber-NOAA commented 2 weeks ago

This is also a prerequisite for https://github.com/NOAA-EMC/global-workflow/issues/2647.

aerorahul commented 2 weeks ago

@DavidHuber-NOAA Please plan on updating this branch once #2668 is merged.

DavidHuber-NOAA commented 2 weeks ago

Merged in the changes from #2668 and I'm now rerunning a C96C48_hybatmDA test case on Hera. I will notify when it is finished.

emilyhcliu commented 2 weeks ago

@DavidHuber-NOAA I just set up the global-workflow with your fix/archive branch, into which you merged develop earlier this morning.

commit 3329fe7f8eab53a7983fba9a2c148399843f3bfd (HEAD -> fix/archive, DavidHuber/fix/archive)
Merge: 79c333c4 35d4d99e
Author: DavidHuber <david.huber@noaa.gov>
Date:   Tue Jun 18 15:10:16 2024 +0000

    Merge remote-tracking branch 'origin/develop' into fix/archive

I set DO_METP = YES and am running a short-cycle experiment from your branch on Hera:

HOMEgfs: /scratch1/NCEPDEV/da/Emily.Liu/git/Global-Workflow/global-workflow-archivefix
EXPDIR:  /scratch1/NCEPDEV/da/Emily.Liu/para/v17/v17test
ROTDIRS: /scratch2/NCEPDEV/stmp3/Emily.Liu/ROTDIRS/v17test
RUNDIRS: /scratch1/NCEPDEV/stmp2/Emily.Liu/RUNDIRS/v17test
ARCDIR: /scratch1/NCEPDEV/da/Emily.Liu/archive/v17test 

The run just started.
Reviewers can check whether the archive works as expected.

Tagging @CatherineThomas-NOAA @azadeh-gh @malloryprow @WalterKolczynski-NOAA

emilyhcliu commented 2 weeks ago

@DavidHuber-NOAA The test parallel experiment v17test failed at its first gdas and gfs archiving jobs. Here are the log files:
/scratch2/NCEPDEV/stmp3/Emily.Liu/ROTDIRS/v17test/logs/2023040200/gdasarch.log
/scratch2/NCEPDEV/stmp3/Emily.Liu/ROTDIRS/v17test/logs/2023040200/gfsarch.log

Here is the error message from gfsarch.log:

Traceback (most recent call last):
  File "/scratch1/NCEPDEV/da/Emily.Liu/git/Global-Workflow/global-workflow-archivefix/scripts/exglobal_archive.py", line 63, in <module>
    main()
  File "/scratch1/NCEPDEV/da/Emily.Liu/git/Global-Workflow/global-workflow-archivefix/ush/python/wxflow/logger.py", line 266, in wrapper
    retval = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/scratch1/NCEPDEV/da/Emily.Liu/git/Global-Workflow/global-workflow-archivefix/scripts/exglobal_archive.py", line 50, in main
    arcdir_set, atardir_sets = archive.configure(archive_dict)
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch1/NCEPDEV/da/Emily.Liu/git/Global-Workflow/global-workflow-archivefix/ush/python/wxflow/logger.py", line 266, in wrapper
    retval = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/scratch1/NCEPDEV/da/Emily.Liu/git/Global-Workflow/global-workflow-archivefix/ush/python/pygfs/task/archive.py", line 87, in configure
    arcdir_set = Archive._construct_arcdir_set(arcdir_j2yaml,
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch1/NCEPDEV/da/Emily.Liu/git/Global-Workflow/global-workflow-archivefix/ush/python/wxflow/logger.py", line 266, in wrapper
    retval = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/scratch1/NCEPDEV/da/Emily.Liu/git/Global-Workflow/global-workflow-archivefix/ush/python/pygfs/task/archive.py", line 353, in _construct_arcdir_set
    arcdir_set = parse_j2yaml(arcdir_j2yaml,
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch1/NCEPDEV/da/Emily.Liu/git/Global-Workflow/global-workflow-archivefix/ush/python/wxflow/yaml_file.py", line 185, in parse_j2yaml
    return YAMLFile(data=Jinja(path, data, searchpath=searchpath, allow_missing=allow_missing).render)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch1/NCEPDEV/da/Emily.Liu/git/Global-Workflow/global-workflow-archivefix/ush/python/wxflow/jinja.py", line 198, in render
    return render_map[self.template_type]()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch1/NCEPDEV/da/Emily.Liu/git/Global-Workflow/global-workflow-archivefix/ush/python/wxflow/jinja.py", line 209, in _render_file
    template = env.get_template(self.template_file)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.6.0/envs/gsi-addon-dev-rocky8/install/intel/2021.5.0/py-jinja2-3.1.2-3yb4fme/lib/python3.11/site-packages/jinja2/environment.py", line 1010, in get_template
    return self._load_template(name, globals)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.6.0/envs/gsi-addon-dev-rocky8/install/intel/2021.5.0/py-jinja2-3.1.2-3yb4fme/lib/python3.11/site-packages/jinja2/environment.py", line 969, in _load_template
    template = self.loader.load(self, name, self.make_globals(globals))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.6.0/envs/gsi-addon-dev-rocky8/install/intel/2021.5.0/py-jinja2-3.1.2-3yb4fme/lib/python3.11/site-packages/jinja2/loaders.py", line 138, in load
    code = environment.compile(source, name, filename)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.6.0/envs/gsi-addon-dev-rocky8/install/intel/2021.5.0/py-jinja2-3.1.2-3yb4fme/lib/python3.11/site-packages/jinja2/environment.py", line 768, in compile
    self.handle_exception(source=source_hint)
  File "/scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.6.0/envs/gsi-addon-dev-rocky8/install/intel/2021.5.0/py-jinja2-3.1.2-3yb4fme/lib/python3.11/site-packages/jinja2/environment.py", line 936, in handle_exception
    raise rewrite_traceback_stack(source=source)
  File "/scratch1/NCEPDEV/da/Emily.Liu/git/Global-Workflow/global-workflow-archivefix/parm/archive/arcdir.yaml.j2", line 61, in template
    {% do det_anl_files.append([COMIN_OBS ~ "/" ~ head ~ "aeroawobs",
^^^^^^^^^^^^^^^^^^^^^
jinja2.exceptions.TemplateSyntaxError: expected token ',', got '{'
+ JGLOBAL_ARCHIVE[1]: postamble JGLOBAL_ARCHIVE 1718741136 1
+ preamble.sh[70]: set +x
End JGLOBAL_ARCHIVE at 20:05:58 with error code 1 (time elapsed: 00:00:22)
+ arch.sh[1]: postamble arch.sh 1718741125 1
+ preamble.sh[70]: set +x
End arch.sh at 20:05:59 with error code 1 (time elapsed: 00:00:34)

emilyhcliu commented 2 weeks ago

The EnKF archive jobs failed as well. /scratch2/NCEPDEV/stmp3/Emily.Liu/ROTDIRS/v17test/logs/2023040200/enkfgdasearc01.log

Traceback (most recent call last):
  File "/scratch1/NCEPDEV/da/Emily.Liu/git/Global-Workflow/global-workflow-archivefix/scripts/exgdas_enkf_earc.py", line 60, in <module>
    main()
  File "/scratch1/NCEPDEV/da/Emily.Liu/git/Global-Workflow/global-workflow-archivefix/ush/python/wxflow/logger.py", line 266, in wrapper
    retval = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/scratch1/NCEPDEV/da/Emily.Liu/git/Global-Workflow/global-workflow-archivefix/scripts/exgdas_enkf_earc.py", line 47, in main
    arcdir_set, atardir_sets = archive.configure(archive_dict)
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch1/NCEPDEV/da/Emily.Liu/git/Global-Workflow/global-workflow-archivefix/ush/python/wxflow/logger.py", line 266, in wrapper
    retval = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/scratch1/NCEPDEV/da/Emily.Liu/git/Global-Workflow/global-workflow-archivefix/ush/python/pygfs/task/archive.py", line 87, in configure
    arcdir_set = Archive._construct_arcdir_set(arcdir_j2yaml,
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch1/NCEPDEV/da/Emily.Liu/git/Global-Workflow/global-workflow-archivefix/ush/python/wxflow/logger.py", line 266, in wrapper
    retval = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/scratch1/NCEPDEV/da/Emily.Liu/git/Global-Workflow/global-workflow-archivefix/ush/python/pygfs/task/archive.py", line 353, in _construct_arcdir_set
    arcdir_set = parse_j2yaml(arcdir_j2yaml,
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch1/NCEPDEV/da/Emily.Liu/git/Global-Workflow/global-workflow-archivefix/ush/python/wxflow/yaml_file.py", line 185, in parse_j2yaml
    return YAMLFile(data=Jinja(path, data, searchpath=searchpath, allow_missing=allow_missing).render)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch1/NCEPDEV/da/Emily.Liu/git/Global-Workflow/global-workflow-archivefix/ush/python/wxflow/jinja.py", line 198, in render
    return render_map[self.template_type]()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch1/NCEPDEV/da/Emily.Liu/git/Global-Workflow/global-workflow-archivefix/ush/python/wxflow/jinja.py", line 209, in _render_file
    template = env.get_template(self.template_file)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.6.0/envs/gsi-addon-dev-rocky8/install/intel/2021.5.0/py-jinja2-3.1.2-3yb4fme/lib/python3.11/site-packages/jinja2/environment.py", line 1010, in get_template
    return self._load_template(name, globals)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.6.0/envs/gsi-addon-dev-rocky8/install/intel/2021.5.0/py-jinja2-3.1.2-3yb4fme/lib/python3.11/site-packages/jinja2/environment.py", line 969, in _load_template
    template = self.loader.load(self, name, self.make_globals(globals))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.6.0/envs/gsi-addon-dev-rocky8/install/intel/2021.5.0/py-jinja2-3.1.2-3yb4fme/lib/python3.11/site-packages/jinja2/loaders.py", line 138, in load
    code = environment.compile(source, name, filename)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.6.0/envs/gsi-addon-dev-rocky8/install/intel/2021.5.0/py-jinja2-3.1.2-3yb4fme/lib/python3.11/site-packages/jinja2/environment.py", line 768, in compile
    self.handle_exception(source=source_hint)
  File "/scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.6.0/envs/gsi-addon-dev-rocky8/install/intel/2021.5.0/py-jinja2-3.1.2-3yb4fme/lib/python3.11/site-packages/jinja2/environment.py", line 936, in handle_exception
    raise rewrite_traceback_stack(source=source)
  File "/scratch1/NCEPDEV/da/Emily.Liu/git/Global-Workflow/global-workflow-archivefix/parm/archive/arcdir.yaml.j2", line 61, in template
    {% do det_anl_files.append([COMIN_OBS ~ "/" ~ head ~ "aeroawobs",
^^^^^^^^^^^^^^^^^^^^^
jinja2.exceptions.TemplateSyntaxError: expected token ',', got '{'
+ JGDAS_ENKF_ARCHIVE[1]: postamble JGDAS_ENKF_ARCHIVE 1718752225 1
+ preamble.sh[70]: set +x
End JGDAS_ENKF_ARCHIVE at 23:10:51 with error code 1 (time elapsed: 00:00:26)
DavidHuber-NOAA commented 1 week ago

@emilyhcliu Thanks for the catch. It seems I missed a couple closing parentheses and curly braces. I have added them and am restarting my test after a merge with develop.
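
A missing closing delimiter of this kind can often be caught before a job is submitted. The following is a minimal sketch in plain Python, not part of global-workflow or Jinja2; `find_unbalanced` is a hypothetical helper, and it does not handle escaped quotes or apostrophes in free text, so a hit is a hint rather than a verdict:

```python
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def find_unbalanced(text):
    """Return (line, char) for the first unmatched delimiter, or None if balanced."""
    stack = []    # (opener, line_number) entries still waiting for a close
    quote = None  # the active quote character while inside a string literal
    line = 1
    for ch in text:
        if ch == "\n":
            line += 1
        if quote:
            # Inside a string: only look for the matching closing quote.
            if ch == quote:
                quote = None
            continue
        if ch in "\"'":
            quote = ch
        elif ch in OPENERS:
            stack.append((ch, line))
        elif ch in PAIRS:
            if not stack or stack[-1][0] != PAIRS[ch]:
                return (line, ch)
            stack.pop()
    if stack:
        opener, opened_at = stack[-1]
        return (opened_at, opener)
    return None


# Illustrative template fragments (variable names are made up).
balanced = '{% do files.append([a ~ "/" ~ b, c]) %}'
broken = '{% do files.append([a ~ "/" ~ b,\n  c %}'

balanced_result = find_unbalanced(balanced)
broken_result = find_unbalanced(broken)
```

For the `broken` fragment the check reports the closing brace on line 2, which is where the still-open `[` and `(` are first contradicted.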

emilyhcliu commented 1 week ago

@DavidHuber-NOAA Thanks for fixing the errors. I will update my branch and rewind the archive jobs. I will report back here.

DavidHuber-NOAA commented 1 week ago

@emilyhcliu @malloryprow I enabled the METplus jobs in my test and turned them on by default. After running through the first full GFS cycle, I verified that the gdas, gfs, and enkfgdas pgb, gsistat, enkfstat, storms, and trak files were sent to the online archive and that the metp jobs ran successfully (the first cycle only runs Grid2Grid and Grid2Obs; Precip will run on the next full GFS cycle).

I have updated the description of this PR to also resolve #2647. Please feel free to take a look at the experiment setup, logs, online archive, and METplus outputs:

Directory Name     Path
EXPDIR             /scratch1/NCEPDEV/global/David.Huber/para/EXPDIR/db_arch
COMROOT            /scratch1/NCEPDEV/global/David.Huber/para/COMROOT/db_arch
ARCDIR             /scratch1/NCEPDEV/global/David.Huber/archive/db_arch
METplusOUT ARCDIR  /scratch1/NCEPDEV/global/David.Huber/archive/metplus_data/by_VSDB///00Z/db_arch

malloryprow commented 1 week ago

METplus output looks like what I would expect for the first day's run and a 00Z GFS cycle. 👍

emilyhcliu commented 1 week ago

@DavidHuber-NOAA Thanks for running the test. I checked the online archive for the GDAS and GFS for the first cycle:

  1. The GDAS analysis, the 9-hourly forecast, and the gsistat/enkfstat are archived OK.
  2. The GFS analysis, the 3-hourly forecast up to 5 days, and the gsistat file are archived OK.
DavidHuber-NOAA commented 1 week ago

The second GFS cycle finished, and all METplus jobs appear to have completed successfully, including Precip, which produced /scratch1/NCEPDEV/global/David.Huber/archive/metplus_data/by_VSDB/precip/ccpa_accum24hr/00Z/db_arch/db_arch_20211222.stat. @malloryprow Could you check this file to make sure it is valid?

DavidHuber-NOAA commented 1 week ago

Starting CI testing.

emcbot commented 1 week ago

CI Update on Wcoss2 at 06/21/24 12:51:09 PM
============================================
Cloning and Building global-workflow PR: 2687
with PID: 225905 on host: dlogin08
emcbot commented 1 week ago

Automated global-workflow Testing Results:


Machine: Wcoss2
Start: Fri Jun 21 12:55:23 UTC 2024 on dlogin08
---------------------------------------------------
Build: Completed at 06/21/24 01:30:01 PM
Case setup: Completed for experiment C48_ATM_56009241
Case setup: Skipped for experiment C48mx500_3DVarAOWCDA_56009241
Case setup: Skipped for experiment C48_S2SWA_gefs_56009241
Case setup: Completed for experiment C48_S2SW_56009241
Case setup: Completed for experiment C96_atm3DVar_extended_56009241
Case setup: Skipped for experiment C96_atm3DVar_56009241
Case setup: Skipped for experiment C96_atmaerosnowDA_56009241
Case setup: Completed for experiment C96C48_hybatmDA_56009241
Case setup: Completed for experiment C96C48_ufs_hybatmDA_56009241
emcbot commented 1 week ago

Experiment C96_atmaerosnowDA FAILED on Hera with error logs:

/scratch1/NCEPDEV/global/CI/2687/RUNTESTS/COMROOT/C96_atmaerosnowDA_56009241/logs/2021122018/gdasprepsnowobs.log

Follow link here to view the contents of the above file(s): (link)

emcbot commented 1 week ago

Experiment C96_atmaerosnowDA FAILED on Hera in /scratch1/NCEPDEV/global/CI/2687/RUNTESTS/C96_atmaerosnowDA_56009241

aerorahul commented 1 week ago

The hash is 56009241, and datetime.strptime is trying to convert it to a datetime object after a regex match. This was encountered before, and ioda-converters had an issue for it that appears to be resolved: https://github.com/JCSDA-internal/ioda-converters/issues/1497. @CoryMartin-NOAA @jiaruidong2017 any thoughts?

emcbot commented 1 week ago

Experiment C48_ATM_56009241 SUCCESS on Wcoss2 at 06/21/24 02:44:15 PM

emcbot commented 1 week ago

Experiment C48_S2SW_56009241 SUCCESS on Wcoss2 at 06/21/24 02:48:10 PM

DavidHuber-NOAA commented 1 week ago

@aerorahul @CoryMartin-NOAA @jiaruidong2017 A simple fix would be to replace the re call with

str_date = re.findall(r'\d{8}', filename)[-1]

What do you think?
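
To see why taking the last match would help, here is a small sketch. The path is hypothetical (the file names the converter actually reads are not shown in this thread), and it assumes the real date appears after the experiment hash in the path:

```python
import re
from datetime import datetime


def parse_yyyymmdd(text):
    """Return a datetime for a YYYYMMDD string, or None if it is not a valid date."""
    try:
        return datetime.strptime(text, "%Y%m%d")
    except ValueError:
        return None


# Hypothetical path: the directory carries the 8-digit CI hash,
# and the file name carries the actual date.
filename = "/CI/RUNTESTS/C96_atmaerosnowDA_56009241/obs/snow_20211220.nc"

matches = re.findall(r"\d{8}", filename)  # every run of exactly eight digits, in order

first_date = parse_yyyymmdd(matches[0])   # the hash: not a valid date
last_date = parse_yyyymmdd(matches[-1])   # the date embedded in the file name
```

Taking the first match picks up the hash, which `datetime.strptime` rejects, while the last match is the date. This only holds when the date really is the final eight-digit run in the path.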

CoryMartin-NOAA commented 1 week ago

I think we just need to update the GDASApp to point to a more recent ioda-converters hash; then this should (hopefully) be resolved.

emcbot commented 1 week ago

Experiment C96C48_hybatmDA_56009241 SUCCESS on Wcoss2 at 06/21/24 03:44:16 PM

emcbot commented 1 week ago

Experiment C96C48_ufs_hybatmDA_56009241 SUCCESS on Wcoss2 at 06/21/24 03:52:13 PM

emcbot commented 1 week ago

CI Passed Hercules at
Built and ran in directory /work2/noaa/stmp/CI/HERCULES/2687

DavidHuber-NOAA commented 1 week ago

@aerorahul I manually checked the other CI tests on Hera, and they all passed. I believe this PR is now ready to be merged.

aerorahul commented 1 week ago

Thanks @DavidHuber-NOAA. I see that @emilyhcliu has approved and that @malloryprow has left confirmation comments. Feel free to merge.

emcbot commented 1 week ago

Experiment C96_atm3DVar_extended_56009241 SUCCESS on Wcoss2 at 06/21/24 11:56:29 PM

emcbot commented 1 week ago

All CI Test Cases Passed on Wcoss2:


Experiment C48_ATM_56009241 *** SUCCESS *** at 06/21/24 02:44:15 PM
Experiment C48_S2SW_56009241 *** SUCCESS *** at 06/21/24 02:48:10 PM
Experiment C96C48_hybatmDA_56009241 *** SUCCESS *** at 06/21/24 03:44:16 PM
Experiment C96C48_ufs_hybatmDA_56009241 *** SUCCESS *** at 06/21/24 03:52:13 PM
Experiment C96_atm3DVar_extended_56009241 *** SUCCESS *** at 06/21/24 11:56:29 PM