NOAA-EMC / GDASApp

Global Data Assimilation System Application
GNU Lesser General Public License v2.1

Integrate Model Variable Renaming Sprint changes into GDASApp yamls and templates #1362

Closed RussTreadon-NOAA closed 2 weeks ago

RussTreadon-NOAA commented 3 weeks ago

Several JEDI repositories have been updated with changes from the Model Variable Renaming Sprint. Updating JEDI hashes in sorc/ requires changes in GDASApp and jcb-gdas yamls and templates. This issue is opened to document these changes.

danholdaway commented 2 weeks ago

Thank you for this effort, Russ.

RussTreadon-NOAA commented 2 weeks ago

@AndrewEichmann-NOAA , I updated feature/resume_nightly with GDASApp develop. This brought in changes from #1352. Now g-w gdas_marineanlfinal fails with

0: ========= Processing /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build/gdas/test/gw-ci/../../test/gw-ci/WCDA-3DVAR-C48mx500/COMROOT/WCDA-3DVAR-C48mx500/gdas.20210324/18//analysis/ocean/diags/insitu_surface_trkob.2021032418.nc4          date: 2021032418
0: insitu_surface_trkob.2021032418.nc4: read database from /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build/gdas/test/gw-ci/../../test/gw-ci/WCDA-3DVAR-C48mx500/COMROOT/WCDA-3DVAR-C48mx500/gdas.20210324/18//analysis/ocean/diags/insitu_surface_trkob.2021032418.nc4 (io pool size: 1)
0: insitu_surface_trkob.2021032418.nc4 processed vars: 2 Variables: seaSurfaceSalinity, seaSurfaceTemperature
0: insitu_surface_trkob.2021032418.nc4 assimilated vars: 1 Variables: seaSurfaceSalinity
0: nlocs =863
0: Exception:   Reason: An exception occurred inside ioda while opening a variable.
0:      name:   ombg/seaSurfaceSalinity
0:      source_column:  0
0:      source_filename:        /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/bundle/ioda/src/engines/ioda/src/ioda/Has_Variables.cpp

Is this failure possibly related to #1352?

RussTreadon-NOAA commented 2 weeks ago

GDASApp PR #1374 modifies test/marine/CMakeLists.txt so that the correct Python version is set for test_gdasapp_bufr2ioda_insitu*. With this change in place, all test_gdasapp_bufr2ioda_insitu* tests pass

Test project /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build
    Start 1993: test_gdasapp_bufr2ioda_insitu_profile_argo
1/8 Test #1993: test_gdasapp_bufr2ioda_insitu_profile_argo .......   Passed   52.78 sec
    Start 1994: test_gdasapp_bufr2ioda_insitu_profile_bathy
2/8 Test #1994: test_gdasapp_bufr2ioda_insitu_profile_bathy ......   Passed    3.72 sec
    Start 1995: test_gdasapp_bufr2ioda_insitu_profile_glider
3/8 Test #1995: test_gdasapp_bufr2ioda_insitu_profile_glider .....   Passed    3.62 sec
    Start 1996: test_gdasapp_bufr2ioda_insitu_profile_tesac
4/8 Test #1996: test_gdasapp_bufr2ioda_insitu_profile_tesac ......   Passed    5.71 sec
    Start 1997: test_gdasapp_bufr2ioda_insitu_profile_tropical
5/8 Test #1997: test_gdasapp_bufr2ioda_insitu_profile_tropical ...   Passed    3.33 sec
    Start 1998: test_gdasapp_bufr2ioda_insitu_profile_xbtctd
6/8 Test #1998: test_gdasapp_bufr2ioda_insitu_profile_xbtctd .....   Passed    2.62 sec
    Start 1999: test_gdasapp_bufr2ioda_insitu_surface_drifter
7/8 Test #1999: test_gdasapp_bufr2ioda_insitu_surface_drifter ....   Passed    2.41 sec
    Start 2000: test_gdasapp_bufr2ioda_insitu_surface_trkob
8/8 Test #2000: test_gdasapp_bufr2ioda_insitu_surface_trkob ......   Passed    2.86 sec

100% tests passed, 0 tests failed out of 8

Total Test time (real) =  78.83 sec

RussTreadon-NOAA commented 2 weeks ago

@AndrewEichmann-NOAA , I rolled back the change to parm/soca/obs/obs_list.yaml from #1352 and reran the test_gdasapp_WCDA-3DVAR-C48mx500 suite of tests. All passed

Test project /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build
    Start 1960: test_gdasapp_WCDA-3DVAR-C48mx500
1/9 Test #1960: test_gdasapp_WCDA-3DVAR-C48mx500 ....................................   Passed   32.22 sec
    Start 1961: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_stage_ic_202103241200
2/9 Test #1961: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_stage_ic_202103241200 .........   Passed   58.00 sec
    Start 1962: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_fcst_seg0_202103241200
3/9 Test #1962: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_fcst_seg0_202103241200 ........   Passed  408.83 sec
    Start 1963: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_prepoceanobs_202103241800
4/9 Test #1963: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_prepoceanobs_202103241800 .....   Passed  266.16 sec
    Start 1964: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marinebmat_202103241800
5/9 Test #1964: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marinebmat_202103241800 .......   Passed  168.43 sec
    Start 1965: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlinit_202103241800
6/9 Test #1965: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlinit_202103241800 ....   Passed  111.26 sec
    Start 1966: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlvar_202103241800
7/9 Test #1966: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlvar_202103241800 .....   Passed  168.09 sec
    Start 1967: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlchkpt_202103241800
8/9 Test #1967: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlchkpt_202103241800 ...   Passed  180.57 sec
    Start 1968: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800
9/9 Test #1968: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800 ...   Passed   68.63 sec

100% tests passed, 0 tests failed out of 9

Label Time Summary:
manual    = 1462.19 sec*proc (9 tests)

Total Test time (real) = 1463.94 sec

Does failure of test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800 with the PR #1352 parm/soca/obs/obs_list.yaml make sense?

RussTreadon-NOAA commented 2 weeks ago

@guillaumevernieres and @AndrewEichmann-NOAA : test_gdasapp_WCDA-hyb-C48mx500_gdas_marineanlletkf_202103241800 fails with the error

2024-11-14 03:11:43,830 - DEBUG    - marine_da_utils: Executing srun -l --export=ALL --hint=nomultithread -n 16 /work/noaa/da/rtreadon/git/global-workflow/pr2992/exec/gdas_soca_gridgen.x /work/noaa/da/rtreadon/git/global-workflow/pr2992/parm/gdas/soca/gridgen/gridgen.yaml
 2: Exception: Cannot open /work/noaa/da/rtreadon/git/global-workflow/pr2992/parm/gdas/soca/gridgen/gridgen.yaml  (No such file or directory)

There is no g-w directory parm/gdas/soca/gridgen. I checked g-w PR #3041. I do not see any change to sorc/link_workflow.sh to add this directory to parm/gdas/soca.

Should test_gdasapp_WCDA-hyb-C48mx500_gdas_marineanlletkf_202103241800 successfully run in GDASApp develop with g-w develop? Does this test work when built and run inside g-w PR #3041?

RussTreadon-NOAA commented 2 weeks ago

11/14 status

g-w DA CI testing complete on Hercules. 63 out of 64 test_gdasapp tests pass on Hercules.

Two issues remain to be resolved:

  1. test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800 fails when using parm/soca/obs/obs_list.yaml from GDASApp develop at 6bc2760. Reverting to the previous version of obs_list.yaml allows test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800 to pass.

  2. test_gdasapp_WCDA-hyb-C48mx500_gdas_marineanlletkf_202103241800 fails for at least two reasons

    • the marineanlletkf section is missing from g-w env/HERCULES.env
    • gdas_soca_gridgen.x fails because input yaml $HOMEgfs/parm/gdas/soca/gridgen/gridgen.yaml does not exist

We cannot resume nightly testing until all ctests pass. Given this, we need to answer two questions:

  1. Do we

    • fix GDASApp develop so that test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800 passes when using obs_list.yaml from develop, or
    • revert to the previous version of obs_list.yaml, or
    • disable test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800?
  2. Do we

    • fix test_gdasapp_WCDA-hyb-C48mx500_gdas_marineanlletkf_202103241800 by adding the missing parm/gdas/soca/gridgen to g-w PR #2992, or
    • disable test_gdasapp_WCDA-hyb-C48mx500_gdas_marineanlletkf_202103241800?

Tagging @guillaumevernieres , @AndrewEichmann-NOAA , @CoryMartin-NOAA , @danholdaway , @DavidNew-NOAA

DavidNew-NOAA commented 2 weeks ago

@RussTreadon-NOAA With regard to gridgen, I deleted that in GDASApp when I refactored the marine bmat, not realizing another code would use it. It exists now as parm/jcb-gdas/algorithm/marine/gridgen.yaml, so you can just point to that file until I refactor the rest of the marine code using JCB.
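
A minimal pre-flight sketch of the "point to that file" idea, assuming nothing beyond the paths quoted in this thread. The helper name `first_existing` is hypothetical, and how the jcb-gdas copy ends up under the g-w `parm/` tree is not shown here, so the candidate list would need to be adjusted to the actual checkout. The point is only to fail early, with every location tried, instead of letting gdas_soca_gridgen.x abort with "No such file or directory".

```python
from pathlib import Path
from typing import Iterable


def first_existing(candidates: Iterable[Path]) -> Path:
    """Return the first candidate that is an existing file.

    Raises FileNotFoundError naming every path that was tried, so the
    log shows all locations that were checked rather than only the last.
    """
    tried = []
    for candidate in candidates:
        if candidate.is_file():
            return candidate
        tried.append(str(candidate))
    raise FileNotFoundError("input yaml not found; tried: " + ", ".join(tried))
```

A caller might pass the jcb-gdas location first and the old `parm/gdas/soca/gridgen/gridgen.yaml` location second, each joined to whatever root directory applies on the machine in question.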

DavidNew-NOAA commented 2 weeks ago

@RussTreadon-NOAA Just to answer your question, I say that I add the following to #2992

  1. Point marineanlletkf to gridgen in jcb-gdas per my above comment
  2. Add marineanlletkf to env/HERCULES.env

And then we either revert the obs_list.yaml or fix it

RussTreadon-NOAA commented 2 weeks ago

gdas_marineanlletkf failure - RESOLVED

test_gdasapp_WCDA-hyb-C48mx500_gdas_marineanlletkf_202103241800 passes after making the following changes

RussTreadon-NOAA commented 2 weeks ago

gdas_marineanlfinal failure - UPDATE

gdas_marineanlfinal fails when gdassoca_obsstats.x attempts to extract seaSurfaceSalinity from the insitu_surface_trkob diagnostic file

0: insitu_surface_trkob.2021032418.nc4: read database from /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build/gdas/test/gw-ci/../../test/gw-ci/WCDA-3DVAR-C48mx500/COMROOT/WCDA-3DVAR-C48mx500/gdas.20210324/18//analysis/ocean/diags/insitu_surface_trkob.2021032418.nc4 (io pool size: 1)
0: insitu_surface_trkob.2021032418.nc4 processed vars: 2 Variables: seaSurfaceSalinity, seaSurfaceTemperature
0: insitu_surface_trkob.2021032418.nc4 assimilated vars: 1 Variables: seaSurfaceSalinity
0: nlocs =863
0: Exception:   Reason: An exception occurred inside ioda while opening a variable.
0:      name:   ombg/seaSurfaceSalinity

diag_stats.yaml specifies variable seaSurfaceSalinity to be extracted from analysis/ocean/diags/insitu_surface_trkob.2021032418.nc4.

      engine:
        type: H5File
        obsfile: /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build/gdas/test/gw-ci/../../test/gw-ci/WCDA-3DVAR-C48mx500/COMROOT/WCDA-3DVAR-C48mx500/gdas.20210324/18//analysis/ocean/diags/insitu_surface_trkob.2021032418.nc4
    simulated variables:
    - seaSurfaceSalinity
  variable: seaSurfaceSalinity

This is problematic. Variable seaSurfaceSalinity is not in the diagnostic file. var.yaml only specifies seaSurfaceTemperature to be written to the diagnostic file

        obsdataout:
          engine:
            type: H5File
            obsfile: /work/noaa/stmp/rtreadon/HERCULES/RUNDIRS/WCDA-3DVAR-C48mx500/gdas.2021032418/gdasmarineanalysis.2021032418/marinevariational/diags/insitu_surface_trkob.2021032418.nc4
        simulated variables:
        - seaSurfaceTemperature
        io pool:
          max pool size: 1

Method obs_space_stats in ush/python/pygfs/task/marine_analysis.py creates diag_stats.yaml. The yaml is populated by querying the diagnostic files in the run directory. Specifically, ObsValue is checked to determine the variable to add to diag_stats.yaml. The ObsValue group contains two variables

group: ObsValue {
  variables:
        float seaSurfaceSalinity(Location) ;
                seaSurfaceSalinity:_FillValue = -3.368795e+38f ;
                string seaSurfaceSalinity:long_name = "seaSurfaceSalinity" ;
                string seaSurfaceSalinity:units = "psu" ;
                seaSurfaceSalinity:valid_range = 0.f, 45.f ;
                seaSurfaceSalinity:_Storage = "chunked" ;
                seaSurfaceSalinity:_ChunkSizes = 863 ;
                seaSurfaceSalinity:_Endianness = "little" ;
        float seaSurfaceTemperature(Location) ;
                seaSurfaceTemperature:_FillValue = -3.368795e+38f ;
                string seaSurfaceTemperature:long_name = "seaSurfaceTemperature" ;
                string seaSurfaceTemperature:units = "degC" ;
                seaSurfaceTemperature:valid_range = -10.f, 50.f ;
                seaSurfaceTemperature:_Storage = "chunked" ;
                seaSurfaceTemperature:_ChunkSizes = 863 ;
                seaSurfaceTemperature:_Endianness = "little" ;

  // group attributes:
  } // group ObsValue

However, the ombg group only contains seaSurfaceTemperature

group: ombg {
  variables:
        float seaSurfaceTemperature(Location) ;
                seaSurfaceTemperature:_FillValue = -3.368795e+38f ;
                seaSurfaceTemperature:_Storage = "chunked" ;
                seaSurfaceTemperature:_ChunkSizes = 863 ;
                seaSurfaceTemperature:_Endianness = "little" ;

  // group attributes:
  } // group ombg
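
The mismatch between the two groups can be checked mechanically. The sketch below is an illustration only: the function name `variables_missing_from` is hypothetical, and the group contents are modeled as plain lists copied from the ncdump output above, so it runs without netCDF4 or the diagnostic file.

```python
from typing import Iterable, List, Mapping


def variables_missing_from(groups: Mapping[str, Iterable[str]],
                           reference: str = "ObsValue",
                           required: str = "ombg") -> List[str]:
    """List variables found under `reference` that have no entry under `required`."""
    present = set(groups.get(required, ()))
    return [name for name in groups.get(reference, ()) if name not in present]


# Variable names per group, as listed in the ncdump output above.
trkob_groups = {
    "ObsValue": ["seaSurfaceSalinity", "seaSurfaceTemperature"],
    "ombg": ["seaSurfaceTemperature"],
}

# seaSurfaceSalinity has an ObsValue record but no ombg record.
missing = variables_missing_from(trkob_groups)
```

Against a real file, the same mapping could presumably be built with netCDF4 as `{name: list(grp.variables) for name, grp in nc.groups.items()}`, since `Dataset.groups` and `Group.variables` are both dictionaries.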

Do we need to change the logic in method obs_space_stats in marine_analysis.py to check ombg instead of ObsValue when populating diag_stats.yaml?

What do you think @guillaumevernieres ? Who on the Marine DA team should I discuss this issue with?

RussTreadon-NOAA commented 2 weeks ago

FYI, making the change suggested above to marine_analysis.py

            # get the variable name, assume 1 variable per file
            nc = netCDF4.Dataset(obsfile, 'r')
            ##variable = next(iter(nc.groups["ObsValue"].variables))
            variable = next(iter(nc.groups["ombg"].variables))
            print(f"variable {variable}")
            nc.close()

works. With this change gdas_marineanlfinal passes.

(gdasapp) hercules-login-2:/work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build$ ctest -R test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800
Test project /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build
    Start 1968: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800
1/1 Test #1968: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800 ...   Passed   80.23 sec

100% tests passed, 0 tests failed out of 1

Label Time Summary:
manual    =  80.23 sec*proc (1 test)

Total Test time (real) =  84.94 sec

Another thought: is the better solution to add seaSurfaceSalinity to the simulated variables for insitu_surface_trkob? That is, should var.yaml read

        obsdataout:
          engine:
            type: H5File
            obsfile: /work/noaa/stmp/rtreadon/HERCULES/RUNDIRS/WCDA-3DVAR-C48mx500/gdas.2021032418/gdasmarineanalysis.2021032418/marinevariational/diags/insitu_surface_trkob.2021032418.nc4
        simulated variables:
        - seaSurfaceTemperature
        - seaSurfaceSalinity
        io pool:
          max pool size: 1
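
If a diagnostic file did carry two simulated variables, the "assume 1 variable per file" shortcut in marine_analysis.py would still pick only the first one. A hedged sketch of the alternative is to emit one stats entry per variable that actually has an ombg record. The function name `stats_entries` and the dictionary keys are hypothetical, and whether gdassoca_obsstats.x accepts several entries for the same file is not confirmed in this thread.

```python
import os
import re
from typing import Dict, Iterable, List


def stats_entries(obsfile: str, ombg_variables: Iterable[str]) -> List[Dict[str, str]]:
    """Build one entry per variable that has an ombg record in `obsfile`."""
    # Drop the trailing ".YYYYMMDDHH.nc4" to get an obs space name,
    # e.g. insitu_surface_trkob.2021032418.nc4 -> insitu_surface_trkob.
    obs_space = re.sub(r"\.\d{10}\.nc4$", "", os.path.basename(obsfile))
    return [
        {"obs_space": obs_space, "obsfile": obsfile, "variable": name}
        for name in ombg_variables
    ]


entries = stats_entries(
    "diags/insitu_surface_trkob.2021032418.nc4",
    ["seaSurfaceTemperature", "seaSurfaceSalinity"],
)
```

With only seaSurfaceTemperature in ombg, as in the current diagnostic file, this reduces to a single entry and never requests ombg/seaSurfaceSalinity.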

AndrewEichmann-NOAA commented 2 weeks ago

@RussTreadon-NOAA The letkf problems would be resolved with https://github.com/NOAA-EMC/GDASApp/pull/1372, which adds back the original gridgen yaml under parm, and adds the localization blocks to the obs space config files.

guillaumevernieres commented 2 weeks ago

@RussTreadon-NOAA , reverting the obs_list.yaml is what we should do.

RussTreadon-NOAA commented 2 weeks ago

@AndrewEichmann-NOAA , thank you for pointing me to GDASApp PR #1372.

PR #1372 places gridgen.yaml in parm/soca/gridgen/gridgen.yaml. @DavidNew-NOAA mentioned above that gridgen.yaml is now in parm/jcb-gdas/algorithm/marine/gridgen.yaml

We don't need gridgen.yaml in two places. Which location do we go with?

It's good to see that #1372 addresses the missing obs localization blocks mentioned above.

DavidNew-NOAA commented 2 weeks ago

@RussTreadon-NOAA @AndrewEichmann-NOAA Let's just leave gridgen.yaml in jcb-gdas and point there for now

AndrewEichmann-NOAA commented 2 weeks ago

@RussTreadon-NOAA @DavidNew-NOAA While it does belong under jcb and the letkf task should be converted to using it, that will require a PR to global-workflow, and letkf will be broken until that PR gets merged.

DavidNew-NOAA commented 2 weeks ago

@AndrewEichmann-NOAA I put the jcb-gdas gridgen.yaml reference in config.marineanlletkf in this PR in my last commit this morning

DavidNew-NOAA commented 2 weeks ago

@AndrewEichmann-NOAA Sorry, I meant in GW PR #2992

AndrewEichmann-NOAA commented 2 weeks ago

@DavidNew-NOAA Ah, ok

RussTreadon-NOAA commented 2 weeks ago

> @RussTreadon-NOAA , reverting the obs_list.yaml is what we should do.

Thanks @guillaumevernieres for the guidance. parm/soca/obs/obs_list.yaml was reverted at 716dcdb

RussTreadon-NOAA commented 2 weeks ago

FYI

I am manually running g-w DA CI on Hera & Hercules using g-w branch feature/jcb-obsbias at 42904ba with sorc/gdas.cd populated with GDASApp branch feature/resume_nightly at 716dcdb.

test_gdasapp had 64 out of 64 tests pass on both machines.

test_gdasapp is currently running on Orion. Pending a 64/64 result I'll launch g-w DA CI on Orion.