NOAA-EMC / global-workflow

Global Superstructure/Workflow supporting the Global Forecast System (GFS)
https://global-workflow.readthedocs.io/en/latest
GNU Lesser General Public License v3.0

Two fix files need to be updated for the C48 (ATM only) test to match UFSWM #1723

Open SadeghTabas-NOAA opened 1 year ago

SadeghTabas-NOAA commented 1 year ago

Description

Global workflow uses global_soilmgldas.statsgo.t94.192.96.grb and global_slmask.t1534.3072.1536.grb for the C48 (ATM only) test. ufswm, on the other hand, uses global_soilmgldas.statsgo.t92.192.94.grb and global_slmask.t62.192.94.grb. Because C48 is a low-resolution test, low-resolution fix files need to be used in order to match ufswm.

WalterKolczynski-NOAA commented 1 year ago

This isn't actually a fix file request because the files are already in fix.

For the first file, these are identical. The fix directory has symlinks for all the t94.192.96 files to the t92.192.94 files because one of the names is incorrect. Given all gw names use the same calculation to determine those names from the model grid, I'm going to say the one workflow is using (t94.192.96) is probably correct.

For slmask, we just need to update the files being used to depend on resolution instead of always using the high-res version.

junwang-noaa commented 1 year ago

@WalterKolczynski-NOAA Thanks for confirming that the global_soilmgldas.statsgo.t94.192.96.grb file is actually linked to global_soilmgldas.statsgo.t92.192.94.grb. Maybe we can ask the land team to help confirm the file names.

@HelinWei-NOAA @barlage @yangfanglin could we rename global_soilmgldas.statsgo.t92.192.94.grb to global_soilmgldas.statsgo.t94.192.96.grb?

HelinWei-NOAA commented 1 year ago

@junwang-noaa All soil moisture data were created by both @GeorgeGayno-NOAA and Jesse Meng. If you look at this directory /scratch1/NCEPDEV/global/glopara/fix/am/20220805, there are more fixed files for the grid t92.192.94. So I am not sure if t94.192.96 is the correct one.

junwang-noaa commented 1 year ago

Thanks for checking it, Helin. @GeorgeGayno-NOAA may I ask your help here?

WalterKolczynski-NOAA commented 1 year ago

All of the resolutions in workflow use the following formulas to calculate these filenames:

JCAP_CASE=$((res*2-2))  # C48 -> 94
LATB_CASE=$((res*2))    # C48 -> 96

I'm not sure why the formula would work for every resolution except C48. Plus I think we had this discussion six months ago or whenever we first made these symlinks.
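To make the comparison concrete, here is a minimal bash sketch that applies the same arithmetic to several resolutions and prints the resulting filename suffix. The function name `gldas_suffix` and the list of cases are illustrative only; the LONB term (4*res) is taken from the calculation in ush/forecast_postdet.sh quoted later in this thread.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the workflow): derive the
# tJCAP.LONB.LATB suffix from a CASE string such as C48.
gldas_suffix() {
  local res="${1:1}"            # strip the leading "C": C48 -> 48
  local jcap=$((2 * res - 2))
  local lonb=$((4 * res))
  local latb=$((2 * res))
  echo "t${jcap}.${lonb}.${latb}"
}

for c in C48 C96 C192 C384 C768; do
  printf '%-5s -> %s\n' "${c}" "$(gldas_suffix "${c}")"
done
```

For C48 this yields t94.192.96, which is the name the workflow looks for, not the t92.192.94 name that ufswm uses.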

WalterKolczynski-NOAA commented 1 year ago

I just searched through my email, and @yangfanglin is the one who suggested we use the high-res soilmgldas for all resolutions:

Kate, Since interpolation to the tiles will nevertheless be done for the FV3 model, I'd recommend using global_soilmgldas.statsgo.t1534.3072.1536.grb for all fv3 resolutions two years ago. The reason we created the fix files for different resolutions was to save time for running the spectral model. Fanglin

WalterKolczynski-NOAA commented 1 year ago

Circling back looking to resolve this

@barlage @junwang-noaa @HelinWei-NOAA @GeorgeGayno-NOAA

KateFriedman-NOAA commented 9 months ago

Retagging folks since this went stale: @barlage @junwang-noaa @HelinWei-NOAA @GeorgeGayno-NOAA

KateFriedman-NOAA commented 7 months ago

Retagging folks again @barlage @junwang-noaa @HelinWei-NOAA @GeorgeGayno-NOAA.

What should be done for this?

GeorgeGayno-NOAA commented 7 months ago

Circling back looking to resolve this

  • Should GW continue using the high-res soilmgldas file for all FV3 resolutions as @yangfanglin suggested?

    • If so, can we remove the others from future versions of fix to free up a bit of space (and remove clutter)?
    • If so, presumably UFS RTs should also switch to using only the high-res files.
  • What is the appropriate name for the C48 file: t92.192.94 or t94.192.96? The latter seems to be correct to me based on the formula GW uses to calculate all other resolutions, but I have yet to see consensus. (In fix, one is symlinked to the other, so the file is identical either way.)

@barlage @junwang-noaa @HelinWei-NOAA @GeorgeGayno-NOAA

The filenames are based on the dimensions of the data. So, to be consistent with the other 'gldas' files, the C48 file would be t92.192.94.

HelinWei-NOAA commented 7 months ago

agree with @GeorgeGayno-NOAA

KateFriedman-NOAA commented 7 months ago

The filenames are based on the dimensions of the data. So, to be consistent with the other 'gldas' files, the C48 file would be t92.192.94.

@GeorgeGayno-NOAA @HelinWei-NOAA Understood; however, the internal calculations for the global_soilmgldas fix files can't produce that filename. As Walter mentioned before (https://github.com/NOAA-EMC/global-workflow/issues/1723#issuecomment-1629287368), here are the current internal calculations in ush/forecast_postdet.sh:

  res="${CASE:1}"

  JCAP_CASE=$((2*res-2))
  LONB_CASE=$((4*res))
  LATB_CASE=$((2*res))

  JCAP=${JCAP:-${JCAP_CASE}}
  LONB=${LONB:-${LONB_CASE}}
  LATB=${LATB:-${LATB_CASE}}

  FNSMCC=${FNSMCC:-"${FIXgfs}/am/global_soilmgldas.statsgo.t${JCAP}.${LONB}.${LATB}.grb"}

When running C48, the above ends up looking for global_soilmgldas.statsgo.t94.192.96.grb, as observed in a C48 enkf fcst job log:

+ forecast_postdet.sh[278]: res=48
...
+ forecast_postdet.sh[287]: JCAP_CASE=94
+ forecast_postdet.sh[288]: LONB_CASE=192
+ forecast_postdet.sh[289]: LATB_CASE=96
+ forecast_postdet.sh[291]: JCAP=94
+ forecast_postdet.sh[292]: LONB=192
+ forecast_postdet.sh[293]: LATB=96
...
+ forecast_postdet.sh[317]: FNSMCC=/scratch1/NCEPDEV/global/Kate.Friedman/git/develop/fix/am/global_soilmgldas.statsgo.t94.192.96.grb
+ forecast_postdet.sh[320]: [[ ! -f /scratch1/NCEPDEV/global/Kate.Friedman/git/develop/fix/am/global_soilmgldas.statsgo.t94.192.96.grb ]]

It only finds that file, however, because we currently have a symlink for that filename to the t92.192.94 one that @GeorgeGayno-NOAA indicated:

[role.glopara@hfe06 20220805]$ ll global_soilmgldas.statsgo*
-rw-r--r-- 1 role.glopara global 111393120 Feb 11  2018 global_soilmgldas.statsgo.t1534.3072.1536.grb
-rw-r--r-- 1 role.glopara global   3100512 Feb 11  2018 global_soilmgldas.statsgo.t254.512.256.grb
-rw-r--r-- 1 role.glopara global  15770592 Feb 11  2018 global_soilmgldas.statsgo.t382.1152.576.grb
-rw-r--r-- 1 role.glopara global   6978720 Feb 11  2018 global_soilmgldas.statsgo.t382.768.384.grb
-rw-r--r-- 1 role.glopara global  15705792 Feb 11  2018 global_soilmgldas.statsgo.t574.1152.576.grb
-rw-r--r-- 1 role.glopara global  27775872 Feb 11  2018 global_soilmgldas.statsgo.t766.1536.768.grb
-rw-r--r-- 1 role.glopara global    435360 Feb 11  2018 global_soilmgldas.statsgo.t92.192.94.grb
lrwxrwxrwx 1 role.glopara global        40 Jun 27  2018 global_soilmgldas.statsgo.t94.192.96.grb -> global_soilmgldas.statsgo.t92.192.94.grb
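As a side note, one way to confirm that the two names really resolve to the same file is bash's `-ef` test. The sketch below runs against a scratch directory with empty stand-in files, not the real fix directory, and the helper name `same_file` is hypothetical.

```bash
#!/usr/bin/env bash
# Scratch directory with stand-in files; in practice this would be fix/am.
fixdir="$(mktemp -d)"
real="${fixdir}/global_soilmgldas.statsgo.t92.192.94.grb"
link="${fixdir}/global_soilmgldas.statsgo.t94.192.96.grb"
touch "${real}"
ln -s global_soilmgldas.statsgo.t92.192.94.grb "${link}"

# Hypothetical helper: true when both paths resolve to the same file
# (same device and inode), which is the case for a symlink and its target.
same_file() {
  [[ "$1" -ef "$2" ]]
}

if same_file "${link}" "${real}"; then
  echo "both names resolve to the same file"
fi

# Print the symlink target as stored in the link itself.
readlink "${link}"
```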

For forecast jobs at other resolutions (e.g. C96), you end up with global_soilmgldas.statsgo.t190.384.192.grb, but neither that file nor an equivalent symlink exists, so the script defaults to the highest resolution file:

+ forecast_postdet.sh[278]: res=96
...
+ forecast_postdet.sh[287]: JCAP_CASE=190
+ forecast_postdet.sh[288]: LONB_CASE=384
+ forecast_postdet.sh[289]: LATB_CASE=192
+ forecast_postdet.sh[291]: JCAP=190
+ forecast_postdet.sh[292]: LONB=384
+ forecast_postdet.sh[293]: LATB=192
...
+ forecast_postdet.sh[317]: FNSMCC=/scratch1/NCEPDEV/global/Kate.Friedman/git/develop/fix/am/global_soilmgldas.statsgo.t190.384.192.grb
+ forecast_postdet.sh[320]: [[ ! -f /scratch1/NCEPDEV/global/Kate.Friedman/git/develop/fix/am/global_soilmgldas.statsgo.t190.384.192.grb ]]
+ forecast_postdet.sh[320]: FNSMCC=/scratch1/NCEPDEV/global/Kate.Friedman/git/develop/fix/am/global_soilmgldas.statsgo.t1534.3072.1536.grb

Only C48 actually uses a lower resolution soilmgldas.statsgo file, because of the existing symlink. All other resolutions use the highest resolution file. If we removed that symlink, the C48 wouldn't find the file and would default to the highest resolution one as well.
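The lookup-then-fallback behavior visible in the logs can be sketched as follows. This is a reconstruction from the trace output above, not the actual contents of ush/forecast_postdet.sh; the function name `resolve_soilm` and the scratch directory are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical reconstruction: build the resolution-specific name and
# fall back to the T1534 file when that name does not exist.
resolve_soilm() {
  local fixam="$1" case_name="$2"
  local res="${case_name:1}"
  local jcap=$((2 * res - 2)) lonb=$((4 * res)) latb=$((2 * res))
  local f="${fixam}/global_soilmgldas.statsgo.t${jcap}.${lonb}.${latb}.grb"
  if [[ ! -f "${f}" ]]; then
    f="${fixam}/global_soilmgldas.statsgo.t1534.3072.1536.grb"
  fi
  echo "${f}"
}

# Scratch directory mimicking part of fix/am (empty stand-in files).
fixam="$(mktemp -d)"
touch "${fixam}/global_soilmgldas.statsgo.t1534.3072.1536.grb" \
      "${fixam}/global_soilmgldas.statsgo.t92.192.94.grb"
ln -s global_soilmgldas.statsgo.t92.192.94.grb \
  "${fixam}/global_soilmgldas.statsgo.t94.192.96.grb"

resolve_soilm "${fixam}" C48   # symlink exists, so the t94.192.96 name is kept
resolve_soilm "${fixam}" C96   # no t190.384.192 file, so T1534 is used
```

Removing the symlink from the scratch directory makes the C48 call return the T1534 file as well, which matches the behavior described above.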

So we need to do one of the following:

  1) use the highest resolution file for C48 as well (as @yangfanglin suggested)
  2) rename all of the files to match the internal filename calculation
  3) adjust the internal filename calculation to use the existing filenames

Please let us know which option, thanks!

KateFriedman-NOAA commented 7 months ago

I should add, @SadeghTabas-NOAA noted that:

ufswm uses global_soilmgldas.statsgo.t92.192.94.grb

...which would lean towards option 3, if we want to be consistent with UWM and not use the highest resolution file for all resolutions.
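For illustration only, option 3 could take a shape like the sketch below, where the calculation is kept but C48 is special-cased to the existing filename. The function name `soilm_suffix` is hypothetical, and special-casing only C48 is an assumption; nothing in this thread settles which resolutions would need an override.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of option 3: keep the formula, but map C48 to the
# name that already exists in fix/am.
soilm_suffix() {
  local case_name="$1"
  local res="${case_name:1}"
  case "${case_name}" in
    C48) echo "t92.192.94" ;;   # existing file name from the listing above
    *)   echo "t$((2 * res - 2)).$((4 * res)).$((2 * res))" ;;
  esac
}

soilm_suffix C48
soilm_suffix C192
```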

KateFriedman-NOAA commented 7 months ago

For the global_slmask file, we have the following available (not showing the rg or f77 slmask files):

[role.glopara@hfe06 20220805]$ ll global_slmask.t* | grep -v rg | grep -v f77
-rw-r--r-- 1 role.glopara global  1327188 May  4  2017 global_slmask.t1148.2304.1152.grb
-rw-r--r-- 1 role.glopara global    36564 May  4  2017 global_slmask.t126.384.190.grb
-rw-r--r-- 1 role.glopara global  2359380 May  4  2017 global_slmask.t1534.3072.1536.grb
-rw-r--r-- 1 role.glopara global    65620 May  4  2017 global_slmask.t170.512.256.grb
lrwxrwxrwx 1 role.glopara global       30 Dec 20  2017 global_slmask.t190.384.192.grb -> global_slmask.t126.384.190.grb
-rw-r--r-- 1 role.glopara global    83028 May  4  2017 global_slmask.t190.576.288.grb
-rw-r--r-- 1 role.glopara global    16468 May  4  2017 global_slmask.t254.512.256.grb
-rw-r--r-- 1 role.glopara global   147540 May  4  2017 global_slmask.t254.768.384.grb
-rw-r--r-- 1 role.glopara global   331860 May  4  2017 global_slmask.t382.1152.576.grb
-rw-r--r-- 1 role.glopara global   147540 May  4  2017 global_slmask.t382.768.384.grb
-rw-r--r-- 1 role.glopara global   331860 May  4  2017 global_slmask.t574.1152.576.grb
-rw-r--r-- 1 role.glopara global   774484 May  4  2017 global_slmask.t574.1760.880.grb
-rw-r--r-- 1 role.glopara global     9108 May  4  2017 global_slmask.t62.192.94.grb
-rw-r--r-- 1 role.glopara global   112980 May  4  2017 global_slmask.t670.1344.672.grb
-rw-r--r-- 1 role.glopara global   147540 Oct 13  2017 global_slmask.t766.1536.768.grb
-rw-r--r-- 1 role.glopara global   774484 May  4  2017 global_slmask.t878.1760.880.grb
-rw-r--r-- 1 role.glopara global  1742484 May  4  2017 global_slmask.t878.2640.1320.grb
-rw-r--r-- 1 role.glopara global     9108 May  4  2017 global_slmask.t92.192.94.grb

...but ush/forecast_postdet.sh uses the highest resolution file (if not set from above, which it generally isn't):

FNMSKH=${FNMSKH:-"${FIXgfs}/am/global_slmask.t1534.3072.1536.grb"}

https://github.com/NOAA-EMC/global-workflow/blob/develop/ush/forecast_postdet.sh#L315

Both the global_soilmgldas fallback (used when the computed file doesn't exist) and the current global_slmask setting end up selecting the highest resolution file for all resolutions. The one exception is C48, which picks up the global_soilmgldas symlink. This leans towards option 1.
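For readers less familiar with the `${VAR:-default}` form used for FNMSKH, here is a small sketch of how it behaves. The FIXgfs value is a placeholder, and the variables `mask_default` and `mask_override` exist only for this example.

```bash
#!/usr/bin/env bash
FIXgfs="/path/to/fix"   # placeholder, not a real location

# Case 1: FNMSKH is unset (or empty), so the default after :- is used.
unset FNMSKH
FNMSKH=${FNMSKH:-"${FIXgfs}/am/global_slmask.t1534.3072.1536.grb"}
mask_default="${FNMSKH}"
echo "${mask_default}"

# Case 2: FNMSKH was already set by the caller, so the default is ignored.
FNMSKH="${FIXgfs}/am/global_slmask.t62.192.94.grb"
FNMSKH=${FNMSKH:-"${FIXgfs}/am/global_slmask.t1534.3072.1536.grb"}
mask_override="${FNMSKH}"
echo "${mask_override}"
```

So a lower resolution mask would be used only if something upstream exports FNMSKH before the script runs.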

yangfanglin commented 7 months ago

There is a lot of history behind how these files were created for the global spectral model (old GFS) at different resolutions and with Eulerian or semi-Lagrangian advection. It can be confusing. I'd still suggest using the highest resolution file for all FV3 resolutions to reduce clutter and to avoid confusion. Alternatively, we can follow what George Gayno did for other boundary conditions, that is, create soilmgldas for each FV3 resolution on the tiles (using global_soilmgldas.statsgo.t1534.3072.1536.grb as the source for all cases).

KateFriedman-NOAA commented 7 months ago

I'd still suggest using the highest resolution file for all FV3 resolutions to reduce clutter and to avoid confusion.

This would be my vote too (coming from the file management side of things). @yangfanglin Do you suggest the same thing for the slmask files too?

@GeorgeGayno-NOAA thoughts?

yangfanglin commented 7 months ago

@KateFriedman-NOAA we need to keep separate slmask files for the different model resolutions.

barlage commented 7 months ago

@KateFriedman-NOAA You said "All other resolutions use the highest resolution file.", but wouldn't, say, C192 find global_soilmgldas.statsgo.t382.768.384.grb? And similarly for C384 (global_soilmgldas.statsgo.t766.1536.768.grb) and C768 (global_soilmgldas.statsgo.t1534.3072.1536.grb)? C96 seems to be the exception.

That said, I tend to agree with @yangfanglin re: using just the highest resolution. My question would be to the people who have been around longer than I have (@GeorgeGayno-NOAA @HelinWei-NOAA): were these soilmgldas grb data created differently, i.e., was a separate climatology calculated on each gaussian grid, or are they all just regridded from a single-resolution source?

@GeorgeGayno-NOAA are these data reprojected to the tiles using nearest neighbor? If so, I could see that as a motivation to keep different resolutions, provided the answer to the above is that they were created specifically for these resolutions (i.e., totally separate climatologies). Or were these different resolution grb files produced by some conservative regridding from the high-res source, so that nearest neighbor produces less error?

The issue re: slmask is probably broader and should be taken up in a different issue, e.g., why do we even need any of these grb data that are also put onto the tiles in sfc_climo_gen? Keeping these grb versions seems unnecessarily confusing for users to know what data are actually used in the model.

yangfanglin commented 7 months ago

The issue re: slmask is probably broader and should be taken up in a different issue, e.g., why do we even need any of these grb data that are also put onto the tiles in sfc_climo_gen? Keeping these grb versions seems unnecessarily confusing for users to know what data are actually used in the model.

Mike, I agree we should get rid of these slmask files since slmask on the tiles can be found in fix/orog/CASE/oro files.

I think those soilmgldas files were created from the same source back in 2015 or so from offline GLDAS run at T576 Eulerian (bilinear) gaussian grid resolution.

KateFriedman-NOAA commented 7 months ago

@KateFriedman-NOAA You said "All other resolutions use the highest resolution file.", but wouldn't, say, C192 find global_soilmgldas.statsgo.t382.768.384.grb? And similarly for C384 (global_soilmgldas.statsgo.t766.1536.768.grb) and C768 (global_soilmgldas.statsgo.t1534.3072.1536.grb)? C96 seems to be the exception.

@barlage My bad, you are correct about C192 and C384. So the lowest resolutions (C48 and C96) don't map then...C48 maps to a symlink that maps to a file and C96's file doesn't currently exist (it would be global_soilmgldas.statsgo.t190.384.192.grb if I did the math right).

Let me know what you all decide for both files and we can make needed changes in the staged set and workflow scripts.

GeorgeGayno-NOAA commented 7 months ago

The issue re: slmask is probably broader and should be taken up in a different issue, e.g., why do we even need any of these grb data that are also put onto the tiles in sfc_climo_gen? Keeping these grb versions seems unnecessarily confusing for users to know what data are actually used in the model.

Mike, I agree we should get rid of these slmask files since slmask on the tiles can be found in fix/orog/CASE/oro files.

I think those soilmgldas files were created from the same source back in 2015 or so from offline GLDAS run at T576 Eulerian (bilinear) gaussian grid resolution.

@JesseMeng-NOAA created the soilmgldas files. I believe he created the different resolutions by interpolating from the gldas native resolution. I recall he used an interpolation method that specially handled land states - nearest neighbor, accounting for differing soil types, etc.

GeorgeGayno-NOAA commented 7 months ago

The land mask files, passed to global cycle as FNMSKH, have nothing to do with the model land mask. Rather, that file is used to create a land mask for certain input data. For example, some data input to cycle, such as ./fix/am/global_glacier.2x2.grb, do not have a bitmap. That means there is no standard way to know whether a point in global_glacier.2x2.grb is land or not. In that case, FNMSKH is interpolated to the global_glacier.2x2.grb resolution and used as a proxy mask. Most data in ./fix/am have bitmaps; when a bitmap is present, it is used instead of an interpolated FNMSKH. FNMSKH is also not needed for the new tiled versions of the surface data because they are already mapped to the model grid. Given how this mask data is used, we should just keep the highest res version of FNMSKH.