Open SadeghTabas-NOAA opened 1 year ago
This isn't actually a fix file request because the files are already in fix.
For the first file, these are identical. The fix directory has symlinks for all the t94.192.96 files to the t92.192.94 files because one of the names is incorrect. Given all gw names use the same calculation to determine those names from the model grid, I'm going to say the one workflow is using (t94.192.96) is probably correct.
For slmask, we just need to update the files being used to depend on resolution instead of always using the high-res version.
@WalterKolczynski-NOAA Thanks for confirming that the global_soilmgldas.statsgo.t94.192.94.grb file is actually linked to global_soilmgldas.statsgo.t92.192.94.grb. Maybe we can ask land team's help to confirm the file names.
@HelinWei-NOAA @barlage @yangfanglin could we rename global_soilmgldas.statsgo.t92.192.94.grb to global_soilmgldas.statsgo.t94.192.94.grb?
@junwang-noaa All soil moisture data were created by both @GeorgeGayno-NOAA and Jesse Meng. If you look at this directory /scratch1/NCEPDEV/global/glopara/fix/am/20220805, there are more fixed files for the grid t92.192.94. So I am not sure if t94.192.96 is the correct one.
Thanks for checking it, Helin. @GeorgeGayno-NOAA may I ask your help here?
All of the resolutions in workflow use the following formulas to calculate these filenames:
JCAP_CASE=$((res*2-2)) # C48 -> 94
LATB_CASE=$((res*2)) # C48 -> 96
I'm not sure why the formula would work for every resolution except C48. Plus I think we had this discussion six months ago or whenever we first made these symlinks.
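To make the arithmetic concrete, here is a minimal sketch that applies those formulas to each cubed-sphere resolution mentioned in this thread and prints the resulting soilmgldas filename. The helper name `soilmgldas_name` is made up for illustration and is not part of global-workflow. The LONB term (4*res) is taken from the ush/forecast_postdet.sh excerpt quoted later in the thread.

```bash
#!/usr/bin/env bash
# Illustrative only: soilmgldas_name is a hypothetical helper, not a workflow function.
soilmgldas_name() {
  local case_name="$1"
  local res="${case_name:1}"     # strip the leading "C", e.g. C48 -> 48
  local jcap=$((2 * res - 2))    # C48 -> 94
  local lonb=$((4 * res))        # C48 -> 192
  local latb=$((2 * res))        # C48 -> 96
  echo "global_soilmgldas.statsgo.t${jcap}.${lonb}.${latb}.grb"
}

# Print the computed name for each resolution discussed in the thread.
for c in C48 C96 C192 C384 C768; do
  printf '%-5s -> %s\n' "${c}" "$(soilmgldas_name "${c}")"
done
```

For C48 this yields the t94.192.96 name, which is why the workflow never asks for t92.192.94 directly.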
I just searched through my email, and @yangfanglin is the one who suggested we use the high-res soilmgldas for all resolutions:
Kate, Since interpolation to the tiles will nevertheless be done for the FV3 model, I'd recommend using global_soilmgldas.statsgo.t1534.3072.1536.grb for all fv3 resolutions two years ago. The reason we created the fix files for different resolutions was to save time for running the spectral model. Fanglin
Retagging folks since this went stale: @barlage @junwang-noaa @HelinWei-NOAA @GeorgeGayno-NOAA
Retagging folks again @barlage @junwang-noaa @HelinWei-NOAA @GeorgeGayno-NOAA.
What should be done for this?
Circling back looking to resolve this:
Should GW continue using the high-res soilmgldas file for all FV3 resolutions as @yangfanglin suggested?
- If so, can we remove the others from future versions of fix to free up a bit of space (and remove clutter)?
- If so, presumably UFS RTs should also switch to using only the high-res files.
- What is the appropriate name for the C48 file: t92.192.94 or t94.192.96? The latter seems to be correct to me based on the formula GW uses to calculate all other resolutions, but I have yet to see consensus. (In fix, one is symlinked to the other, so the file is identical either way.)
@barlage @junwang-noaa @HelinWei-NOAA @GeorgeGayno-NOAA
The filenames are based on the dimensions of the data. So, to be consistent with the other 'gldas' files, the C48 file would be t92.192.94.
agree with @GeorgeGayno-NOAA
@GeorgeGayno-NOAA @HelinWei-NOAA Understood, however, the internal calculations for the global_soilmgldas fix files can't get us that filename. As Walter mentioned before (https://github.com/NOAA-EMC/global-workflow/issues/1723#issuecomment-1629287368), here are the internal calculations in ush/forecast_postdet.sh currently:
res="${CASE:1}"
JCAP_CASE=$((2*res-2))
LONB_CASE=$((4*res))
LATB_CASE=$((2*res))
JCAP=${JCAP:-${JCAP_CASE}}
LONB=${LONB:-${LONB_CASE}}
LATB=${LATB:-${LATB_CASE}}
FNSMCC=${FNSMCC:-"${FIXgfs}/am/global_soilmgldas.statsgo.t${JCAP}.${LONB}.${LATB}.grb"}
When running C48 the above will end up looking for global_soilmgldas.statsgo.t94.192.96.grb, as observed in a C48 enkf fcst job log:
+ forecast_postdet.sh[278]: res=48
...
+ forecast_postdet.sh[287]: JCAP_CASE=94
+ forecast_postdet.sh[288]: LONB_CASE=192
+ forecast_postdet.sh[289]: LATB_CASE=96
+ forecast_postdet.sh[291]: JCAP=94
+ forecast_postdet.sh[292]: LONB=192
+ forecast_postdet.sh[293]: LATB=96
...
+ forecast_postdet.sh[317]: FNSMCC=/scratch1/NCEPDEV/global/Kate.Friedman/git/develop/fix/am/global_soilmgldas.statsgo.t94.192.96.grb
+ forecast_postdet.sh[320]: [[ ! -f /scratch1/NCEPDEV/global/Kate.Friedman/git/develop/fix/am/global_soilmgldas.statsgo.t94.192.96.grb ]]
It only finds that file, however, because we currently have a symlink for that filename to the t92.192.94 one that @GeorgeGayno-NOAA indicated:
[role.glopara@hfe06 20220805]$ ll global_soilmgldas.statsgo*
-rw-r--r-- 1 role.glopara global 111393120 Feb 11 2018 global_soilmgldas.statsgo.t1534.3072.1536.grb
-rw-r--r-- 1 role.glopara global 3100512 Feb 11 2018 global_soilmgldas.statsgo.t254.512.256.grb
-rw-r--r-- 1 role.glopara global 15770592 Feb 11 2018 global_soilmgldas.statsgo.t382.1152.576.grb
-rw-r--r-- 1 role.glopara global 6978720 Feb 11 2018 global_soilmgldas.statsgo.t382.768.384.grb
-rw-r--r-- 1 role.glopara global 15705792 Feb 11 2018 global_soilmgldas.statsgo.t574.1152.576.grb
-rw-r--r-- 1 role.glopara global 27775872 Feb 11 2018 global_soilmgldas.statsgo.t766.1536.768.grb
-rw-r--r-- 1 role.glopara global 435360 Feb 11 2018 global_soilmgldas.statsgo.t92.192.94.grb
lrwxrwxrwx 1 role.glopara global 40 Jun 27 2018 global_soilmgldas.statsgo.t94.192.96.grb -> global_soilmgldas.statsgo.t92.192.94.grb
Looking at other resolution forecast jobs (e.g. C96) you end up with global_soilmgldas.statsgo.t190.384.192.grb, but neither that file nor an equivalent symlink exists, so the script defaults to the highest resolution file:
+ forecast_postdet.sh[278]: res=96
...
+ forecast_postdet.sh[287]: JCAP_CASE=190
+ forecast_postdet.sh[288]: LONB_CASE=384
+ forecast_postdet.sh[289]: LATB_CASE=192
+ forecast_postdet.sh[291]: JCAP=190
+ forecast_postdet.sh[292]: LONB=384
+ forecast_postdet.sh[293]: LATB=192
...
+ forecast_postdet.sh[317]: FNSMCC=/scratch1/NCEPDEV/global/Kate.Friedman/git/develop/fix/am/global_soilmgldas.statsgo.t190.384.192.grb
+ forecast_postdet.sh[320]: [[ ! -f /scratch1/NCEPDEV/global/Kate.Friedman/git/develop/fix/am/global_soilmgldas.statsgo.t190.384.192.grb ]]
+ forecast_postdet.sh[320]: FNSMCC=/scratch1/NCEPDEV/global/Kate.Friedman/git/develop/fix/am/global_soilmgldas.statsgo.t1534.3072.1536.grb
Only C48 actually uses a lower resolution soilmgldas.statsgo file, because of the existing symlink. All other resolutions use the highest resolution file. If we removed that symlink, the C48 wouldn't find the file and would default to the highest resolution one as well.
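The behavior in the two job logs can be reproduced with a small sketch of the existence check they imply. This is a reconstruction from the xtrace output, not the actual forecast_postdet.sh code: the function name `pick_soilmgldas`, its arguments, and the scratch directory are all assumptions for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper reconstructed from the logs: use the computed name if it
# exists, otherwise fall back to the highest-resolution file.
pick_soilmgldas() {
  local fixdir="$1" jcap="$2" lonb="$3" latb="$4"
  local candidate="${fixdir}/global_soilmgldas.statsgo.t${jcap}.${lonb}.${latb}.grb"
  if [[ ! -f "${candidate}" ]]; then
    candidate="${fixdir}/global_soilmgldas.statsgo.t1534.3072.1536.grb"
  fi
  echo "${candidate}"
}

# Scratch directory that mimics the relevant part of the fix/am listing.
demo=$(mktemp -d)
touch "${demo}/global_soilmgldas.statsgo.t1534.3072.1536.grb" \
      "${demo}/global_soilmgldas.statsgo.t92.192.94.grb"
ln -s global_soilmgldas.statsgo.t92.192.94.grb \
      "${demo}/global_soilmgldas.statsgo.t94.192.96.grb"

basename "$(pick_soilmgldas "${demo}" 94 192 96)"    # C48: found through the symlink
basename "$(pick_soilmgldas "${demo}" 190 384 192)"  # C96: falls back to the t1534 file
rm -rf "${demo}"
```

Because `-f` follows symlinks, removing the t94.192.96 link from the scratch directory would make the C48 call fall back to the t1534 file as well.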
So we either need to:
1) use the highest resolution file for C48 as well (as @yangfanglin suggested)
2) rename all of the files to match the internal filename calculation
3) adjust the internal filename calculation to use the existing filenames
Please let us know which option, thanks!
I should add, @SadeghTabas-NOAA noted that:
ufswm uses global_soilmgldas.statsgo.t92.192.94.grb
...which would lean towards option 3, if we want to be consistent with UWM and not use the highest resolution file for all resolutions.
For the global_slmask file, we have the following available (not showing the rg or f77 slmask files):
[role.glopara@hfe06 20220805]$ ll global_slmask.t* | grep -v rg | grep -v f77
-rw-r--r-- 1 role.glopara global 1327188 May 4 2017 global_slmask.t1148.2304.1152.grb
-rw-r--r-- 1 role.glopara global 36564 May 4 2017 global_slmask.t126.384.190.grb
-rw-r--r-- 1 role.glopara global 2359380 May 4 2017 global_slmask.t1534.3072.1536.grb
-rw-r--r-- 1 role.glopara global 65620 May 4 2017 global_slmask.t170.512.256.grb
lrwxrwxrwx 1 role.glopara global 30 Dec 20 2017 global_slmask.t190.384.192.grb -> global_slmask.t126.384.190.grb
-rw-r--r-- 1 role.glopara global 83028 May 4 2017 global_slmask.t190.576.288.grb
-rw-r--r-- 1 role.glopara global 16468 May 4 2017 global_slmask.t254.512.256.grb
-rw-r--r-- 1 role.glopara global 147540 May 4 2017 global_slmask.t254.768.384.grb
-rw-r--r-- 1 role.glopara global 331860 May 4 2017 global_slmask.t382.1152.576.grb
-rw-r--r-- 1 role.glopara global 147540 May 4 2017 global_slmask.t382.768.384.grb
-rw-r--r-- 1 role.glopara global 331860 May 4 2017 global_slmask.t574.1152.576.grb
-rw-r--r-- 1 role.glopara global 774484 May 4 2017 global_slmask.t574.1760.880.grb
-rw-r--r-- 1 role.glopara global 9108 May 4 2017 global_slmask.t62.192.94.grb
-rw-r--r-- 1 role.glopara global 112980 May 4 2017 global_slmask.t670.1344.672.grb
-rw-r--r-- 1 role.glopara global 147540 Oct 13 2017 global_slmask.t766.1536.768.grb
-rw-r--r-- 1 role.glopara global 774484 May 4 2017 global_slmask.t878.1760.880.grb
-rw-r--r-- 1 role.glopara global 1742484 May 4 2017 global_slmask.t878.2640.1320.grb
-rw-r--r-- 1 role.glopara global 9108 May 4 2017 global_slmask.t92.192.94.grb
...but ush/forecast_postdet.sh uses the highest resolution file (if not set from above, which it generally isn't):
FNMSKH=${FNMSKH:-"${FIXgfs}/am/global_slmask.t1534.3072.1536.grb"}
https://github.com/NOAA-EMC/global-workflow/blob/develop/ush/forecast_postdet.sh#L315
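The `${VAR:-default}` expansion on that line is what makes the high-res mask the fallback: the default is used only when FNMSKH is unset or empty. A minimal illustration follows. The FIXgfs value is a placeholder and `show_fnmskh` is a hypothetical helper, neither taken from the workflow.

```bash
#!/usr/bin/env bash
FIXgfs="/path/to/fix"   # placeholder, not a real location

# Hypothetical helper: applies the same expansion to an optional preset value.
show_fnmskh() {
  local FNMSKH="${1:-}"
  FNMSKH=${FNMSKH:-"${FIXgfs}/am/global_slmask.t1534.3072.1536.grb"}
  echo "${FNMSKH}"
}

show_fnmskh                                              # unset: default is used
show_fnmskh "${FIXgfs}/am/global_slmask.t62.192.94.grb"  # preset: value is kept
```

So a resolution-specific mask would only be picked up if something upstream exported FNMSKH before this line runs.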
The default for global_soilmgldas (when file doesn't exist) and the current setting for global_slmask both end up with the highest resolution file for all resolutions except C48 for the global_soilmgldas file. This leans towards option 1.
There is a lot of history left behind about how these files were created for the global spectral model (old GFS) at different resolutions and with Eulerian or semi-Lagrangian advection. It can always be confusing. I'd still suggest using the highest resolution file for all FV3 resolutions to reduce cluttering and to avoid confusion. Alternatively, we can follow what George Gayno had done for other boundary conditions, that is, to create soilmgldas for each FV3 resolution on the tiles (using global_soilmgldas.statsgo.t1534.3072.1536.grb as the source for all cases).
I'd still suggest using the highest resolution file for all FV3 resolutions to reduce cluttering and to avoid confusion.
This would be my vote too (coming from the file management side of things). @yangfanglin Do you suggest the same thing for the slmask files too?
@GeorgeGayno-NOAA thoughts?
@KateFriedman-NOAA we need to keep separate slmask files for the different model resolutions.
@KateFriedman-NOAA You said "All other resolutions use the highest resolution file.", but wouldn't, say, C192 find global_soilmgldas.statsgo.t382.768.384.grb? And similar for C384 (global_soilmgldas.statsgo.t766.1536.768.grb) and C768 (global_soilmgldas.statsgo.t1534.3072.1536.grb)? C96 seems to be the exception.
That said, I tend to agree with @yangfanglin re: using just the highest resolution. My question would be to the people who have been around longer than I have (@GeorgeGayno-NOAA @HelinWei-NOAA): are these soilmgldas grb data created differently, i.e., was a separate climatology calculated on each gaussian grid, or are they all just regridded from a single-resolution source?
@GeorgeGayno-NOAA are these data reprojected to the tiles using nearest neighbor? If so, I could see that as a motivation to keep different resolutions, if the answer to the above is that they are created specifically for these resolutions (i.e., totally separate climatologies). Or are these different resolution grb files using some conservative regridding from the high res source so that nearest neighbor produces less error?
The issue re: slmask is probably broader and should be taken up in a different issue, e.g., why do we even need any of these grb data that are also put onto the tiles in sfc_climo_gen? Keeping these grb versions seems unnecessarily confusing for users to know what data are actually used in the model.
Mike, I agree we should get rid of these slmask files since slmask on the tiles can be found in fix/orog/CASE/oro files.
I think those soilmgldas files were created from the same source back in 2015 or so from offline GLDAS run at T576 Eulerian (bilinear) gaussian grid resolution.
@barlage My bad, you are correct about C192 and C384. So the lowest resolutions (C48 and C96) don't map then...C48 maps to a symlink that maps to a file and C96's file doesn't currently exist (it would be global_soilmgldas.statsgo.t190.384.192.grb if I did the math right).
Let me know what you all decide for both files and we can make needed changes in the staged set and workflow scripts.
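The mapping described above (C48 through a symlink, C96 missing, C192 and up present) can be cross-checked with a short sketch that compares each computed name against the fix/am/20220805 listing quoted earlier in this thread. The file names are copied from that listing. The helper `classify_case` is hypothetical and only looks at this hard-coded list, not at a real directory.

```bash
#!/usr/bin/env bash
# Regular files and the one symlink, copied from the listing quoted in this thread.
existing=(
  global_soilmgldas.statsgo.t1534.3072.1536.grb
  global_soilmgldas.statsgo.t254.512.256.grb
  global_soilmgldas.statsgo.t382.1152.576.grb
  global_soilmgldas.statsgo.t382.768.384.grb
  global_soilmgldas.statsgo.t574.1152.576.grb
  global_soilmgldas.statsgo.t766.1536.768.grb
  global_soilmgldas.statsgo.t92.192.94.grb
)
symlinks=( global_soilmgldas.statsgo.t94.192.96.grb )

# Hypothetical helper: prints "file", "symlink", or "missing" for a CASE such as C96.
classify_case() {
  local res="${1:1}" name f
  name="global_soilmgldas.statsgo.t$((2 * res - 2)).$((4 * res)).$((2 * res)).grb"
  for f in "${existing[@]}"; do
    if [[ "${f}" == "${name}" ]]; then echo "file"; return; fi
  done
  for f in "${symlinks[@]}"; do
    if [[ "${f}" == "${name}" ]]; then echo "symlink"; return; fi
  done
  echo "missing"
}

for c in C48 C96 C192 C384 C768; do
  printf '%-5s %s\n' "${c}" "$(classify_case "${c}")"
done
```

Under these assumptions only C96 comes back as missing, which matches the correction above.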
@JesseMeng-NOAA created the soilmgldas files. I believe he created the different resolutions by interpolating from the gldas native resolution. I recall he used an interpolation method that specially handled land states - nearest neighbor, accounting for differing soil types, etc.
The land mask files, passed to global cycle as FNMSKH, have nothing to do with the model land mask. Rather, that file is used to create a land mask for certain input data. For example, some data input to cycle, such as ./fix/am/global_glacier.2x2.grb, does not have a bitmap. That means there is no standard way to know if a point in global_glacier.2x2.grb is land or not. In that case, FNMSKH is interpolated to the global_glacier.2x2.grb resolution and is used as a proxy mask. Most data in ./fix/am have bitmaps. If the data has a bitmap, then it is used instead of an interpolated FNMSKH. And FNMSKH is not needed for the new tiled versions of the surface data because they are already mapped to the model grid. Given how this mask data is used, we should just keep the highest res version of FNMSKH.
Description
Global workflow uses global_soilmgldas.statsgo.t94.192.96.grb and global_slmask.t1534.3072.1536.grb for the C48 test (ATM only). On the other hand, ufswm uses global_soilmgldas.statsgo.t92.192.94.grb and global_slmask.t62.192.94.grb instead. As C48 is a low-resolution test, low-resolution fix files need to be used in order to match ufswm.
Tasks