Closed JessicaMeixner-NOAA closed 1 week ago
I compared tripole.mx025.Ct.to.rect.1p00.conserve.nc between the two; it looks like there is a 360 offset between them:
xc_a = -299.718339695101, -299.47037035674, -299.22239891217 <-- Hera
xc_a = 60.2816603048989, 60.5296296432605, 60.7776010878256 <-- WCOSS2
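A quick sketch to check that the two xc_a samples are the same longitudes in two conventions ([-360, 0) vs [0, 360)), i.e. a pure 360-degree offset rather than a data change; the values are copied from the comparison above.

```python
# Verify the Hera/WCOSS2 xc_a samples differ by exactly 360 degrees.
hera   = [-299.718339695101, -299.47037035674, -299.22239891217]
wcoss2 = [60.2816603048989, 60.5296296432605, 60.7776010878256]

offsets = [b - a for a, b in zip(hera, wcoss2)]
print(offsets)  # each value is ~360.0 to within roundoff
```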
@EricSinsky-NOAA can we re-run the executable offline here: /scratch1/NCEPDEV/climate/Jiande.Wang/working/scratch/ocean-zero-value/oceanice_products.3448181? What kind of modules do we need to load? I tried but got this error:
./ocnicepost.x: symbol lookup error: ./ocnicepost.x: undefined symbol: netcdf_mp_nf90open
I do have the netcdf4 and hdf5 modules loaded, though.
@jiandewang Good find. It looks like there is a 360 offset between the 20231219 version and the 20240416 version of these fix files. Both versions can be found on Hera:
Version used in HR3 (20231219): /scratch1/NCEPDEV/global/glopara/fix/mom6/20231219/post/mx025/tripole.mx025.Ct.to.rect.1p00.conserve.nc
Newer version (20240416): /scratch1/NCEPDEV/global/glopara/fix/mom6/20240416/post/mx025/tripole.mx025.Ct.to.rect.1p00.conserve.nc
I have run ocnicepost.x offline before, but it has been a couple of months.
@jiandewang I would start by executing source ush/load_fv3gfs_modules.sh before running ocnicepost.x offline.
@EricSinsky-NOAA what's wrong with what I did below? Why did it add an extra "/" before "ush"?
cd /scratch1/NCEPDEV/climate/Jiande.Wang/working/scratch/ocean-zero-value/global-workflow
source ush/load_fv3gfs_modules.sh
Loading modules quietly...
-bash: /ush/detect_machine.sh: No such file or directory
-bash: /ush/module-setup.sh: No such file or directory
-bash: /versions/run.ver: No such file or directory
WARNING: UNKNOWN PLATFORM
No modules loaded
@jiandewang I am getting the same error too when I try to load modules using load_fv3gfs_modules.sh. However, I did a quick test in /lfs/h2/emc/stmp/eric.sinsky/RUNDIRS/gw_ocnbugfix2/oceanice_products.242828
and was able to execute ocnicepost.x offline. These are the modules I have loaded
Do this first:
export HOMEgfs="/scratch1/NCEPDEV/climate/Jiande.Wang/working/scratch/ocean-zero-value/global-workflow"
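A minimal sketch of why the errors show "/ush/..." paths: load_fv3gfs_modules.sh apparently builds paths like "${HOMEgfs}/ush/detect_machine.sh" (an assumption inferred from the error messages), and an unset shell variable expands to the empty string. The behavior is emulated here with plain string concatenation.

```python
import os

# The failing session: HOMEgfs is unset, so the prefix expands to "".
os.environ.pop("HOMEgfs", None)
home = os.environ.get("HOMEgfs", "")
print(home + "/ush/detect_machine.sh")  # -> /ush/detect_machine.sh

# After Walter's export (hypothetical path), the full path is built correctly.
home = "/path/to/global-workflow"
print(home + "/ush/detect_machine.sh")  # -> /path/to/global-workflow/ush/detect_machine.sh
```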
@EricSinsky-NOAA can you paste your module list here so that I can copy it?
craype-x86-rome libfabric/1.11.0.0 craype-network-ofi envvar/1.0 intel/19.1.3.304
PrgEnv-intel/8.1.0 imagemagick/7.0.8-7 subversion/1.14.0 libjpeg/9c grib_util/1.2.2
wgrib2/2.0.8_wmo GrADS/2.2.2 ecflow/5.6.0.11 cdo/1.9.8 udunits/2.2.28 ncview/2.1.7
python/3.8.6 proj/7.1.0 geos/3.8.1 prod_util/2.0.14 w3nco/2.4.1 core/rocoto/1.3.5
hdf5/1.10.6 netcdf/4.7.4
@EricSinsky-NOAA I see you are testing on WCOSS2. Can you repeat your testing on Hera, using the following as a template? /scratch1/NCEPDEV/climate/Jiande.Wang/working/scratch/ocean-zero-value/oceanice_products.3448181
@jiandewang I just ran ocnicepost.x offline on Hera using your template. The interpolated output can be found here: /scratch2/NCEPDEV/ensemble/noscrub/Eric.Sinsky/ocnpost_bugfix/oceanice_products.3448181/ocean.0p25.nc
Can you share your module list on Hera?
Also, can you replace the fix file with /scratch2/NCEPDEV/ensemble/noscrub/Eric.Sinsky/ocnpost_bugfix/oceanice_products.3448181/fixed-file-wcoss2 and re-run it?
@jiandewang If you export HOMEgfs first (see above), load_fv3gfs_modules.sh should work.
@WalterKolczynski-NOAA no more module loading errors after I did export HOMEgfs=.... Thanks!
Thanks @WalterKolczynski-NOAA. Adding HOMEgfs to my environment allowed me to successfully execute load_fv3gfs_modules.sh.
@jiandewang After replacing the fix files with /scratch2/NCEPDEV/ensemble/noscrub/Eric.Sinsky/ocnpost_bugfix/oceanice_products.3448181/fixed-file-wcoss2 and rerunning, I am still getting all zeroes.
My test run of C48 on wcoss2 did not do well: /lfs/h2/emc/couple/noscrub/jessica.meixner/testoceanpost/hr3/test01/COMROOT/c48t01/gfs.20210323/12/products/ocean/grib2/5p00
Thank you, @JessicaMeixner-NOAA. It sounds like this might be an issue with the build of ocnicepost.x on WCOSS2 and Hera. @jiandewang When you ran your HR3 test and you got reasonable interpolated ocean output, did you rebuild ocnicepost.x (as well as the other executables related to HR3) during your test?
No, I just used my original *.x from several months ago.
I did a new build, but I did have an old build too... I'll try the 0.25 case w/the new build and I'll also try using my old build on a C48 case and see what happens.
Update:
Therefore, I think there are likely issues with all of the 5 deg cases, so we should not be using them to check whether things are working or not.
@JessicaMeixner-NOAA Glad to see you are getting non-zeroes for C768mx025. Were the C768mx025 test cases also based on the HR3 tag (not just the C48mx500 test case)? Also, did you run the C768mx025 test case using both your old build and your new build?
Also, I ran an old version of ocnicepost offline. I got non-zeroes in the interpolated NetCDF output. In this test, however, the resolution of the NetCDF input (MOM6) data was mx025.
@EricSinsky-NOAA It is nice to see some non-zero values, for sure!!
For the tests I ran with the HR3 tag, I ran both the old build and the new build, and both had non-zeros.
This is my understanding on what we know so far:
@EricSinsky-NOAA I'd say that we get zeros with the newest hashes. Where the mx025 issues come in between now and the HR3 tag is an open question, I think; since most of our previous testing was based on mx500, I'm not sure we have a lot of information about the in-between parts. I'm going to run a few tests on WCOSS2 to see if we can narrow down issues there.
Thank you @EricSinsky-NOAA for the summary and @JessicaMeixner-NOAA for the additional information.
A few questions:
(1) Can you drop the date of the fix files (interpolation weights) being used by ocnicepost.x?
(2) By replacing these weights with the develop version, do we get a non-zero result?
I'd say we need to find a baseline that works first; I think we have that for the C768mx025 case with the HR3 tag. Unfortunately, C48mx500 with the HR3 tag resulted in zeros.
For the HR3 tag on WCOSS2 the mom6 fix files are:
mom6 -> /lfs/h2/emc/global/noscrub/emc.global/FIX/fix/mom6/20231219
I'm currently trying to test the commit before the fix file change on WCOSS2 with mx025 to see if that works. I did find an experiment on Hera where a case using the old fix files and mx025 still gave me zeros...
I ran with mx025 on WCOSS2 for commit hashes https://github.com/NOAA-EMC/global-workflow/commit/6ca106e6c0466d7165fc37b147e0e2735a1d6a0b and https://github.com/NOAA-EMC/global-workflow/commit/d5366c66bd67f89d118b18956fe230207cbf0aea (the one that changed the mom6 fix) and they both give me non-zero output for the grib2 files....
I can share paths if that's helpful. Has anyone tried anything with mx025 on Orion?
So some random thoughts before the weekend:
- Did we ever confirm whether the diffs between WCOSS2 and Hera that @jiandewang saw were just because of the fix-file versions, or were there actual differences?
@JessicaMeixner-NOAA The diffs between WCOSS2 and Hera are because the comparisons were between two different versions of the fix files. The fix files being compared from WCOSS2 are the 20231219 version, while the fix files being compared from Hera are the 20240416 version. Both fix file versions exist on both WCOSS2 and Hera. When the fix files of the same version are compared between WCOSS2 and Hera, the file sizes are identical.
@EricSinsky-NOAA thanks for confirming that!
Some further testing results: (1) the 20231219 version vs the 20240416 version of the fix files: there is a 360-degree offset in longitude between them. The results generated by them are not identical, but the differences are at roundoff level (~1e-8). So this is not the reason for the zero values in the regular-grid file.
(2) In the HR3 run on WCOSS2, which gave us correct results, the ocean master files are on 40 levels. However, in Jessica's Hera run (/scratch1/NCEPDEV/climate/Jessica.Meixner/cycling/iau_06/C384iaucold03/TMP/RUNDIRS/cold03/oceanice_products.3448181) and in Eric's run, ocean.nc is on 75 levels because it is set up as DA; see https://github.com/NOAA-EMC/global-workflow/blob/develop/parm/config/gfs/config.ufs#L454-L459. I used Jessica's run dir as a template but replaced ocean.nc with the one from the HR3 run (40 levels), and then it generated a correct regular-grid file.
More testing results: it is the missing value that messed up the results. In the HR3 run it is -1e34, while in DA it is set to 0. After I reset the missing value to -1e34 in the ocean.nc from Jessica's run dir, the interpolated results are correct. I think this missing value is embedded in the fix files, since they were generated using output from one of the previous HRx runs, where it is -1e34. I did my test on WCOSS2; somehow I had trouble running it on Hera due to module loading.
@EricSinsky-NOAA: you may repeat your run but use my modified input file at /scratch1/NCEPDEV/climate/Jiande.Wang/working/scratch/ocean-zero-value/ceanice_products.3448181-JM/NCO2/ocean.nc-JM-75L-E34, or you can simply repeat your C48mx500 run but set https://github.com/NOAA-EMC/global-workflow/blob/develop/parm/config/gfs/config.ufs#L456C9-L456C31 to -1e34.
@jiandewang Thank you very much for finding the issue! I just ran the C48_S2SWA_gefs CI test case (MOM6 is set to mx500) using the most recent hash. I set MOM6_DIAG_MISVAL to -1e34 in parm/config/gefs/config.ufs, and this fixed the issue (non-zeroes in the interpolated ocean output).
EDIT: My test was on WCOSS2.
The exception value will need to be resolved with @guillaumevernieres and others, as DA might need the missing value to be set to 0.
@jiandewang what module issues did you have on Hera? On Friday I was wondering whether module mismatches might be part of the issue.
@JessicaMeixner-NOAA I followed Walter's method (the g-w I used is the cycled one you asked me to run). No errors popped out after I did source ush/........., but when I ran ocnicepost.x it crashed while writing the 3D mask file.
A quick-and-dirty solution: apply this command in the script after the DA ocean files are generated: ncatted -a missing_value,,m,f,-1E34. That will make the ocean post happy.
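A minimal sketch of the failure mode the ncatted workaround addresses, under the assumption (inferred from the discussion above) that ocnicepost treats source points equal to the missing value as land. With the DA setting of missing_value = 0, land cannot be told apart from a physical 0.0, while the -1e34 sentinel is unambiguous.

```python
# Hypothetical 1-D SST sample: ocean point, ocean point at exactly 0.0 C, land point.
MISVAL = -1.0e34
sst = [25.0, 0.0, MISVAL]

# Sentinel of -1e34: only the true land point is masked.
mask_good = [v != MISVAL for v in sst]
print(mask_good)  # [True, True, False]

# Sentinel of 0.0: the physically valid 0.0 C ocean point is masked too.
mask_bad = [v != 0.0 for v in sst]
print(mask_bad)   # [True, False, True]
```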
Apologies for being late to the party. Am I understanding that the missing value is defined as 0.0 in the history file? A missing value of 0.0 makes no sense to me, since it is also a valid value. How do you distinguish where Temp=0 because it really is 0.0C and where it is 0 because it is a land point?
@DeniseWorthen see https://github.com/NOAA-EMC/global-workflow/blob/develop/parm/config/gfs/config.ufs#L456C9-L456C31
@jiandewang Thanks, but that doesn't answer my question really. How is a missing value of 0.0 being distinguished from a physical value of 0.0?
@DeniseWorthen, you just don't construct your mask based on the fill value.
@guillaumevernieres Thanks. So where does your mask come from?
edit: I mean, which file? Are you retrieving it from the model output or are you using something else?
We use the MOM6 grid generation functionality, but that is overkill for this issue. The mask could simply be constructed using the layer thicknesses.
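A sketch of the thickness-based mask described above: land columns have zero total layer thickness, so no fill-value sentinel is needed at all. The thickness numbers are made up for illustration.

```python
# Layer thicknesses h with shape (points, layers); land columns sum to zero.
h = [
    [10.0, 20.0],  # ocean column
    [5.0, 0.0],    # shallow ocean column
    [0.0, 0.0],    # land column
]
ocean_mask = [sum(col) > 0.0 for col in h]
print(ocean_mask)  # [True, True, False]
```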
A PR has been created so that GFS/GEFS and GDAS/ENKF use different exception values and numbers of layers for MOM6. This should resolve the problem, although in the future it might be good to explore updating how the mask is defined in the ocean post.
What is wrong?
When running with the sea-ice PR that was just merged (so essentially develop as of today), @SulagnaRay-NOAA noticed that all of the ocean grib2 files contain constant values (mostly zeros). The native model output is not zeros, and the ice grib2 files also appear to be okay.
Investigation as to what is going on and why is ongoing.
What should have happened?
We should have grib2 output files that match the native model output (and have non-zero/constant values).
What machines are impacted?
Hera
Steps to reproduce
This was discovered running the C384mx025_3DVarAOWCDA test case. However, I suspect other test cases would expose this issue as well.
Some example output can be found here: /scratch1/NCEPDEV/climate/Jessica.Meixner/cycling/iau_06/C384iaucold03/cold03/COMROOT/cold03/gfs.20210703/06/products/ocean/grib2/0p25
Log files can be found here: /scratch1/NCEPDEV/climate/Jessica.Meixner/cycling/iau_06/C384iaucold03/cold03/COMROOT/cold03/logs/2021070306
Additional information
@GwenChen-NOAA @jiandewang @SulagnaRay-NOAA @LydiaStefanova-NOAA @guillaumevernieres @CatherineThomas-NOAA FYI - any additional information or help is appreciated!
Do you have a proposed solution?
Not yet...