NOAA-EMC / WW3

WAVEWATCH III

Bit reproducibility of wind in output files when reading from restart (WRST switch) #181

Open ajhenrique opened 4 years ago

ajhenrique commented 4 years ago

The WRST switch was added to provide an option to save wind data in WW3 restart files for several applications. Tests have shown that when WND output is selected, so that input wind data interpolated to the wave grids are saved in out_grd, the wind data written to out_grd at the initial time of a restarted run are not bit identical to the wind fields saved in out_grd by the preceding run at the corresponding time step. Most other parameters are bit identical. Wind data in out_grd files at subsequent output times are also b4b when compared to the overlapping outputs from the originating run.
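For readers less familiar with the term, here is a minimal sketch of what "b4b" means for a single value. It assumes single-precision (32-bit IEEE-754) storage purely for illustration; the actual precision of out_grd records is not stated in this thread. The wind value and the helper names `float32_bits` and `is_b4b` are hypothetical.

```python
import struct


def float32_bits(x: float) -> int:
    """Return the 32-bit pattern of x when stored as an IEEE-754 single."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def is_b4b(a: float, b: float) -> bool:
    """True only if both values have exactly the same 32-bit pattern."""
    return float32_bits(a) == float32_bits(b)


# Hypothetical wind component from the first run (made-up value).
u_first = 7.25
# The neighbouring single-precision value, one unit in the last place away,
# standing in for the value a restarted run might write (assumption).
u_restart = struct.unpack("<f", struct.pack("<I", float32_bits(u_first) + 1))[0]

# Both values look the same at typical print precision...
print(f"{u_first:.4f} {u_restart:.4f}")
# ...but only the identical pair passes the bit-for-bit check.
print(is_b4b(u_first, u_first), is_b4b(u_first, u_restart))
```

A tolerance-based comparison would accept both pairs, which is why the checks discussed in this issue compare bits rather than rounded values.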

JessicaMeixner-NOAA commented 4 years ago

Here are some details/updates I found while looking into this issue:

Based on my tests, I do not believe this is a WRST switch issue (or at least not solely a WRST switch issue). There certainly should be a clarification of whether the TW0 or TWN winds are written to the restart file when using WRST, and the manual should be updated to state that the restart time is assumed to correspond to one of those two times. However, in my opinion there are some more basic issues with wind and Charnock in the out_grd binary files at t=0 of a restart run. While it would of course be ideal for these to be bit for bit, one solution is to disregard, or not write, output at the initial time, since that information is already available from the original run when restarting.

ajhenrique commented 4 years ago

@JessicaMeixner-NOAA thanks for the updates on the wind from WRST b4b issue. Ideally, we would want to have all these discrepancies sorted out as this would ensure the code is correct. Note that all results are fully b4b when winds are read from files, not from the restart file, so that the issue must be somehow related to the addition of the WRST switch.

I looked at ways to change the scripts to avoid writing output at the initial time step, which would mask the WRST wind bit-reproducibility problem in the coupled run, as you suggest. This, however, would complicate the scripting in a way that I would prefer to avoid at this point. Solving the problem at its root would help us not only correct the code, but also keep scripting simpler (which will benefit operations at NCEP) and avoid product changes that would affect downstream dependencies (e.g., V&V, AWIPS, NAWIPS, etc.) and require adjustments there.

JessicaMeixner-NOAA commented 4 years ago

The test case I was using to debug this can be replicated as follows:

    git clone https://github.com/jessicameixner-noaa/ufs-weather-model
    cd ufs-weather-model
    git checkout feature/unit_test
    git submodule update --init --recursive
    cd tests
    ./utest -n fv3_gfdlmprad -c std -k   # this just needs to be run once
    ./utest -n fv3_gfdlmprad -r restart -k

This test can be run on Hera and is likely not completely portable (because of my changes, not the unit tests themselves). The 'baseline area' will be generated at /scratch1/NCEPDEV/stmp4/$USER/FV3_UT/UNIT_TEST

and the run directories (which are saved by the -k option above) will be located at: /scratch1/NCEPDEV/stmp2/$USER/FV3_UT

After running ./utest -n fv3_gfdlmprad -r restart -k you will get a directory (such as /scratch1/NCEPDEV/stmp2/Jessica.Meixner/FV3_UT/ut_6755) containing the directories fv3_gfdlmprad_restart and fv3_gfdlmprad_std. From there you can do diffs for the various files.
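One way to do those diffs in bulk is sketched below. This is only an illustration, not part of the unit test framework: the helper `compare_run_dirs`, the `*out_grd*` file-name pattern, and the stand-in file names are assumptions, and the demo writes tiny dummy files to a temporary directory instead of touching real run directories.

```python
import filecmp
import tempfile
from pathlib import Path


def compare_run_dirs(first_dir, second_dir, pattern="*out_grd*"):
    """Compare same-named files in two run directories byte for byte.

    Returns {file name: True (identical), False (differs),
    or None (missing from second_dir)}.
    """
    results = {}
    for path in sorted(Path(first_dir).glob(pattern)):
        if not path.is_file():
            continue
        other = Path(second_dir) / path.name
        if not other.is_file():
            results[path.name] = None
        else:
            # shallow=False forces a content comparison, not just os.stat().
            results[path.name] = filecmp.cmp(path, other, shallow=False)
    return results


# Demo with dummy files; the names below are stand-ins, not real WW3 output.
with tempfile.TemporaryDirectory() as tmp:
    std = Path(tmp) / "fv3_gfdlmprad_std"
    rst = Path(tmp) / "fv3_gfdlmprad_restart"
    std.mkdir()
    rst.mkdir()
    (std / "a.out_grd.glo_30m").write_bytes(b"\x00\x01\x02")
    (rst / "a.out_grd.glo_30m").write_bytes(b"\x00\x01\x02")
    (std / "b.out_grd.glo_30m").write_bytes(b"\x00\x01\x02")
    (rst / "b.out_grd.glo_30m").write_bytes(b"\x00\x01\x03")
    (std / "c.out_grd.glo_30m").write_bytes(b"\x00")
    demo = compare_run_dirs(std, rst)

for name, same in demo.items():
    print(name, same)
```

On a real run the two arguments would be the fv3_gfdlmprad_std and fv3_gfdlmprad_restart directories, with the pattern adjusted to whatever the output files are actually called.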

WW3 generates a few extra output files for ease of debugging: YYYYMMDD.HHMMSS.out_txt.glo_30m, which has text output from w3iogo and debugging information, and YYYYMMDD.HHMMSS.rstTXT.read.glo_30m and YYYYMMDD.HHMMSS.rstTXT.write.glo_30m, which output the wind in x and y space as read from and as written to the restart file, respectively, when restarting with WRST.

JessicaMeixner-NOAA commented 4 years ago

@aliabdolali if you do not want to use the WW3 that ufs-weather-model points to, which includes my debugging updates and attempts, you will want to point to the production/GFS.v16 branch of WW3.

aliabdolali commented 4 years ago

Hi, @JessicaMeixner-NOAA @ajhenrique I found the fix for non-identical wind fields in restart files. I created a test for the global grid (WW3 only) and tested my fix. I did not push the fix to the feature branch yet, could you test it and let me know if it works? This fix applies to Current too. See attached. WindB4B.pdf

JessicaMeixner-NOAA commented 4 years ago

@aliabdolali Great catch! This is awesome.

I can test with the set-up I described above but it does not have currents, so maybe we should just wait for @ajhenrique to test with the full systems to know for sure.

ajhenrique commented 4 years ago

@aliabdolali I've tested the proposed fix by running a canned case representing the coupled system with IAU for 3h+48h, with WW3 on a 3-grid mosaic (Arctic PS 9km, NH 1/6 deg, SH 1/4 deg), generating restarts at 3h+24h and running a restarted leg from 24h-48h, then comparing the wave binary outputs from the overlapping period. The results are from the esmf8.0.1 branch with all the most recent changes to the WW3 code for speeding up initialization, internal interpolation, etc. I ran the canned case on WCOSS P3 (Dell) and Hera. In both cases the initial output step had outputs that were not b4b (wind fields had small discrepancies between runs), but all co-located outputs were b4b thereafter.

ajhenrique commented 4 years ago

Here are figures indicating where the issues occur. In my test case, out of 6=876,960 grid points there were 176 that were not b4b. Wind speeds at those points were typically relatively low, mostly <1 m/s, but some reached >3 m/s. The first figure shows the gridded wind speeds from the first run, with markers (red and yellow) superposed where they are not reproduced in the second run. There are also two figures showing a series of wind speeds from the 176 non-b4b points, and their ratio. All figures are from one of the three grids in the GFSv16-wave grid mosaic.

Figures: fig_diff_u_restart, fig_wind_speeds, fig_wind_ratio
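A summary like the one in these figures (which points are not b4b, their wind speeds in each run, and the ratio) could be assembled roughly as follows. This is a sketch under assumptions, not the script used for the figures: the (u, v) values are made up, single-precision storage is assumed, and `summarize_mismatches` is a hypothetical helper.

```python
import math
import struct


def pack_uv(u: float, v: float) -> bytes:
    """Byte image of a (u, v) pair stored as two IEEE-754 singles."""
    return struct.pack("<2f", u, v)


def summarize_mismatches(first, second):
    """List (index, speed_first, speed_second, ratio) for non-b4b points.

    `first` and `second` are equal-length sequences of (u, v) pairs from
    the originating run and the restarted run.
    """
    rows = []
    for i, ((u1, v1), (u2, v2)) in enumerate(zip(first, second)):
        if pack_uv(u1, v1) != pack_uv(u2, v2):
            s1 = math.hypot(u1, v1)
            s2 = math.hypot(u2, v2)
            ratio = s2 / s1 if s1 != 0.0 else math.nan
            rows.append((i, s1, s2, ratio))
    return rows


# Made-up wind components (m/s); only the middle point differs between runs.
first_run = [(3.0, 4.0), (0.3, 0.4), (6.0, 8.0)]
restart_run = [(3.0, 4.0), (0.6, 0.8), (6.0, 8.0)]

rows = summarize_mismatches(first_run, restart_run)
for i, s1, s2, ratio in rows:
    print(f"point {i}: first={s1:.3f} m/s restart={s2:.3f} m/s ratio={ratio:.3f}")
print(f"{len(rows)} of {len(first_run)} points not b4b")
```

Comparing the packed bytes rather than the floating-point values keeps the check strictly bit for bit, so even a difference in the last bit of one component flags the point.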

MatthewMasarik-NOAA commented 4 months ago

May be related to #1134.