dtcenter / MET

Model Evaluation Tools
https://dtcenter.org/community-code/model-evaluation-tools-met
Apache License 2.0
77 stars 24 forks source link

SEEPS Quality Assurance (QA) code review #2882

Open mpm-meto opened 4 months ago

mpm-meto commented 4 months ago

Following the addition of the SEEPS code to PointStat and GridStat a science quality assurance (QA) review of the code needs to be carried out.

Met Office will conduct this code review. Changes implemented by DTC.

Assignee

Checklist

See the METplus Workflow for details.

mpm-meto commented 4 months ago

QA review 99% complete, and almost ready to hand over code changes. Pull request will be a combination of bug fixes and code tidy/clean up to make it less ambiguous. One of the key issues has been that this code has been translated between so many different languages, some of which are row-dominant/major and others are column-dominant/major. In these situations using matrix referencing becomes highly ambiguous and some issues with data being used in a (wrongly) transposed form were found.

JohnHalleyGotway commented 4 months ago

Recommend serving up the gridded SEEPS climatology data via this DTCenter Zenodo community. @mpm-meto and @RachelNorth, please sign into Zenodo and send me you usernames so that I can add you as curators to the DTCenter community.

Once this data is posted to Zenodo, we'll need to update the MET User's Guide with instructions for finding/using it.

jprestop commented 4 months ago

Will also want to test with data once accessible; @hsoh-u worked on SEEPS previously and should be able to help.

mpm-meto commented 2 months ago

We don't have permissions to create a branch. Please can this be fixed or someone create a branch for us?

mpm-meto commented 2 months ago

Rachel and I have found a way to edit using a fork. This creates a branch but it does create two branches so we're not editing the same branch. The branches will need to be merged. We don't want to create extra work but we don't know enough about git (yet) to know how to do that.

JohnHalleyGotway commented 2 months ago

@mpm-meto and @RachelNorth, please find the results of our work today in this feature branch on the MET repo: https://github.com/dtcenter/MET/tree/feature_2882_seeps_qa

As a reminder, it currently fails to compile in pair_data_point.cc, as shown below:

pair_data_point.cc: In member function 'void VxPairDataPoint::add_point_obs(float*, const char*, const char*, unixtime, const char*, float*, Grid&, const char*, const DataPlane*)':
pair_data_point.cc:1569:49: error: cannot convert 'double' to 'SeepsScore*' in assignment
 1569 |                seeps = pd[i][j][k].compute_seeps(hdr_sid_str, fcst_v, obs_v, hdr_ut);
      |                        ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                                                 |
      |                                                 double
JohnHalleyGotway commented 2 months ago

@RachelNorth and @mpm-meto, this is a reminder about 2 topics. Based on our recent work together, it looks like we'll be modifying the output column names in the SEEPS line type, replacing S12, S13, and so on with more descriptive names.

If the PR for this work, does change those column names...

  1. Remember to update them in this table of the documentation. That can be found in the docs/Users_Guide/point_stat.rst.
  2. Remember to create a new METdataio enhancement issue to update the METdataio schema to handle the new column names... and rename columns in existing databases.

@bikegeek, I'm tagging you here to put this on your radar.

JohnHalleyGotway commented 1 month ago

@mpm-meto and @RachelNorth, FYI, I got this feature branch compiling again with the commit listed right above this comment. I did merge the latest changes from the develop into this branch and manually resolved a few conflicts. I'm 85% certain it's doing what you want it to do, but please do review and test further!

That commit includes a lot of standardizing of debug log messages as well. Although, once you're satisfied with the functionality, I'd recommend removing some of them... since there are so many. It's good to have useful debugging, but each one slows down the code slightly. Even if it's not printed by the logger, it still takes time to construct the message, pass it to the logger, and have the logger decide whether or not to print it.

I did also manually trigger this testing.yml GitHub action run. It compiled the code and ran all the unit tests, one of which failed with:

WARNING: 
WARNING: 
WARNING: get_nc_var(var_name) --> The variable "odfl_00" does not exist!
WARNING: 
WARNING: 
WARNING: get_nc_var(var_name) --> The variable "odfh_00" does not exist!
WARNING: 
WARNING: 
WARNING: get_nc_var(var_name) --> The variable "olfd_00" does not exist!
WARNING: 
WARNING: 
WARNING: get_nc_var(var_name) --> The variable "olfh_00" does not exist!
WARNING: 
WARNING: 
WARNING: get_nc_var(var_name) --> The variable "ohfd_00" does not exist!
WARNING: 
WARNING: 
WARNING: get_nc_var(var_name) --> The variable "ohfl_00" does not exist!
WARNING: 
ERROR  : 
ERROR  : SeepsClimoGrid::read_seeps_climo_grid() -> Did not get odfl_00 (s12)
ERROR  : 

So clearly, I we need to update the variable names in ${MET_TEST_INPUT}/climatology_data/seeps/PPT24_seepsweights_grid.nc. I'll go work on that now.

@robdarvell hopefully the following commands will help in getting this branch compiled there:

# I like including the branch name in the directory (as below), but its not required
git clone https://github.com/dtcenter/MET MET-feature_2882_seeps_qa
cd MET-feature_2882_seeps_qa
git checkout feature_2882_seeps_qa
# Confirm that your compilation environment is set
printenv | grep MET_
# Run configure, adding whatever options you normally use. Below, I'm compiling all the options (--enable-all) in the current directory (--prefix)
./configure --prefix=`pwd` --enable-all
make install test >& make.log &
tail -f make.log
# Type CTRL-C to exit the tail

Then @mpm-meto and @RachelNorth will make edits as needed in the source code.

JohnHalleyGotway commented 1 month ago

@mpm-meto and @RachelNorth, FYI, I ran the following ncrename command to rename the SEEPS climo variables to use the new names:

# Create a new version with v12.0 in the filename.
# Once this feature branch is merged into develop, we can replace the existing PPT24_seepsweights_grid.nc with the v12.0 version.
ncrename -h -v s12_00,odfl_00 -v s13_00,odfh_00 \
   -v s21_00,olfd_00 -v s23_00,olfh_00 \
   -v s31_00,ohfd_00 -v s32_00,ohfl_00 \
   PPT24_seepsweights_grid.nc PPT24_seepsweights_grid_v12.0.nc

The result can be found in: https://dtcenter.ucar.edu/dfiles/code/METplus/MET/MET_unit_test/unit_test/climatology_data/seeps/PPT24_seepsweights_grid_v12.0.nc

I also re-triggered another regression test run for the MET-feature_2882_seeps_qa branch. That compiled the MET code, saw that there's a new timestamp on the input data tarfile, automatically pulled the updated inputs, ran all the unit tests, and then diffed against our previous output.

The good news is that all the tests ran. However some differences were flagged, as can be seen by downloading this diff artifact tar file.

You'll find files in there with the TRUTH and OUTPUT suffixes. Spot checking some, I see that some SEEPS values that were non-zero are now very close to zero. Based on the diffs I'm seeing, I'm guessing more work is needed.

For example, comparing grid_stat/grid_stat_SEEPS_000000L_20211202_000000V_seeps_TRUTH.txt and _OUTPUT.txt, I see:

musial6:diff johnhg$ cat grid_stat/grid_stat_SEEPS_000000L_20211202_000000V_seeps_TRUTH.txt | cut -c272-600
TOTAL  S12        S13      S21         S23      S31      S32       PF1     PF2     PF3     PV1     PV2     PV3      MEAN_FCST MEAN_OBS SEEPS
575576 6048.16295 24.63754 21215.92302 22.81579 2369.768 467.28489 0.61274 0.27837 0.10888 0.75553 0.21359 0.030882   3.55237  0.74543 0.59112
musial6:diff johnhg$ cat grid_stat/grid_stat_SEEPS_000000L_20211202_000000V_seeps_OUTPUT.txt | cut -c272-600
TOTAL  S12         S13        S21         S23         S31         S32         PF1     PF2     PF3     PV1     PV2     PV3      MEAN_FCST MEAN_OBS SEEPS
575576 2.3715e-322 6.926e-310 4.6542e-310 3.4804e-317 4.6542e-310 4.6542e-310 0.61274 0.27837 0.10888 0.75553 0.21359 0.030882   3.55237  0.74543 0.62105

Those numbers differ a lot.

And I also assume that we need to modify the actual output columns names of S12, S13, and so on?

Please take a look and let me know what the next steps are.

Would you like me to change the output columns names on this feature branch?

JohnHalleyGotway commented 1 week ago

When meeting on 9/20/24, @mpm-meto confirmed that @JohnHalleyGotway should proceed with the changes to the output column names described above and submit a PR to include these changes in the MET-12.0.0-beta6 development release.

@mpm-meto will coordinate with @RachelNorth to confirm that no additional changes are needed at this time. However, they will plan to test this functionality in MET-12.0.0-beta6 for accuracy and will submit a bugfix issue if they encounter any unexpected behvior.