dtcenter / MET

Model Evaluation Tools
https://dtcenter.org/community-code/model-evaluation-tools-met
Apache License 2.0

Add support to the MET tools for climatological distribution percentile thresholds. #1138

Closed: JohnHalleyGotway closed this issue 4 years ago

JohnHalleyGotway commented 5 years ago

This is an extension of the percentile thresholds described here: https://github.com/NCAR/MET/issues/76

As of met-8.1, the tools support defining percentile thresholds as follows:

>SFP50, >SOP50, >SCP50, >USP50(2.5), ==FBIAS1 ... for the ... sample forecast percentile, sample observation percentile, sample climatology percentile, user-specified percentile, and frequency bias of 1 options, respectively. Each of these threshold types derives a single actual threshold value from the data over the entire requested spatial region (for example, >SFP50 becomes "greater than the median of the forecast values in that region"), and that one value is applied at every grid point.
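To make the contrast with the new per-point type concrete, here is a minimal Python sketch of how a sample percentile threshold such as >SFP50 collapses a whole region into one number. It assumes numpy-style linear interpolation between sorted values, which may not match MET's own percentile routine, and the flat `field`/`mask` lists and function names are hypothetical.

```python
import math


def percentile(values, p):
    """Linear-interpolation percentile (numpy's default convention).

    Using this convention is an assumption; MET's routine may differ.
    """
    if not values:
        raise ValueError("no values in region")
    if not 0.0 <= p <= 100.0:
        raise ValueError("percentile must be in [0, 100]")
    s = sorted(values)
    rank = (len(s) - 1) * p / 100.0  # fractional index into the sorted data
    lo = math.floor(rank)
    hi = math.ceil(rank)
    return s[lo] + (s[hi] - s[lo]) * (rank - lo)


def sample_percentile_threshold(field, mask, p):
    """Collapse a gridded field to ONE threshold value, as SFP/SOP/SCP do.

    field and mask are hypothetical flat lists of equal length; mask marks
    the grid points inside the verification region.
    """
    region = [v for v, keep in zip(field, mask) if keep]
    return percentile(region, p)
```

For example, a masked region holding the values 1, 2, 3, 4 gives an SFP50 threshold of 2.5, and that single number is then compared against every point in the region.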

This task is to define climatological percentiles which vary for each grid point. Add support for this new threshold type: >CDP50, for climatological distribution percentile.

This can only be computed when the user defines the "climo_mean" and "climo_stdev" config file options to define the full climo distribution. As such, we may need to add "climo_stdev" to additional MET tools.
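A minimal sketch of the per-point computation, assuming the climatology at each grid point is normally distributed with mean climo_mean and standard deviation climo_stdev (an assumption for illustration; the function name and list inputs are hypothetical). Under that assumption, CDPp at a point is mean + z_p * stdev, where z_p is the standard-normal quantile of p/100.

```python
from statistics import NormalDist


def cdp_thresholds(climo_mean, climo_stdev, p):
    """Per-grid-point threshold values for a CDP<p> threshold such as >CDP50.

    Assumes a normal climatological distribution at each point, so every
    point gets its own threshold: mean + z * stdev.
    """
    if not 0.0 < p < 100.0:
        # inv_cdf is undefined at 0 and 1 (the normal has infinite tails)
        raise ValueError("p must be strictly between 0 and 100 for a normal fit")
    z = NormalDist().inv_cdf(p / 100.0)  # standard-normal quantile
    return [m + z * s for m, s in zip(climo_mean, climo_stdev)]
```

At p = 50 this returns climo_mean itself, so under the normal assumption >CDP50 means "above the local climatological mean" and the threshold differs from point to point.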

In addition, NOAA/EMC would like the option of defining these climo distributions from 101 empirical values per grid point on a 2.5 degree grid, rather than from the climo mean and stdev.
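With 101 empirical values per grid point, no distributional assumption is needed and the threshold can be read off directly. The sketch below assumes a hypothetical layout in which value k (k = 0..100) is the k-th percentile at that point, and it linearly interpolates for non-integer percentiles. The actual layout of the NCEP files is described in their readme.txt and may differ.

```python
import math


def cdp_from_empirical(bins, p):
    """Threshold at percentile p from 101 empirical climo values at one point.

    Hypothetical layout: bins[k] holds the k-th percentile (k = 0..100) of
    the climatological distribution. Non-integer p is linearly interpolated
    between the two neighboring entries.
    """
    if len(bins) != 101:
        raise ValueError("expected 101 empirical values")
    if not 0.0 <= p <= 100.0:
        raise ValueError("p must be in [0, 100]")
    lo = math.floor(p)
    hi = min(lo + 1, 100)  # stay in range when p == 100
    frac = p - lo
    return bins[lo] + (bins[hi] - bins[lo]) * frac
```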

TaraJensen commented 5 years ago

Charge 2780541

TaraJensen commented 5 years ago

Email from Binbin on 8/21/19 regarding sample data

I created a sample data directory on Theia for our GEFS verification package, located in:

/scratch4/NCEPDEV/fv3-cam/save/Binbin.Zhou/work/grid2grid/verf_g2g.v3.0.13/sample_data

In this directory, the 20 GRIB2 GEFS member files are those whose names begin with "fcst"; each contains a single 24hr forecast (initialized at 12Z on 20190818).

The analysis data file is obsv.grib.gefs2p5.ensbl, which is the GDAS analysis valid at 12Z on 20190819.

The climatology data are the files whose names end with JUL, AUG, and SEP. They are binary files of 101-bin data, from which 11-bin data can be derived (by averaging). For a detailed description, please see the readme.txt file in the same directory.

The output VSDB file is GEFS2P5_2019081912.vsdb. The RPS and RPSS values are in the "RPS" record lines. The first 6 scores in each RPS record line are RPSf, RPSc, RPSS, CRPSf, CRPSc, and CRPSS, where "f" refers to the forecast and "c" to climatology. So if you would like to see the values of RPS and RPSS, just look at the first 3 values on the RPS record lines.
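For reference, RPSS and CRPSS are skill scores relative to climatology: RPSS = 1 - RPSf/RPSc, and likewise CRPSS = 1 - CRPSf/CRPSc. The Python sketch below computes RPS for one forecast over K ordered categories as the unnormalized sum of squared differences between cumulative forecast and cumulative observed probabilities; some systems divide by K - 1, and which convention the VSDB output uses is not stated here. Function names are hypothetical.

```python
from itertools import accumulate


def rps(probs, obs_category):
    """Ranked probability score for one forecast (lower is better).

    probs: forecast probabilities for K ordered categories (summing to 1).
    obs_category: 0-based index of the category that verified.
    Unnormalized form; dividing by K - 1 is a common alternative.
    """
    cum_fcst = list(accumulate(probs))
    # Observed CDF is a step: 0 below the verifying category, 1 from it on.
    cum_obs = [1.0 if k >= obs_category else 0.0 for k in range(len(probs))]
    return sum((f - o) ** 2 for f, o in zip(cum_fcst, cum_obs))


def skill_score(score_fcst, score_climo):
    """RPSS = 1 - RPSf / RPSc (CRPSS is formed the same way)."""
    return 1.0 - score_fcst / score_climo
```

A perfect forecast scores RPS = 0 and RPSS = 1, while a forecast no better than climatology scores RPSS = 0. With the ten CDP decile bins discussed in this thread, a climatological forecast assigns probability 0.1 to each bin by construction.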

Our verification is over 6 sub-regions, identified as G2/XX, where G2 means grid #2 (2.5x2.5 degree) and XX is NH, SH, TR, NA, AS, or EU (Northern Hemisphere, Southern Hemisphere, Tropics, North America, Asia, and Europe). The latitude and longitude bounds for these regions are defined in the control file g2g.ctl.ensemble.

If I miss something or you have further questions, please let me know.

Binbin

JohnHalleyGotway commented 5 years ago

fcst_thresh = [ <=CDP10, >CDP10&&<=CDP20, >CDP20&&<=CDP30, ..., >CDP90 ];

Need to support both binned climatologies and mean/spread.
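Because each grid point has its own CDP10 ... CDP90 values, the same forecast or observed value can fall in different categories at different points. A minimal sketch of the per-point category lookup for the ten thresholds listed above, assuming the point's thresholds are available as an increasing Python list (a hypothetical representation, not MET's internal one):

```python
from bisect import bisect_left


def cdp_bin(value, point_thresholds):
    """Index (0-9) of the fcst_thresh category a value falls in at one point.

    point_thresholds: this grid point's CDP10, CDP20, ..., CDP90 values in
    increasing order. Category 0 is <=CDP10, category i is
    >CDP(10*i)&&<=CDP(10*(i+1)), and category 9 is >CDP90.
    """
    # bisect_left counts the thresholds strictly below value, which matches
    # the "> lower && <= upper" convention of the threshold list.
    return bisect_left(point_thresholds, value)
```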

(1) Logic for 101 bins: for each month, assume the data is valid for the 1st of the month. For August 10th, take the distance-weighted mean of Aug and Sept (see line 320 of /d3/projects/MET/NCEP_unification/verf_g2g.v3.0.11/sorc/verf_g2g_grid2grid_grib2.fd/EFS.f). Extract the 10 binned values from Aug/Sept and interpolate. No info about the hours.
(2) Logic for monthly 2.5 degree mean/stdev: not actually used for global vx.
(3) Logic for daily 1.0 degree mean/stdev: used for global vx to verify at 00, 06, 12, and 18, but it actually only verifies at 00 and 12.
(4) Use the output of MET's new S2S tool.
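Item (1)'s distance-weighted mean can be sketched as linear interpolation in time between the two monthly fields, with each month treated as valid on the 1st as stated there. For August 10th, the weights come out to 22/31 for August and 9/31 for September. The function names and list-of-values inputs are hypothetical; this is not the EFS.f code.

```python
from datetime import date


def monthly_weights(valid):
    """Time weights for the two monthly climos bracketing a valid date.

    Each month's data is treated as valid on the 1st, so a date lies between
    the 1st of its month and the 1st of the next month.
    Returns (weight_this_month, weight_next_month).
    """
    start = date(valid.year, valid.month, 1)
    if valid.month == 12:
        end = date(valid.year + 1, 1, 1)
    else:
        end = date(valid.year, valid.month + 1, 1)
    frac = (valid - start).days / (end - start).days
    return 1.0 - frac, frac


def interp_bins(bins_this, bins_next, valid):
    """Distance-weighted mean of the binned values from the two months."""
    w0, w1 = monthly_weights(valid)
    return [w0 * a + w1 * b for a, b in zip(bins_this, bins_next)]
```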

Compute scores in each bin and then report the mean of the 11 bins. Enable MET to write only the mean of the 11 bins, to keep the output file smaller. Currently, VX_MASK = {NAME}_BIN# (e.g. NAME = FULL). We would like to write {NAME}_BINMEAN to indicate that it's a mean of the 11 bins.
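A minimal sketch of the proposed {NAME}_BINMEAN summary: average one statistic over its per-bin values and attach the new VX_MASK label. Skipping bins whose statistic is undefined (NaN) is an assumption made here for illustration, and the dictionary input is hypothetical.

```python
import math


def bin_mean(name, per_bin_stats):
    """Average one statistic over climo bins and label it {NAME}_BINMEAN.

    per_bin_stats: hypothetical mapping such as {"FULL_BIN1": 0.42, ...}.
    Bins with a NaN statistic (e.g. no matched pairs) are skipped; that
    rule is an assumption, not documented MET behavior.
    """
    vals = [v for v in per_bin_stats.values() if not math.isnan(v)]
    label = f"{name}_BINMEAN"
    if not vals:
        return label, float("nan")
    return label, sum(vals) / len(vals)
```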

VSDB output from the NCO run on WCOSS, using the 1.0 degree mean/stdev climatology, can be found in: /com/verf/prod/vsdb/grid2grid/ens

Example job card for running ensemble vx: /nwprod/verf_g2g.v3.0.15/ecf/jverf_grid2grid_ens_00.ecf

They are implementing the 2.5 degree climatology operationally; Yan runs Yuejin's code for it every day. NCO is currently running the 1.0 degree climatology. The next GEFS implementation, planned for next May, will have 31 members.

JohnHalleyGotway commented 5 years ago

Committed development code to the feature_1138_climo_bins branch after finishing updates to the library and application code and merging the latest changes from develop back into the branch.

Also made sure all existing unit tests run and produce the same output as the develop branch.

Next tasks are: (1) Enhance Grid-Stat to write climo_dist_perc fields to the NetCDF matched pairs output file. (2) Add new or extend existing unit tests to actually exercise the new functionality.

JohnHalleyGotway commented 5 years ago

Finished enhancing Grid-Stat to support the new nc_pairs_flag.climo_cdp option. Updated the README and the MET User's Guide with information about nc_pairs_flag.climo_cdp as well as the CDP threshold types. With all these changes in place, will run the regression test again to make sure it still works.

JohnHalleyGotway commented 5 years ago

Demonstrated the CDP functionality to Tara on 10/15/2019 and showed her the output from the updated unit tests.

Here are some more tasks to do:

I also discussed with Jamie the option of applying this logic to cat_thresh as well, but she ultimately advised against it. There are too many open questions about exactly how to do it, and we agree that it would likely cause users more confusion than convenience:

JohnHalleyGotway commented 5 years ago

Finished adding the ==CDP10 shortcut for cnt_thresh, wind_thresh, and obs_thresh. Fixed Ensemble-Stat and added a test for using CDP thresholds there. They are supported in obs_thresh, but not in cat_thresh, since we don't have any climo info defined for the ensemble dictionary entries.
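The ==CDP10 shortcut saves writing out the full list of decile thresholds by hand. Below is a hypothetical Python sketch of such an expansion that reproduces the form of the fcst_thresh list from the earlier comment (<=CDP10, >CDP10&&<=CDP20, ..., >CDP90); the exact strings MET generates internally may differ.

```python
def expand_cdp_shortcut(thresh):
    """Expand a '==CDPn' shortcut into equal-width CDP bin thresholds.

    Hypothetical illustration: the output mirrors the hand-written
    fcst_thresh list. Any other threshold string is returned unchanged.
    """
    prefix = "==CDP"
    if not thresh.startswith(prefix):
        return [thresh]
    step = int(thresh[len(prefix):])
    if not 0 < step < 100 or 100 % step != 0:
        raise ValueError("CDP step must be below 100 and divide it evenly")
    edges = list(range(step, 100, step))  # interior bin edges
    out = [f"<=CDP{edges[0]}"]
    for lo, hi in zip(edges, edges[1:]):
        out.append(f">CDP{lo}&&<=CDP{hi}")
    out.append(f">CDP{edges[-1]}")
    return out
```

For ==CDP10 this yields ten categories; for ==CDP50 it yields just <=CDP50 and >CDP50.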

JohnHalleyGotway commented 4 years ago

MET is not handling the time interpolation of monthly NCEP 2.5 degree data correctly. The data for each month is valid on the 15th of that month. For the forecast valid time, we need to find the monthly data before/after that date and interpolate to that date.
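A sketch of the bracketing step described above, treating each monthly field as valid at 00Z on the 15th (the 00Z hour is an assumption) and handling the December/January wrap across years. It returns the two months to read and the weight of the later one for linear time interpolation; the helper names are hypothetical.

```python
from datetime import datetime


def mid_month(year, month):
    """Valid time assumed for a monthly climo field: 00Z on the 15th."""
    return datetime(year, month, 15)


def shift_month(year, month, delta):
    """Move (year, month) by delta months, wrapping across year boundaries."""
    idx = year * 12 + (month - 1) + delta
    return idx // 12, idx % 12 + 1


def bracket_monthly(valid):
    """Return ((y0, m0), (y1, m1), w1): the monthly fields before/after
    valid and the weight of the later one for linear interpolation."""
    y, m = valid.year, valid.month
    if valid < mid_month(y, m):
        (y0, m0), (y1, m1) = shift_month(y, m, -1), (y, m)
    else:
        (y0, m0), (y1, m1) = (y, m), shift_month(y, m, 1)
    t0, t1 = mid_month(y0, m0), mid_month(y1, m1)
    w1 = (valid - t0) / (t1 - t0)  # timedelta / timedelta gives a float
    return (y0, m0), (y1, m1), w1
```

For a valid time of 12Z on 20190819, the bracketing fields are August and September, and the September weight is 4.5/31.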

Recommend replacing these config options:

match_month = TRUE;
match_day = FALSE;
time_step = 21600;

With:

day_interval = 1; // or 31 for monthly or NA for persistence (day_interval = 1 is a special case)
hour_interval = 6; // or 1, 6, 12, 24 or NA for persistence (ERROR: < 0 or > 24)

The day_interval and hour_interval options define the spacing of the climo data and control what data is used.
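One way hour_interval could control which climo hours are used: take the climo hour at or before the valid hour plus the next one after it, and reject values outside the stated range. This is a hypothetical Python sketch; it also rejects 0, which the proposal does not mention but which would leave no spacing at all, and it does not model the NA (persistence) or day_interval cases.

```python
def climo_hours(valid_hour, hour_interval):
    """Climo hours needed for a valid hour (0-23) given hour_interval.

    Returns one hour if valid_hour falls exactly on the climo spacing,
    otherwise the bracketing pair. An upper value of 24 means 00Z on the
    following day. Rejects hour_interval < 0 or > 24 (as in the proposal)
    and also 0 (an added assumption).
    """
    if not 0 < hour_interval <= 24:
        raise ValueError("hour_interval must be in (0, 24]")
    before = (valid_hour // hour_interval) * hour_interval
    if before == valid_hour:
        return (before,)
    return (before, before + hour_interval)
```

With hour_interval = 6, a 12Z valid time needs only the 12Z climo field, while a 09Z valid time is bracketed by the 06Z and 12Z fields.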

JohnHalleyGotway commented 4 years ago

Finished by merging the feature_1138_climo_bins branch into develop. Added support for the climo CDP threshold types (e.g. >CDP75) and for computing the mean of statistics across climo bins.