dtcenter / METviewer

Tool that creates plots using MET verification statistics output and the R statistical package
http://www.dtcenter.org/met/metviewer/
Apache License 2.0

Correct the ensemble spread aggregation logic. #184

Closed JohnHalleyGotway closed 4 years ago

JohnHalleyGotway commented 4 years ago

Describe the Enhancement

Please see NCAR/MET#1294 for the bugfix in MET's Ensemble-Stat tool that was included in met-9.0.1.

The algorithm for aggregating ensemble spread across multiple cases has been updated in MET to match the logic used in VSDB. Prior to the MET 9.0.1 bugfix release, MET aggregated ensemble spread as a weighted mean of the input spread values themselves. I suspect METviewer employs this same logic when aggregating the spread values in the ECNT columns named SPREAD, SPREAD_OERR, and SPREAD_PLUS_OERR. It likely computes the weighted mean of these values where the weight is determined by the TOTAL column.

Rather than computing the weighted mean of the spread values directly, METviewer should first compute the variance for each case as spread * spread, then compute the weighted mean of those variance values, and finally compute the aggregated spread = sqrt(aggregated variance).
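To make the requested change concrete, here is a minimal sketch of both aggregation methods side by side. The function names are hypothetical and this is not METviewer or MET code; it assumes the weight for each case is its TOTAL value, as suspected above.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: check that the inputs are usable and return the
// sum of the weights (the TOTAL column is assumed to supply the weights).
static double checked_weight_sum(const std::vector<double>& values,
                                 const std::vector<double>& total) {
    if (values.empty() || values.size() != total.size()) {
        throw std::invalid_argument("inputs must be non-empty and equal length");
    }
    double weight_sum = 0.0;
    for (double t : total) weight_sum += t;
    if (weight_sum <= 0.0) {
        throw std::invalid_argument("sum of TOTAL must be positive");
    }
    return weight_sum;
}

// Previous behavior: weighted mean of the spread values themselves.
double weighted_mean_spread(const std::vector<double>& spread,
                            const std::vector<double>& total) {
    const double weight_sum = checked_weight_sum(spread, total);
    double sum = 0.0;
    for (std::size_t i = 0; i < spread.size(); ++i) {
        sum += total[i] * spread[i];
    }
    return sum / weight_sum;
}

// Requested behavior: weighted mean of the variances, then a square root.
double aggregate_spread(const std::vector<double>& spread,
                        const std::vector<double>& total) {
    const double weight_sum = checked_weight_sum(spread, total);
    double var_sum = 0.0;
    for (std::size_t i = 0; i < spread.size(); ++i) {
        var_sum += total[i] * spread[i] * spread[i];  // variance = spread * spread
    }
    return std::sqrt(var_sum / weight_sum);
}
```

For example, two cases with spread values 1.0 and 3.0 and equal TOTAL give 2.0 under the old weighted mean, but sqrt(5), roughly 2.236, under the variance-based aggregation.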

I am creating this issue in METviewer, but please move it to METcalcpy instead if that's where it really belongs.

JohnHalleyGotway commented 4 years ago

The logic for aggregating the ECNT spread statistics can be found in the function agg_ecnt_lines() starting at line 2574 of: https://github.com/NCAR/MET/blob/master_v9.0/met/src/tools/core/stat_analysis/aggr_stat_line.cc

In general, rather than computing the weighted mean of statistics which include a square root (SPREAD and RMSE), we compute the weighted mean of their squares. Then we compute the square root of the aggregated squared term.
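Written out, the difference between the two approaches looks like this. The symbols are introduced here for illustration: s_i is a per-case statistic such as SPREAD or RMSE, and w_i is its weight, assumed to be the TOTAL value for that case.

```latex
\[
  \bar{s}_{\text{old}} = \frac{\sum_i w_i \, s_i}{\sum_i w_i},
  \qquad
  \bar{s}_{\text{new}} = \sqrt{\frac{\sum_i w_i \, s_i^{2}}{\sum_i w_i}}
\]
```

The two agree only when every s_i is the same; otherwise the new value is the larger of the two.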

So rather than storing the SPREAD, SPREAD_OERR, and SPREAD_PLUS_OERR, we're squaring those terms and storing the variance values:

m[key].ens_pd.var_na.add(square(cur.spread));
m[key].ens_pd.var_oerr_na.add(square(cur.spread_oerr));
m[key].ens_pd.var_plus_oerr_na.add(square(cur.spread_plus_oerr));

Similarly, instead of RMSE and RMSE_OERR, we store their squares:

m[key].mse_na.add((is_bad_data(cur.rmse) ?
                            bad_data_double :
                            cur.rmse * cur.rmse));
m[key].mse_oerr_na.add((is_bad_data(cur.rmse_oerr) ?
                            bad_data_double :
                            cur.rmse_oerr * cur.rmse_oerr));

Starting on line 2624, we compute the weighted mean of the variances and squared errors, before computing the final aggregated value by taking the square root.
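The bad data check in the snippets above matters because squaring a bad data flag would turn it into a large positive number that no longer looks like bad data. Below is a rough sketch of the square, aggregate, and square-root sequence with that check included. The names are hypothetical stand-ins rather than MET's actual identifiers, the flag value of -9999 is an assumption, and skipping bad entries in the weighted mean is an assumption about the desired behavior.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Stand-in for MET's bad data flag; the value is an assumption.
const double kBadData = -9999.0;

// Stand-in for a bad data test, using a small tolerance.
bool is_bad(double v) { return std::fabs(v - kBadData) < 1e-5; }

// Square a value, passing the bad data flag through unchanged.
double square_or_bad(double v) { return is_bad(v) ? kBadData : v * v; }

// Weighted mean of the squared values followed by a square root.
// Entries with a bad value or a bad weight are skipped; if nothing
// usable remains, the bad data flag is returned.
double aggregate_root_mean_square(const std::vector<double>& values,
                                  const std::vector<double>& weights) {
    const std::size_t n =
        values.size() < weights.size() ? values.size() : weights.size();
    double sum = 0.0;
    double weight_sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double sq = square_or_bad(values[i]);
        if (is_bad(sq) || is_bad(weights[i])) continue;
        sum += weights[i] * sq;
        weight_sum += weights[i];
    }
    return weight_sum > 0.0 ? std::sqrt(sum / weight_sum) : kBadData;
}
```

The same helper would apply to both the spread columns and the RMSE columns, since each is aggregated through its squared form.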