Correct the ensemble spread aggregation logic.

Describe the Enhancement

Please see NCAR/MET#1294 for the bugfix in MET's Ensemble-Stat tool that was included in met-9.0.1.

The algorithm for aggregating ensemble spread across multiple cases has been updated in MET to match the logic used in VSDB. Prior to the MET 9.0.1 bugfix release, MET aggregated ensemble spread as a weighted mean of the input spread values themselves. I suspect METviewer employs this same logic when aggregating the spread values in the ECNT columns named SPREAD, SPREAD_OERR, and SPREAD_PLUS_OERR. It likely computes the weighted mean of these values where the weight is determined by the TOTAL column.

Rather than computing the weighted mean of the spread values directly, METviewer should be modified to first compute the variance for each case as spread * spread, then compute the weighted mean of those variance values (weighted by TOTAL), and finally compute the aggregated spread as the square root of the aggregated variance.
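To make the difference concrete, the steps above can be written as a formula. The notation is mine, not MET's: s_i is the SPREAD value for case i and n_i is its TOTAL value. The left side is the requested aggregation and the right side is the weighted mean of spread that METviewer is suspected of using.

```latex
\mathrm{SPREAD}_{\mathrm{agg}}
  = \sqrt{\frac{\sum_i n_i \, s_i^{2}}{\sum_i n_i}}
\qquad \text{rather than} \qquad
\frac{\sum_i n_i \, s_i}{\sum_i n_i}
```

As a made-up illustration, take two cases with TOTAL = 10, SPREAD = 1.0 and TOTAL = 30, SPREAD = 3.0. The weighted mean of spread is (10*1 + 30*3)/40 = 2.5, while the variance-based aggregation gives sqrt((10*1 + 30*9)/40) = sqrt(7), roughly 2.646. The two methods agree only when all input spread values are equal.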

I am creating this issue in METviewer, but please move it to METcalcpy instead if that's where it really belongs.

Time Estimate

Estimate the amount of work required here. Issues should represent approximately 1 to 3 days of work.

Sub-Issues

Consider breaking the enhancement down into sub-issues.

[ ] Add a checkbox for each sub-issue here.

Relevant Deadlines

List relevant project deadlines here or state NONE.

Funding Source

Define the source of funding and account keys here or state NONE.

Define the Metadata

Assignee

[X] Select engineer(s) or no engineer required: Tatiana
[X] Select scientist(s) or no scientist required: John Opatz

Labels

[X] Select component(s)
[X] Select priority
[X] Select requestor(s)

Projects and Milestone

[ ] Review projects and select relevant Repository and Organization ones
[X] Select milestone

Define Related Issue(s)

Consider the impact to the other METplus components.

[ ] METplus, MET, METdb, METviewer, METexpress, METcalcpy, METplotpy

Enhancement Checklist

See the METplus Workflow for details.

[ ] Complete the issue definition above.
[ ] Fork this repository or create a branch of develop. Branch name: feature_<Issue Number>_<Description>
[ ] Complete the development and test your changes.
[ ] Add/update unit tests.
[ ] Add/update documentation.
[ ] Push local changes to GitHub.
[ ] Submit a pull request to merge into develop, listing the <Issue Number> in the title.
[ ] Iterate until the reviewer(s) accept and merge your changes.
[ ] Delete your fork or branch.
[ ] Close this issue.

The logic for aggregating the ECNT spread statistics can be found in the function agg_ecnt_lines() starting at line 2574 of: https://github.com/NCAR/MET/blob/master_v9.0/met/src/tools/core/stat_analysis/aggr_stat_line.cc

In general, rather than computing the weighted mean of statistics that include a square root (SPREAD and RMSE), we compute the weighted mean of their squares. Then we compute the square root of the aggregated squared term.

So rather than storing the SPREAD, SPREAD_OERR, and SPREAD_PLUS_OERR, we're squaring those terms and storing the variance values:

m[key].ens_pd.var_na.add(square(cur.spread));
m[key].ens_pd.var_oerr_na.add(square(cur.spread_oerr));
m[key].ens_pd.var_plus_oerr_na.add(square(cur.spread_plus_oerr));

Similarly, instead of RMSE and RMSE_OERR, we store their squares:

m[key].mse_na.add((is_bad_data(cur.rmse) ?
                            bad_data_double :
                            cur.rmse * cur.rmse));
m[key].mse_oerr_na.add((is_bad_data(cur.rmse_oerr) ?
                            bad_data_double :
                            cur.rmse_oerr * cur.rmse_oerr));

Starting on line 2624, we compute the weighted mean of the variances and squared errors, before computing the final aggregated value by taking the square root.
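For anyone porting this to METviewer or METcalcpy, here is a minimal standalone sketch of the overall flow: square each value, take the weighted mean of the squares, then take the square root. It is not MET's code. The names kBadData, is_bad, square_or_bad, weighted_mean, and aggregate_root_stat are hypothetical, the -9999 sentinel is assumed, and skipping pairs with a bad value or non-positive weight is my assumption about the desired behavior rather than a description of what MET's weighted mean does.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed stand-in for MET's bad data sentinel (hypothetical name).
const double kBadData = -9999.0;

bool is_bad(double v) { return v == kBadData; }

// Square a statistic, propagating bad data as in the snippets above.
double square_or_bad(double v) { return is_bad(v) ? kBadData : v * v; }

// Weighted mean that skips pairs with a bad value or a non-positive weight.
// This skipping rule is an assumption of the sketch.
double weighted_mean(const std::vector<double>& vals,
                     const std::vector<double>& wgts) {
    assert(vals.size() == wgts.size());
    double sum = 0.0, wsum = 0.0;
    for (std::size_t i = 0; i < vals.size(); ++i) {
        if (is_bad(vals[i]) || is_bad(wgts[i]) || wgts[i] <= 0.0) continue;
        sum  += wgts[i] * vals[i];
        wsum += wgts[i];
    }
    return wsum > 0.0 ? sum / wsum : kBadData;
}

// Aggregate a square-root-type statistic (e.g. SPREAD or RMSE) across cases:
// weighted mean of the squares, then the square root of that mean.
double aggregate_root_stat(const std::vector<double>& stats,
                           const std::vector<double>& totals) {
    std::vector<double> squares;
    squares.reserve(stats.size());
    for (double s : stats) squares.push_back(square_or_bad(s));
    const double mean_sq = weighted_mean(squares, totals);
    return is_bad(mean_sq) ? kBadData : std::sqrt(mean_sq);
}
```

The same helper would apply to each of SPREAD, SPREAD_OERR, SPREAD_PLUS_OERR, RMSE, and RMSE_OERR, with the TOTAL column passed as the weights.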
