Closed JohnHalleyGotway closed 4 years ago
Binbin Zhou and Alicia Bentley from NOAA/EMC and Dave Turner from NOAA/GSL confirmed that the VSDB method is preferred. On 4/8, Liz Satterfield from NRL confirmed that the VSDB method is preferred.
Per Tara Jensen, reassigning this issue to the "MET 9.0.1 (bugfix)" milestone to be included in a bugfix release.
Committed fixes to the bugfix_1294_master_v9.0_ens_spread branch. Ran the regression test to compare the bugfix branch to master_v9.0 in dakota:/d3/projects/MET/MET_regression/bugfix_1294_master_v9.0_ens_spread
The results are attached here: test_regression_bugfix_1294_vs_master_v9.0.log
Here's a summary of the differences: (1) Stats for spread columns changed in ECNT: SPREAD, SPREAD_OERR, SPREAD_PLUS_OERR. (2) Differing numbers of SSVAR columns: test_output/met_test_scripts/ensemble_stat/ensemble_stat_20100101_120000V_ssvar.txt ERROR: differing number of rows 2453 vs. 2454 for row type SSVAR between versions 9_0 vs. 9_0
The SSVAR output should not really change. In fact, only 1 of the 7 files ending in "_ssvar.txt" had a difference. In master_v9.0, we compute the spread and set var = spread*spread. In this branch, we store track the variance directly with no calls to sqrt. I inspected the diffs and see that they're very small. I chalk this up to very small numeric differences due to the order of operations causing a point to be shifted from one variance bin to the next.
Additional testing for these changes. See the attached Rscript: convert_to_ORANK.R.txt
> Rscript convert_to_ORANK.R
mean_stdev_nowt(met) = 76.82625
sqrt_mean_var_nowt(vsdb) = 85.42907
This reads "fcst.data" from the current directory and writes an output file named "fcst_orank.txt" with data reformatted into MET ORANK lines. Next run Stat-Analysis from the existing master_v9.0 branch to aggregate ORANK->ECNT:
> MET-master_v9.0/met/bin/stat_analysis -lookin fcst_orank.txt -job aggregate_stat -line_type ORANK -out_line_type ECNT
SPREAD = 76.82625. And then rerun using the bugfix_1294_master_v9.0_ens_spread branch:
> MET-bugfix_1294_master_v9.0_ens_spread/met/bin/stat_analysis -lookin fcst_orank.txt -job aggregate_stat -line_type ORANK -out_line_type ECNT
SPREAD = 85.42907
This demonstrates that the bugfix branch produces the expected unweighted VSDB result. I only checked the unweighted result because ORANK does not include a column for the weight (which only applies to gridded data anyway).
Describe the Problem
This issue was raised by Binbin Zhou at NOAA/EMC. He noted that the ensemble spreads reported by VSDB and MET are significantly different.
To illustrate, see the attached sample data file (fcst.data), along with code for processing it. fcst.data.txt calc_ens_stats.R.txt vsdb_rmse_spread.f.txt
(1) Rscript for computing it both ways:
(2) FORTRAN code illustrating the VSDB method:
See the attachments illustrating that the VSDB methodology is preferred. STDDEV_calculations_GEFS.pdf
This task is to correct the logic for aggregating ensemble spread across multiple points. This will change the values in the ECNT line type for the SPREAD, SPREAD_OERR, and SPREAD_PLUS_OERR columns. The SPREAD column of the ORANK line type remains unchanged. However, we need to also test Stat-Analysis to ensure that the output of the following job type is correct: "-job aggregate_stat -line_type ORANK -out_line_type ECNT"
Additional open questions: (1) Any impacts to the SSVAR line type? (2) When aggregating multiple ECNT line together, should convert those spreads to variance, aggregate the variances, and then convert back to spread? (3) Are similar changes needed in METviewer and/or METcalcpy?
Time Estimate
2 days.
Sub-Issues
No sub-Issues required.
Relevant Deadlines
List relevant project deadlines here or state NONE.
Funding Source
2760811
Define the Metadata
Assignee
Labels
Projects and Milestone
Define Related Issue(s)
Consider the impact to the other METplus components.
Bugfix Checklist
See the METplus Workflow for details.
bugfix_<Issue Number>_master_<Version>_<Description>
bugfix_<Issue Number>_develop_<Description>