Refine configuration options for defining bins in the verification of probabilistic forecasts

JohnHalleyGotway commented 2 years ago

Describe the Enhancement

This issue arose via GitHub discussion dtcenter/METplus#1742. While a probability forecast can have any value between 0 and 1, it is very often the case that probability forecasts are not actually continuous. Instead, the values are often multiples of 1/n, where n is the integer number of ensemble members. This issue is to refine the configuration options in support of verifying these non-continuous, ensemble-derived probability forecasts.

Verification of probabilistic forecasts is performed in MET's Point-Stat, Grid-Stat, Stat-Analysis, and Series-Analysis tools. In general, the probability logic is enabled by the presence of the fcst.prob dictionary in the tool's configuration file. It is either defined as a boolean (true/false) or provides additional information for how to select the probability data from the input GRIB1/2 file.

Prior to computing probabilistic statistics, MET first places all the probability matched pairs into an Nx2 contingency table. N is the number of probability bins between 0 and 1, and the 2 columns indicate whether or not the event actually occurred in the observation. The cat_thresh configuration option specified for the probabilistic input data defines the bins that should be applied in the verification step. cat_thresh can be defined using two conventions:

Explicitly specify thresholds spanning [0, 1], all with the same threshold type:
```
cat_thresh = [ >=0, >=0.25, >=0.5, >=0.75, >=1.0 ];
```
Use threshold equality logic as a shorthand (the example below is equivalent to the one above):
```
cat_thresh = [ ==0.25 ];
```
Both options define the same 4 probability bins. When computing the Brier Score, MET uses the center point of these bins as the probability value for ALL points falling in that bin, as seen on this line of code. For example, any probability value between 0 and 0.249999 would be evaluated as 0.125 (the center point of that bin) in the computation of Brier Score.
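To make the midpoint behavior concrete, here is a minimal Python sketch of the binning described above. It is not MET's code: the function names (`equal_width_edges`, `brier_from_bins`) are hypothetical, and the choice to close the last bin at 1.0 is an assumption made for illustration.

```python
import bisect


def equal_width_edges(width):
    """Bin edges spanning [0, 1] for a bin width such as 0.25 (hypothetical helper)."""
    n_bins = round(1.0 / width)
    return [i / n_bins for i in range(n_bins + 1)]


def brier_from_bins(pairs, edges):
    """Build the Nx2 table and a Brier Score that uses bin midpoints.

    pairs: list of (forecast_probability, observed) with observed 0 or 1.
    Returns (table, brier_score).
    """
    n_bins = len(edges) - 1
    # One row per bin; columns are [event count, non-event count].
    table = [[0, 0] for _ in range(n_bins)]
    for prob, obs in pairs:
        # Bin i holds edges[i] <= prob < edges[i + 1]; 1.0 goes in the last bin.
        i = min(bisect.bisect_right(edges, prob) - 1, n_bins - 1)
        table[i][0 if obs else 1] += 1

    total = sum(sum(row) for row in table)
    score = 0.0
    for i, (n_event, n_nonevent) in enumerate(table):
        # Every point in the bin is scored as if it forecast the midpoint.
        mid = 0.5 * (edges[i] + edges[i + 1])
        score += n_event * (mid - 1.0) ** 2 + n_nonevent * mid ** 2
    return table, score / total


edges = equal_width_edges(0.25)
table, bs = brier_from_bins([(0.2, 0), (0.2, 1), (0.9, 1)], edges)
print(edges, table, bs)
```

Both forecasts of 0.2 land in the first bin and are scored as 0.125, which is the loss of precision this issue is trying to avoid for ensemble-derived probabilities.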

The task here is to refine these options to more easily evaluate probability forecasts whose values are actually multiples of 1/n, where n is the number of ensemble members. For example, when n = 6, we'd want the values used in the Brier Score computation to be 0, 0.167, 0.333, 0.5, 0.667, 0.833, and 1.0, corresponding to ensemble values of 0/6, 1/6, 2/6, 3/6, 4/6, 5/6, and 6/6. Defining probability bins whose center values exactly match these is confusing and tedious, so this issue is to add a configuration option that makes it easy and convenient.

One option for supporting this logic is by adding a 3rd variation for defining probability bins. In addition to the 2 listed above, add a 3rd option for cat_thresh = [ ==N ]; where N is some integer > 1. As of MET version 10.1.0, this setting results in a runtime error:

ERROR  : string_to_prob_thresh() -> threshold value (6) must be between 0 and 1.

We could repurpose this by letting N > 1 define the number of ensemble members n, for which we want probability bins of width 1/n centered on the values k/n for k = 0 ... n. The advantage of this approach is that we would not need to add or document new config options. Probability thresholds are actually defined in multiple config file contexts (cat_thresh, cov_thresh, prob_cat_thresh, prob_genesis_thresh), and supporting this in all of those contexts would be nice. The downside is that a new config option name with additional documentation might make the logic clearer.
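As a rough illustration of what such a setting would imply, the sketch below computes bin edges of width 1/n that are centered on k/n. It uses exact fractions to avoid rounding noise; the function names are hypothetical and this is only an assumption about how the edges could be derived, not MET's implementation.

```python
from fractions import Fraction


def ensemble_bin_edges(n_members):
    """Edges at (k - 1/2) / n for k = 0 .. n + 1, giving n + 1 bins."""
    return [Fraction(2 * k - 1, 2 * n_members) for k in range(n_members + 2)]


def bin_centers(edges):
    """Midpoint of each pair of adjacent edges."""
    return [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]


edges6 = ensemble_bin_edges(6)
centers6 = bin_centers(edges6)
print([str(e) for e in edges6])
print([str(c) for c in centers6])
```

Note that the first edge falls below 0 and the last edge above 1, which is what allows the outermost bins to be centered on exactly 0 and 1.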

Time Estimate

Estimate the amount of work required here. Issues should represent approximately 1 to 3 days of work.

Sub-Issues

Consider breaking the enhancement down into sub-issues.

[ ] Add a checkbox for each sub-issue here.

Relevant Deadlines

List relevant project deadlines here or state NONE.

Funding Source

Define the source of funding and account keys here or state NONE.

Define the Metadata

Assignee

[ ] Select engineer(s) or no engineer required
[ ] Select scientist(s) or no scientist required

Labels

[x] Select component(s)
[x] Select priority
[x] Select requestor(s)

Projects and Milestone

[x] Select Repository and/or Organization level Project(s) or add alert: NEED PROJECT ASSIGNMENT label
[x] Select Milestone as the next official version or Future Versions

Define Related Issue(s)

Consider the impact to the other METplus components.

[x] METplus, MET, METdataio, METviewer, METexpress, METcalcpy, METplotpy
[ ] May need a METplus issue depending on implementation details.

Enhancement Checklist

See the METplus Workflow for details.

[ ] Complete the issue definition above, including the Time Estimate and Funding Source.
[ ] Fork this repository or create a branch of develop. Branch name: feature_<Issue Number>_<Description>
[ ] Complete the development and test your changes.
[ ] Add/update log messages for easier debugging.
[ ] Add/update unit tests.
[ ] Add/update documentation.
[ ] Push local changes to GitHub.
[ ] Submit a pull request to merge into develop. Pull request: feature <Issue Number> <Description>
[ ] Define the pull request metadata, as permissions allow. Select: Reviewer(s) and Linked issues Select: Repository level development cycle Project for the next official release Select: Milestone as the next official version
[ ] Iterate until the reviewer(s) accept and merge your changes.
[ ] Delete your fork or branch.
[ ] Close this issue.

RogerHar commented 1 year ago

Hi @JohnHalleyGotway, I'm very glad to hear that you're planning to include this in MET 12. I'm happy with the config syntax you suggest, i.e. cat_thresh = [ ==N ], as long as it's documented clearly, presumably where the [ ==0.25 ] syntax is explained in 11.2.4.3. It would be nice if both of these special uses of == were also explained or cross-referenced in the section at the start of 5. Configuration File Overview, where the rest of the threshold syntax is documented, as I think that's where I'd look first rather than under 11.2.4. Statistical measures.

jprestop commented 9 months ago

@JohnHalleyGotway In Discussion 2442, @RogerHar noted that this issue still has the label alert: NEED MORE DEFINITION. He said to let him know if it needs any more explanation from him of what's wanted.

JohnHalleyGotway commented 9 months ago

Thanks @jprestop and @RogerHar for the heads up. Yes, I just removed that NEED MORE DEFINITION label. And I'll note that I met with the DTC RRFS group this morning. They're doing similar evaluations of ensembles, and would like to see this functionality. @michelleharrold is planning to add a comment here to clarify the details.

JohnHalleyGotway commented 8 months ago

I'm actively working on this issue and wanted to document the current state of development on my feature branch. The key change is allowing the first bin to start < 0 and the last bin to end > 1. That allows us to have bins whose center points are the expected values of k/n for k = 0 ... n, where n is the number of ensemble members.

Testing results.

Expected successes:

Set "==n" for n < 1:
- cat_thresh = [ ==0.25 ];
- ==0.25 is written to the output, corresponding to bin edges 0, 0.25, 0.5, 0.75, 1.0.
Set "==n" for n > 1:
- cat_thresh = [ == 4 ];
- ==4 is written to the output, corresponding to bin edges -0.125, 0.125, 0.375, 0.625, 0.875, 1.125.
- Note that the center points of these bins are 0/4, 1/4, 2/4, 3/4, and 4/4.
If equal bin widths for [0, 1] are specified, they are detected and the shorthand notation is written:
- cat_thresh = [ >=0, >=0.25, >=0.5, >=0.75, >=1 ];
- ==0.25000 is written to the output.
If equal bin widths for the number of ensemble members are specified, they are detected and the shorthand notation is written:
- cat_thresh = [ >=-0.125, >=0.125, >=0.375, >=0.625, >=0.875, >=1.125 ];
- ==4 is written to the output.
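The shorthand detection in the last two cases can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the logic on the feature branch: the function name `shorthand_for`, the tolerance, and the five-decimal output format (chosen to mirror the ==0.25000 shown above) are all hypothetical.

```python
import math


def shorthand_for(edges, tol=1e-6):
    """Return '==width' or '==n' when edges describe equal-width bins, else None."""
    widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
    width = widths[0]
    if any(not math.isclose(w, width, abs_tol=tol) for w in widths):
        return None  # bins are not all the same width

    # Case 1: equal-width bins spanning exactly [0, 1].
    if (math.isclose(edges[0], 0.0, abs_tol=tol)
            and math.isclose(edges[-1], 1.0, abs_tol=tol)):
        return f"=={width:.5f}"

    # Case 2: bins centered on k/n, so the edges overshoot [0, 1] by width / 2.
    if (math.isclose(edges[0], -width / 2, abs_tol=tol)
            and math.isclose(edges[-1], 1 + width / 2, abs_tol=tol)):
        return f"=={round(1 / width)}"

    return None


print(shorthand_for([0, 0.25, 0.5, 0.75, 1]))
print(shorthand_for([-0.125, 0.125, 0.375, 0.625, 0.875, 1.125]))
print(shorthand_for([0, 0.25, 0.75, 1]))
```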

Expected errors:

Can't be negative:

cat_thresh = [ ==-4 ];

ERROR  : string_to_prob_thresh() -> the threshold string (==-4) must specify a probability bin width between 0 and 1 or an integer number of ensemble members.

If > 1, must be an integer:

cat_thresh = [ ==2.995 ];

ERROR  : string_to_prob_thresh() -> the threshold string (==2.995) must specify a probability bin width between 0 and 1 or an integer number of ensemble members.

Must cover the range [0, 1]:

cat_thresh = [ >=0, >=0.25, >=0.75 ];

ERROR  : check_prob_thresh() -> when verifying a probability field, you must select at least 3 thresholds which include the range [0, 1] (current setting: >=0,>=0.25,>=0.75).
ERROR  : Consider using the "==n" shorthand notation to specify probability bins of equal width, for n < 1, or the integer number of ensemble members, for n > 1.

Must all have the same inequality type:

cat_thresh = [ >=0, >=0.25, <0.75, >=1 ];

ERROR  : check_prob_thresh() -> when verifying a probability field, all thresholds must be greater than or equal to, using "ge" or ">=" (current setting: >=0,>=0.25,<0.75,>=1).
ERROR  : Consider using the "==n" shorthand notation to specify probability bins of equal width, for n < 1, or the integer number of ensemble members, for n > 1.

All bins must intersect with [0, 1]:

cat_thresh = [ >=0, >=0.25, >=0.75, >=1.1, >=1.25 ];

ERROR  : check_prob_thresh() -> when verifying a probability field, each probability bin must overlap the range [0, 1] (current setting: >=0,>=0.25,>=0.75,>=1.1,>=1.25).
ERROR  : Consider using the "==n" shorthand notation to specify probability bins of equal width, for n < 1, or the integer number of ensemble members, for n > 1.
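The numeric checks behind several of these errors can be summarized in a short Python sketch. It is an illustration only: the function names and messages are hypothetical and do not reproduce MET's `string_to_prob_thresh()` or `check_prob_thresh()`. The thread does not say how ==1 is handled, so rejecting it here is an assumption, as is treating a bin that touches 0 or 1 at a single point as overlapping. The inequality-type check is omitted.

```python
def parse_prob_shorthand(text):
    """Interpret '==x' as a bin width (0 < x < 1) or an ensemble size (integer x > 1)."""
    text = text.strip()
    if not text.startswith("=="):
        raise ValueError(f"not an equality threshold: {text}")
    value = float(text[2:])
    if 0 < value < 1:
        return ("width", value)
    if value > 1 and value == int(value):
        return ("members", int(value))
    # Negative values, non-integers above 1, and (by assumption) 0 and 1 are rejected.
    raise ValueError(
        f"({text}) must specify a bin width between 0 and 1 "
        "or an integer number of ensemble members"
    )


def check_prob_edges(edges):
    """Return a list of problems found in explicit, ascending bin edges."""
    problems = []
    if edges[0] > 0.0 or edges[-1] < 1.0:
        problems.append("thresholds must include the range [0, 1]")
    for lo, hi in zip(edges, edges[1:]):
        if hi < 0.0 or lo > 1.0:
            problems.append(f"bin [{lo}, {hi}) does not overlap [0, 1]")
    return problems


print(parse_prob_shorthand("==0.25"))
print(parse_prob_shorthand("== 4"))
print(check_prob_edges([0, 0.25, 0.75]))
print(check_prob_edges([0, 0.25, 0.75, 1.1, 1.25]))
```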

dtcenter / MET