Add probabilistic metric functions (standalone)

SolarArbiter / solarforecastarbiter-core

Core data gathering, validation, processing, and reporting package for the Solar Forecast Arbiter

https://solarforecastarbiter-core.readthedocs.io

MIT License

Add probabilistic metric functions (standalone) #115

Closed dplarson closed 4 years ago

dplarson commented 5 years ago

Same as #114, but for probabilistic forecast metrics.

The current list of metrics is:

[ ] Brier score (BS)
[ ] Brier skill score (BSS)
[ ] reliability (REL)
[ ] resolution (RES)
[ ] uncertainty (UNC)
[ ] sharpness (SH)
[ ] ~prediction interval coverage probability (PICP)~
[ ] ~prediction interval normalized average width (PINAW)~
[ ] ~continuous ranked probability score (CRPS)~

EDIT: removing PICP and PINAW (temporarily) while we check the literature to understand their pros/cons and therefore whether we should add them

EDIT 2: removing CRPS from this issue as it will be addressed by issue #250.
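
For reference, a minimal sketch of the first two metrics on the list, using the standard textbook definitions for a binary event (BS is the mean squared difference between forecast probability and observed outcome; BSS is one minus the ratio of BS to a reference forecast's BS). The function names and the constant 0.5 reference forecast are illustrative assumptions, not the SFA API.

```python
import numpy as np


def brier_score(prob, obs):
    """Mean squared difference between forecast probability and outcome (0 or 1)."""
    prob = np.asarray(prob, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return np.mean((prob - obs) ** 2)


def brier_skill_score(prob, obs, ref_prob):
    """Skill relative to a reference forecast: 1 is perfect, 0 matches the reference."""
    return 1.0 - brier_score(prob, obs) / brier_score(ref_prob, obs)


# Hypothetical data: forecast probabilities of an event and whether it occurred.
prob = [0.9, 0.1, 0.8, 0.3]
obs = [1, 0, 1, 0]
ref_prob = [0.5, 0.5, 0.5, 0.5]  # assumed reference: always forecast 50%

bs = brier_score(prob, obs)
bss = brier_skill_score(prob, obs, ref_prob)
```

With these made-up numbers, BS is 0.0375 and the reference BS is 0.25, which gives a BSS of 0.85.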

wholmgren commented 4 years ago

@dplarson we've recently completed a few PRs for probabilistic forecast data model objects and API wrappers. So, it would be good to get probabilistic metrics going too. Could be just a couple to start. With a few more hacks we could have proof of concept for probabilistic forecast reports.

dplarson commented 4 years ago

@wholmgren Thanks for the heads up (I was actually talking with Adam today about adding these). I'll work on getting an initial PR ready by the end of the week.

wholmgren commented 4 years ago

Great thanks!

dplarson commented 4 years ago

@wholmgren I may have missed this, but is there a simple example in SFA showing how the probabilistic forecasts are being stored/organized? The run_nwp() function in reference_forecasts/main.py mentions that probabilistic forecasts are returned as DataFrame objects, but it's not clear if that's how the metrics submodule would receive the forecasts.

dplarson commented 4 years ago

For ease of reference, a couple updates/notes based on today's team call:

- the discretization of the probabilistic forecasts cannot be assumed to be the same for all forecasts (e.g. one forecast may give the (0.1, 0.2, ..., 0.9, 1.0) quantiles whereas another may give (0.25, 0.50, 0.75, 1.0))
- similarly, we cannot assume that the discretization will be uniform (e.g. a forecast may be given for the (0.10, 0.13, 0.27, 0.30, 0.89, 0.99) quantiles)
- regardless, the discretization will be provided/available for use by the metrics submodule
- probabilistic forecasts will be provided separately for each quantile, but for all timestamps (e.g. one pd.Series per quantile, indexed by the timestamp)
- we cannot assume the probabilistic forecasts will be pre-sorted by quantile (e.g. you may get (0.25, 0.50, 0.75) or (0.50, 0.75, 0.25)), although that could change

Based on this, the initial PR will focus on computing the metrics given lists of pd.Series (one pd.Series per quantile, unsorted) and metadata about the discretization (e.g. a list of numbers corresponding to the quantiles).
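
As a rough sketch of what that input handling could look like (not the actual PR code), the unsorted list of pd.Series and the matching list of quantiles can be combined into one DataFrame with columns sorted by quantile. The helper name `sort_by_quantile` and the sample values are hypothetical.

```python
import pandas as pd


def sort_by_quantile(series_list, quantiles):
    """Combine one Series per quantile into a DataFrame with ascending quantile columns."""
    if len(series_list) != len(quantiles):
        raise ValueError("need exactly one quantile per forecast series")
    # keys= labels each Series' column with its quantile; rows align on the timestamp index
    df = pd.concat(series_list, axis=1, keys=quantiles)
    return df.sort_index(axis=1)


# Hypothetical forecasts for three timestamps, given in unsorted quantile order.
index = pd.to_datetime(["2019-09-23 12:00", "2019-09-23 13:00", "2019-09-23 14:00"])
quantiles = [0.50, 0.75, 0.25]
series_list = [
    pd.Series([50.0, 60.0, 70.0], index=index),   # 0.50 quantile
    pd.Series([80.0, 90.0, 100.0], index=index),  # 0.75 quantile
    pd.Series([20.0, 30.0, 40.0], index=index),   # 0.25 quantile
]

df = sort_by_quantile(series_list, quantiles)
```

Because the quantile values are kept as column labels, non-uniform discretizations need no special handling at this step.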

wholmgren commented 4 years ago

We should discuss adding the ignorance score since Aidan and Dan are using it. It's straightforward for dichotomous events but otherwise requires CDF-to-PDF conversion. The main reason not to use the ignorance score is that it is infinite when a forecast confidently (prob=100%) predicts the wrong binary outcome, so there's a lot of room for confusion. And of course, if that happens for even one time interval, the average over many time intervals is also infinite.
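
To illustrate the infinity problem, here is a minimal sketch for the dichotomous case, assuming the common definition of the ignorance score as the negative base-2 logarithm of the probability assigned to the outcome that occurred. The function name and the sample values are hypothetical.

```python
import numpy as np


def ignorance_score(prob, obs):
    """Per-interval ignorance score (in bits) for a binary event.

    prob: forecast probability that the event occurs
    obs:  whether the event occurred (True/False or 1/0)
    """
    prob = np.asarray(prob, dtype=float)
    obs = np.asarray(obs, dtype=bool)
    # probability the forecast assigned to what actually happened
    p_outcome = np.where(obs, prob, 1.0 - prob)
    with np.errstate(divide="ignore"):  # log2(0) is -inf; silence the warning
        return -np.log2(p_outcome)


# The third forecast gives 100% to an event that did not occur.
scores = ignorance_score([0.9, 0.5, 1.0], [1, 0, 0])
mean_score = scores.mean()
```

The third score is infinite, so the mean over all three intervals is infinite as well, regardless of how good the other forecasts are.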

wholmgren commented 4 years ago

When/how did PICP and PINAW come into the accepted metrics? They were not in the ~April survey and they're not on the metrics page. I support adding interval metrics, but I've not seen a good discussion of the properties of PICP and PINAW. I have seen them used in the solar/wind literature, but never really understood what to make of them. So a good reference would be very helpful. As for other interval metrics, Wilks "8.5.2. Central Credible Interval Forecasts" suggests RPS for fixed-width intervals and the Winkler score for fixed-probability intervals.
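
For context, a sketch of the Winkler score for a central (1 - alpha) interval, using the usual definition: the interval width, plus a penalty of 2/alpha times the distance by which the observation falls outside the interval. The function name and example numbers are assumptions for illustration only.

```python
import numpy as np


def winkler_score(lower, upper, obs, alpha):
    """Winkler (interval) score for central (1 - alpha) prediction intervals.

    Lower is better: narrow intervals score well, but observations that
    fall outside the interval are penalized in proportion to the miss.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    obs = np.asarray(obs, dtype=float)
    width = upper - lower
    miss_below = np.clip(lower - obs, 0.0, None)  # > 0 only when obs < lower
    miss_above = np.clip(obs - upper, 0.0, None)  # > 0 only when obs > upper
    return width + (2.0 / alpha) * (miss_below + miss_above)


# Hypothetical 80% intervals (alpha = 0.2): inside, below, and above the interval.
scores = winkler_score(
    lower=[10, 10, 10], upper=[20, 20, 20], obs=[15, 8, 23], alpha=0.2
)
# -> [10., 30., 40.]
```

The single score combines coverage and width, the two properties that PICP and PINAW report separately.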

dplarson commented 4 years ago

You're correct that the PICP and PINAW metrics are not in the April survey; I must have added them based on a previous version of the metrics proposal document that I had. For now, I'll leave them off the list for PR #202 while I look into references and whether they offer some benefit over the other metrics we've already added to the official list on the website.

And thanks for the suggestion on RPS and Winkler score.