frazane / scoringrules

Scoring rules for probabilistic forecast evaluation.
https://frazane.github.io/scoringrules/
Apache License 2.0

Adding Coverage Ratio and quantile-based CRPS #21

Open simon-hirsch opened 3 months ago

simon-hirsch commented 3 months ago

Hi, I like this a lot and have long looked jealously at R's scoringRules package. Once I've understood how the backends work in detail, I'd like to add a few scoring rules. I've worked with numba before, but not that much with (g)ufuncs.

I'd like to add the coverage ratio (which could be based on either ensembles or quantiles) and a CRPS based on quantiles, approximated via the pinball loss, taking the target quantile levels alpha as the third argument along the last axis, i.e. something like crps_quantile(observations, forecast, alpha). Do you have any objections to this design choice?
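For illustration, a minimal sketch of what the quantile-based approximation could look like, using the identity that the CRPS equals twice the pinball loss integrated over the quantile levels (the function name matches the proposed signature, but the implementation details here are my own assumptions, not the package's):

```python
import numpy as np

def crps_quantile(observations, forecasts, alpha):
    """Approximate the CRPS from quantile forecasts via the pinball loss.

    observations: array of shape (...)
    forecasts: array of shape (..., Q), quantile forecasts along the last axis
    alpha: array of shape (Q,), the target quantile levels
    """
    obs = np.asarray(observations)[..., None]
    fct = np.asarray(forecasts)
    alpha = np.asarray(alpha)
    # pinball (quantile) loss at each quantile level
    pinball = np.where(obs >= fct, alpha * (obs - fct), (1 - alpha) * (fct - obs))
    # CRPS = 2 * integral of the pinball loss over alpha in (0, 1),
    # approximated here by the mean over the supplied levels
    return 2 * np.mean(pinball, axis=-1)
```

With a dense, evenly spaced grid of levels the approximation gets close to the exact CRPS; with only a few quantiles it is a biased estimate.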

Somewhere down the road it would be cool to have the Winkler score and the Dawid-Sebastiani score (DSS). For the DSS, there is also a version based on the graphical lasso if one has fewer ensemble members than observations (Wilks 2020).
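For the univariate case, the DSS is just the log-score of a Gaussian fit (up to a constant), so an ensemble-based sketch is short; this is my own hedged sketch, not an existing function in the package:

```python
import numpy as np

def dss_ensemble(observations, forecasts):
    """Univariate Dawid-Sebastiani score from an ensemble (last axis = members).

    DSS(F, y) = 2 * log(sigma) + ((y - mu) / sigma)**2,
    with mu and sigma the ensemble mean and standard deviation.
    """
    fct = np.asarray(forecasts)
    obs = np.asarray(observations)
    mu = fct.mean(axis=-1)
    sigma = fct.std(axis=-1, ddof=1)
    return 2 * np.log(sigma) + ((obs - mu) / sigma) ** 2
```

The multivariate version replaces sigma with the ensemble covariance matrix, which is where the graphical-lasso regularization would come in.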

Thanks and cheers, Simon

frazane commented 3 months ago

Hi Simon, thanks for reaching out! Your contribution is very welcome.

I don't have any objections, feel free to go ahead. If you need help with the numba gufuncs, or have any other questions, you can get in contact with me or @sallen12 on the Slack channel. Could you maybe provide a reference for the coverage ratio?

And yes, adding the DSS is on the roadmap. Winkler score would also be a nice addition 👍

simon-hirsch commented 3 months ago

Hi, I've added a draft pull request for the CRPS one :)

For the coverage ratio: there are a few variants. I guess the most classic one is: for a $1 - \alpha$ prediction interval with lower and upper bounds $L$ and $U$ and nominal coverage $P(y \in [L, U]) = 1 - \alpha$, the empirical coverage is $\frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\{L \leq y_i \leq U\}$ (with $\mathbb{1}$ the indicator function).
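That formula is a one-liner in NumPy; here is a minimal sketch (the function name and argument order are assumptions for illustration; the nominal level $\alpha$ is only needed afterwards, to compare against $1 - \alpha$):

```python
import numpy as np

def coverage(observations, lower, upper):
    """Empirical coverage of prediction intervals.

    Returns the fraction of observations falling inside [lower, upper];
    for a well-calibrated (1 - alpha) interval this should be close to 1 - alpha.
    """
    obs = np.asarray(observations)
    inside = (np.asarray(lower) <= obs) & (obs <= np.asarray(upper))
    return inside.mean()
```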

From there, people do all kinds of things, like taking the difference between nominal and empirical coverage (often called absolute miscoverage) or just reporting the empirical coverage; Nowotarski & Weron (2018) give an overview. I have also seen one-sided coverage ratios based on quantiles, i.e. whether your q50 forecast is really larger/smaller than 50% of the true values.

simon-hirsch commented 3 months ago

Follow-up question I just thought about: having upper and lower bounds, should we stack them along one axis of the forecast array, or have coverage(observations, lower, upper, alpha)? I think I like the first option a bit more.
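For reference, the first option (bounds stacked on one axis of the forecast array) could look like this sketch; the function name and the convention that the last axis holds [lower, upper] are assumptions for illustration:

```python
import numpy as np

def coverage_from_intervals(observations, intervals):
    """Empirical coverage with bounds stacked on the last axis.

    intervals: array of shape (..., 2), with intervals[..., 0] the lower
    and intervals[..., 1] the upper bound of each prediction interval.
    """
    obs = np.asarray(observations)
    lower, upper = np.moveaxis(np.asarray(intervals), -1, 0)
    return ((lower <= obs) & (obs <= upper)).mean()
```

The stacked form broadcasts naturally over several intervals at once, which is one argument for preferring it over separate lower/upper arguments.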

sallen12 commented 3 months ago

Thanks for getting in touch and for adding some functionality to the package - the quantile-based estimation of the CRPS is a nice addition. The coverage is not a scoring rule, but rather a measure of forecast calibration, so I would not include it for now. We have discussed extending the package to also include checks for calibration, so that it provides a more complete toolbox for evaluating probabilistic forecasts. But this opens up many possible options, the coverage being just one example. We will likely come back to this later; for now I think it's better to focus on extending the scoring rule functionality before possibly adding metrics to assess calibration (although I obviously agree this would be worthwhile at some point!).

simon-hirsch commented 3 months ago

Hi @sallen12, just one comment from my side - I'll need the coverage anyway for the next paper (again), and then I can also implement it properly and submit the PR here. It is pretty straightforward to implement and essentially a byproduct. Otherwise, feel free to close this issue, since the CRPS is done 👍🏽