SolarArbiter / solarforecastarbiter-core

Core data gathering, validation, processing, and reporting package for the Solar Forecast Arbiter
https://solarforecastarbiter-core.readthedocs.io
MIT License

Brier Score decomposition: REL, RES, UNC #232

Closed by dplarson 4 years ago

dplarson commented 4 years ago

Opening this issue here (rather than in the website repo) since this is more a question of code implementation than of metric definition:

The Brier Score (BS) can be decomposed into three components: reliability (REL), resolution (RES) and uncertainty (UNC), where BS = REL - RES + UNC. To compute the REL and RES components, you find the K unique forecasts in the N forecasts provided (K <= N). Since the probabilistic forecasts can be provided to SolarArbiter as floating point numbers of arbitrary precision, this brings up the question: how do you determine a (finite) set of unique forecasts?
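For reference, the three-term decomposition over exactly-equal forecast values can be sketched as below. This is a minimal illustration, not the SolarArbiter implementation: the function name `brier_decomposition` and the sample data are hypothetical, and observations are assumed to be binary (0 or 1).

```python
import numpy as np


def brier_decomposition(fx, obs):
    """Return (REL, RES, UNC), grouping by exactly-equal forecast values.

    Hypothetical helper for illustration; assumes obs is binary (0 or 1).
    """
    fx = np.asarray(fx, dtype=float)
    obs = np.asarray(obs, dtype=float)
    n = fx.size
    base_rate = obs.mean()                    # overall event frequency
    unc = base_rate * (1.0 - base_rate)       # uncertainty
    rel = 0.0
    res = 0.0
    for f_k in np.unique(fx):                 # the K unique forecast values
        mask = fx == f_k
        n_k = mask.sum()
        o_k = obs[mask].mean()                # event frequency given f_k
        rel += n_k * (f_k - o_k) ** 2         # reliability contribution
        res += n_k * (o_k - base_rate) ** 2   # resolution contribution
    return rel / n, res / n, unc


fx = [0.1, 0.1, 0.5, 0.5, 0.9, 0.9]
obs = [0, 0, 1, 0, 1, 1]
rel, res, unc = brier_decomposition(fx, obs)
bs = np.mean((np.asarray(fx) - np.asarray(obs)) ** 2)
print(bs, rel - res + unc)
```

With arbitrary-precision forecasts, nearly every value is unique (K approaches N), so each group holds a single sample and the REL and RES terms stop being informative, which is the motivation for the question above.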

One approach is converting all forecasts to a pre-defined precision (e.g. one or two decimal places). This is seemingly the approach taken in the original paper [1], which states that the forecasts "can assume only a finite set of S distinct values". Also, in [1], the examples provided use 1-decimal precision forecasts (0.0, 0.1, 0.2, ..., 1.0), with some (apparent) rounding of intermediate calculations.

[1] Murphy (1973) "A New Vector Partition of the Probability Score", doi: https://doi.org/10.1175/1520-0450(1973)012%3C0595:ANVPOT%3E2.0.CO;2

Question

dplarson commented 4 years ago

My suggestion is that we approximate the probabilities [-] to two-decimal precision (e.g. 0.2354 => 0.24) for the REL and RES calculations only. I'm not yet sure of the best way to do the approximation, though: we still want the probabilities to sum to 1.00, and simple rounding can make the sum come out as, e.g., 0.99 or 1.01.
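One possible way to round while keeping the sum fixed is the largest-remainder method: floor every value at the target precision, then hand the missing units to the entries with the largest fractional parts. The sketch below is only an illustration of that idea under the assumption that the inputs sum to 1; `round_preserving_sum` is a hypothetical name, not an existing function in this package.

```python
import numpy as np


def round_preserving_sum(probs, decimals=2):
    """Round probs to `decimals` places so the rounded values keep the sum.

    Hypothetical helper using the largest-remainder method.
    """
    probs = np.asarray(probs, dtype=float)
    scale = 10 ** decimals
    # Work in units of the target precision; the inner round guards
    # against float noise such as 0.29 * 100 == 28.999999999999996.
    scaled = np.round(probs * scale, 9)
    floors = np.floor(scaled).astype(int)
    # Number of units lost by flooring.
    shortfall = int(round(scaled.sum())) - int(floors.sum())
    # Give one unit each to the entries with the largest remainders.
    order = np.argsort(scaled - floors)[::-1]
    floors[order[:shortfall]] += 1
    return floors / scale


probs = [0.3333, 0.3333, 0.3334]
naive = np.round(probs, 2)               # 0.33, 0.33, 0.33 -> sums to 0.99
rounded = round_preserving_sum(probs)    # 0.33, 0.33, 0.34 -> sums to 1.00
print(naive.sum(), rounded.sum())
```

The trade-off is that an individual value may be rounded in the "wrong" direction by one unit of precision in exchange for an exact sum.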

wholmgren commented 4 years ago

Do we need to worry about the number of bins vs. the number of forecasts/observations? For example, if N < 1000, bin by tenths, otherwise bin by hundredths?
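A sample-size-dependent rule like this could be sketched as follows. The function names are hypothetical, and the threshold of 1000 is just the example value floated above, not an agreed convention.

```python
import numpy as np


def choose_decimals(n_samples, threshold=1000):
    """Bin by tenths below `threshold` samples, by hundredths otherwise.

    Hypothetical rule; the threshold is an illustrative value only.
    """
    return 1 if n_samples < threshold else 2


def bin_forecasts(fx, decimals):
    """Map each forecast to its bin label by rounding."""
    return np.round(np.asarray(fx, dtype=float), decimals)


fx = [0.2354, 0.2391, 0.7612]
decimals = choose_decimals(len(fx))      # few samples, so bin by tenths
binned = bin_forecasts(fx, decimals)     # bin labels 0.2, 0.2, 0.8
print(decimals, np.unique(binned))
```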

In any case, we could follow the formulation in Stephenson et al., "Two Extra Components in the Brier Score Decomposition".
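As a rough sketch of what a binned decomposition with two extra terms might look like: when forecasts inside a bin are not all equal, the three-term identity no longer holds exactly, and a within-bin variance (WBV) and within-bin covariance (WBC) term restore it, giving BS = REL - RES + UNC + WBV - WBC. The code below is an illustration under that assumed five-term form, with hypothetical names and sample data and binary observations; it should be checked against the paper before being relied on.

```python
import numpy as np


def binned_brier_decomposition(fx, obs, decimals=1):
    """Five-term decomposition with forecasts binned by rounding.

    Hypothetical helper for illustration; assumes obs is binary (0 or 1).
    """
    fx = np.asarray(fx, dtype=float)
    obs = np.asarray(obs, dtype=float)
    n = fx.size
    base_rate = obs.mean()
    labels = np.round(fx, decimals)           # bin label for each forecast
    rel = res = wbv = wbc = 0.0
    for label in np.unique(labels):
        mask = labels == label
        f_k, o_k = fx[mask], obs[mask]
        f_bar, o_bar = f_k.mean(), o_k.mean() # bin-mean forecast and outcome
        rel += f_k.size * (f_bar - o_bar) ** 2
        res += f_k.size * (o_bar - base_rate) ** 2
        wbv += np.sum((f_k - f_bar) ** 2)     # spread of forecasts in the bin
        wbc += 2.0 * np.sum((o_k - o_bar) * (f_k - f_bar))
    return {
        "REL": rel / n,
        "RES": res / n,
        "UNC": base_rate * (1.0 - base_rate),
        "WBV": wbv / n,
        "WBC": wbc / n,
    }


fx = [0.12, 0.08, 0.14, 0.47, 0.52, 0.88, 0.93]
obs = [0, 0, 1, 1, 0, 1, 1]
parts = binned_brier_decomposition(fx, obs, decimals=1)
bs = np.mean((np.asarray(fx) - np.asarray(obs)) ** 2)
total = (parts["REL"] - parts["RES"] + parts["UNC"]
         + parts["WBV"] - parts["WBC"])
print(bs, total)
```

Here REL and RES are computed from the bin-mean forecast rather than a rounded value, so no per-forecast approximation is needed for the identity to hold.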

dplarson commented 4 years ago

That's a good point, and I agree we'd probably want to explicitly set some sort of convention on the number of bins versus the number of forecasts/observations. (At the very least, define what to do if there are very few forecasts/observations, e.g., << 100.)

Also, thanks for sharing that paper (I'm reading through it now).