CRPS Bug - Githubissues

mattmotoki commented 2 years ago

I believe there is a problem in the continuous_ranked_probability_score function. The problem is that the limits of integration should be from -infinity to infinity but the code is only integrating from the min of the forecast support to the max of the forecast support. This gives incorrect results when the observation lies outside of the support of the forecast. Moreover, it is possible to make the error zero by reducing the support of the forecast. A quick and dirty fix could be to manually extend the support of the forecast.

Here are some toy examples that illustrate the problem. I included the results from the properscoring package for reference.

import numpy as np
import properscoring
from scipy.stats import norm
from solarforecastarbiter.metrics.probabilistic import continuous_ranked_probability_score

obs = 1
q = np.linspace(0, 1, 1000)[1:-1]

Example 1 - Observation inside the support of the forecast (Properscoring and SFA Agree)

dist = norm(loc=obs, scale=0.1)
support = np.arange(0, 2, 0.001)
norm_prob = dist.cdf(support)
norm_preds = dist.ppf(q)

plt.figure(figsize=(10, 4))
plt.axvline(obs, color="k", linestyle=":")
plt.plot(support, norm_prob)
plt.show()

print(
    f"properscoring={properscoring.crps_quadrature(obs, dist):0.7f} ", 
    f"sfa={continuous_ranked_probability_score(np.reshape(obs, (1,)), norm_preds[None], 100*q[None]):0.7f}",
    f"(forecast_support=[{norm_preds.min():0.5f}, {norm_preds.max():0.5f}])"
)

>>>properscoring=0.0233695  sfa=0.0233696 (forecast_support=[0.69101, 1.30899])

example1

Example 2 - Observation outside the support of the forecast (Properscoring and SFA Disagree)

dist = norm(loc=0, scale=0.1)
support = np.arange(-2, 1, 0.001)
norm_prob = dist.cdf(support)
norm_preds = dist.ppf(q)

plt.figure(figsize=(10, 4))
plt.axvline(obs, color="k", linestyle=":")
plt.plot(support, norm_prob)
plt.show()

print(
    f"properscoring={properscoring.crps_quadrature(obs, dist):0.7f} ", 
    f"sfa={continuous_ranked_probability_score(np.reshape(obs, (1,)), norm_preds[None], 100*q[None]):0.7f}",
    f"(forecast_support=[{norm_preds.min():0.5f}, {norm_preds.max():0.5f}])"
)

>>>properscoring=0.4435811  sfa=0.2523160 (forecast_support=[0.19101, 0.80899])

example2

Example 3 - Forecast support is small (SFA CRPS is near zero)

dist = norm(loc=0, scale=1e-10)
support = np.arange(-1, 1, 0.001)
norm_prob = dist.cdf(support)
norm_preds = dist.ppf(q)

plt.figure(figsize=(10, 4))
plt.axvline(obs, color="k", linestyle=":")
plt.plot(support, norm_prob)
plt.show()

print(
    f"properscoring={properscoring.crps_quadrature(obs, dist):0.7f} ", 
    f"sfa={continuous_ranked_probability_score(np.reshape(obs, (1,)), norm_preds[None], 100*q[None]):0.7f}",
    f"(forecast_support=[{norm_preds.min():0.5f}, {norm_preds.max():0.5f}])"
)
>>>properscoring=1.0000000  sfa=0.0000000 (forecast_support=[-0.00000, 0.00000])

example3

mattmotoki commented 2 years ago

Here's a quick fix for a single observation. I extended the minimum and maximum of the forecast support. The corresponding probabilities are 0 and 100. I also use the trapezoidal rule rather than the rectangle rule. This gets closer to Properscoring but I think it can still be improved.

def sfa_crps_fixed(obs, fx, fx_prob):
    """Continuous Ranked Probability Score (CRPS).
        CRPS = 1/n sum_{i=1}^n int (F_i - O_i)^2 dx
    where F_i is the CDF of the forecast at time i and O_i is the CDF
    associated with the observed value obs_i:
        O_{i, j} = 1 if obs_i <= fx_{i, j}, else O_{i, j} = 0
    where obs_i is the observation at time i, and fx_{i, j} is the forecast at
    time i for CDF interval j.
    Parameters
    ----------
    obs : (n,) array_like
        Observations (physical unit).
    fx : (n, d) array_like
        Forecasts (physical units) of the right-hand-side of a CDF with d
        intervals (d >= 2), e.g., fx = [10 MW, 20 MW, 30 MW] is interpreted as
        <= 10 MW, <= 20 MW, <= 30 MW.
    fx_prob : (n, d) array_like
        Probability [%] associated with the forecasts.
    Returns
    -------
    crps : float
        The Continuous Ranked Probability Score [unitless].
    Raises
    ------
    ValueError
        If the forecasts have incorrect dimensions; either a) the forecasts are
        for a single sample (n=1) with d CDF intervals but are given as a 1D
        array with d values or b) the forecasts are given as 2D arrays (n,d)
        but do not contain at least 2 CDF intervals (i.e. d < 2).
    Examples
    --------
    Forecast probabilities of <= 10 MW and <= 20 MW:
    >>> fx = np.array([[10, 20], [10, 20]])
    >>> fx_prob = np.array([[30, 50], [65, 100]])
    >>> obs = np.array([8, 12])
    >>> continuous_ranked_probability_score(obs, fx, fx_prob)
    4.5625
    Forecast thresholds for constant probabilities (25%, 75%):
    >>> fx = np.array([[5, 15], [8, 14]])
    >>> fx_prob = np.array([[25, 75], [25, 75]])
    >>> obs = np.array([8, 10])
    >>> continuous_ranked_probability_score(obs, fx, fx_prob)
    0.5
    """

    # match observations to fx shape: (n,) => (n, d)
    if np.ndim(fx) < 2:
        raise ValueError("forecasts must be 2D arrays (expected (n,d), got"
                         f"{np.shape(fx)})")
    elif np.shape(fx)[1] < 2:
        raise ValueError("forecasts must have d >= 2 CDF intervals "
                         f"(expected >= 2, got {np.shape(fx)[1]})")
    else:

        # extend min forecast
        fx = np.hstack([np.minimum(obs, fx[:,[0]]), fx])
        fx_prob = np.hstack([np.zeros((len(fx_prob), 1)), fx_prob])

        # extend max forecast
        fx = np.hstack([fx, np.maximum(obs, fx[:,[-1]])])
        fx_prob = np.hstack([fx_prob, np.full((len(fx_prob), 1), 100)])

        # is this necessary? (broadcasting)
        obs = np.tile(obs, (fx.shape[1], 1)).T

    # event: 0=did not happen, 1=did happen
    o = np.where(obs < fx, 1.0, 0.0)

    # forecast probabilities [unitless]
    f = fx_prob / 100.0

    # integrate along each sample, then average all samples
    integrand = (f - o) ** 2
    dx = np.diff(fx, axis=1)
    crps = np.mean(np.sum((integrand[:, :-1] + integrand[:, 1:])/2 * dx, axis=1))
    return crps

Example 1 - Observation inside the support of the forecast (Properscoring and SFA Agree)

dist = norm(loc=obs, scale=0.1)
support = np.arange(0, 2, 0.001)
norm_prob = dist.cdf(support)
norm_preds = dist.ppf(q)

print(
    f"properscoring={properscoring.crps_quadrature(obs, dist):0.7f} ", 
    f"sfa={continuous_ranked_probability_score(np.reshape(obs, (1,)), norm_preds[None], 100*q[None]):0.7f}",
    f"sfa_fixed={sfa_crps_fixed(np.reshape(obs, (1,)), norm_preds[None], 100*q[None]):0.7f}",
    f"(forecast_support=[{norm_preds.min():0.5f}, {norm_preds.max():0.5f}])"
)

>>>properscoring=0.0233695  sfa=0.0233696 sfa_fixed=0.0233696 (forecast_support=[0.69101, 1.30899])

Example 2 - Observation outside the support of the forecast (Properscoring and SFA Disagree)

dist = norm(loc=0.5, scale=0.1)
support = np.arange(-1, 2, 0.001)
norm_prob = dist.cdf(support)
norm_preds = dist.ppf(q)

print(
    f"properscoring={properscoring.crps_quadrature(obs, dist):0.7f} ", 
    f"sfa={continuous_ranked_probability_score(np.reshape(obs, (1,)), norm_preds[None], 100*q[None]):0.7f}",
    f"sfa_fixed={sfa_crps_fixed(np.reshape(obs, (1,)), norm_preds[None], 100*q[None]):0.7f}",
    f"(forecast_support=[{norm_preds.min():0.5f}, {norm_preds.max():0.5f}])"
)

>>>properscoring=0.4435811  sfa=0.2523160 sfa_fixed=0.4434407 (forecast_support=[0.19101, 0.80899])

Example 3 - Forecast support is small (SFA CRPS is near zero)

dist = norm(loc=0, scale=1e-10)
support = np.arange(-1, 1, 0.001)
norm_prob = dist.cdf(support)
norm_preds = dist.ppf(q)

print(
    f"properscoring={properscoring.crps_quadrature(obs, dist):0.7f} ", 
    f"sfa={continuous_ranked_probability_score(np.reshape(obs, (1,)), norm_preds[None], 100*q[None]):0.7f}",
    f"sfa_fixed={sfa_crps_fixed(np.reshape(obs, (1,)), norm_preds[None], 100*q[None]):0.7f}",
    f"(forecast_support=[{norm_preds.min():0.5f}, {norm_preds.max():0.5f}])"
)

>>>properscoring=1.0000000  sfa=0.0000000 sfa_fixed=0.9989995 (forecast_support=[-0.00000, 0.00000])

wholmgren commented 2 years ago

@dplarson?

dplarson commented 2 years ago

@mattmotoki Thanks for bringing this up and the detailed comments. I think it's a valid issue (though fortunately seems to be more of an "edge case" than the CRPS being completely invalid) and I'll coordinate with the rest of the SFA team to get a PR to address the issue.

mattmotoki commented 2 years ago

@dplarson Sure no problem. Thanks for the fast response. I agree, it is more of an edge case. Thank you and everyone else for developing the SFA. It's well-made and I'm enjoying using it.

mattmotoki commented 2 years ago

Hi @dplarson I was wondering if we can expect to see a fix by the start of the submission phase of the Solar Forecasting Prize Competition.

This is an edge case, but I think it's an important one. I'm worried that the current implementation of continuous_ranked_probability_score will not properly score submissions. If I'm not mistaken, forecasts that have obs not within the support (obs < min(fx) or obs > max(fx)) will have a better score than should be. Competitors can try to exploit this bug by submitting forecasts with smaller support. The extreme case is submitting a forecast with a single fx value repeated for all fx_prob. In this case, the current implementation will give a CRPS of 0 (a perfect score). I ran a test report to verify this.

If it helps, I can help contribute to the PR. Please let me know.

dplarson commented 2 years ago

@mattmotoki thanks for checking in. I've been coordinating via email with other SFA team members (to validate the CRPS logic given how SFA treats the probabilistic forecasts), which took more time than we originally expected. But I'm now moving onto finishing the PR (aiming to be done today) so that the revised CRPS is deployed before the competition start.

mattmotoki commented 2 years ago

Awesome, that's good to hear!

dplarson commented 2 years ago

As an update: the revised CRPS calculation now successfully reproduces the three examples in this issue (i.e., SFA CRPS agrees with properscoring CRPS) and the other examples I've tested on. Now I should just need to make sure all the related unit tests pass (e.g., metrics.calculator tests).

dplarson commented 2 years ago

Here's the (simplified) code to illustrate how the revised CRPS calculation performs on the three examples:

import numpy as np
import properscoring
from scipy.stats import norm
from solarforecastarbiter.metrics.probabilistic import continuous_ranked_probability_score

obs = 1
q = np.linspace(0, 1, 1000)[1:-1]

# Example 1 - Observation inside the support of the forecast (Properscoring and SFA Agree)
dist = norm(loc=obs, scale=0.1)
support = np.arange(0, 2, 0.001)
norm_prob = dist.cdf(support)
norm_preds = dist.ppf(q)

print(
    f"properscoring={properscoring.crps_quadrature(obs, dist):0.7f} ", 
    f"sfa={continuous_ranked_probability_score(np.reshape(obs, (1,)), norm_preds[None], 100*q[None]):0.7f}",
    f"(forecast_support=[{norm_preds.min():0.5f}, {norm_preds.max():0.5f}])"
)

>>>properscoring=0.0233695  sfa=0.0233696 (forecast_support=[0.69101, 1.30899])

# Example 2 - Observation outside the support of the forecast (Properscoring and SFA Disagree)
dist = norm(loc=0, scale=0.1)
support = np.arange(-2, 1, 0.001)
norm_prob = dist.cdf(support)
norm_preds = dist.ppf(q)

print(
    f"properscoring={properscoring.crps_quadrature(obs, dist):0.7f} ", 
    f"sfa={continuous_ranked_probability_score(np.reshape(obs, (1,)), norm_preds[None], 100*q[None]):0.7f}",
    f"(forecast_support=[{norm_preds.min():0.5f}, {norm_preds.max():0.5f}])"
)

>>>properscoring=0.9435810  sfa=0.9429405 (forecast_support=[-0.30899, 0.30899])

# Example 3 - Forecast support is small (SFA CRPS is near zero)
dist = norm(loc=0, scale=1e-10)
support = np.arange(-1, 1, 0.001)
norm_prob = dist.cdf(support)
norm_preds = dist.ppf(q)

print(
    f"properscoring={properscoring.crps_quadrature(obs, dist):0.7f} ", 
    f"sfa={continuous_ranked_probability_score(np.reshape(obs, (1,)), norm_preds[None], 100*q[None]):0.7f}",
    f"(forecast_support=[{norm_preds.min():0.5f}, {norm_preds.max():0.5f}])"
)

>>>properscoring=1.0000000  sfa=0.9989995 (forecast_support=[-0.00000, 0.00000])

SolarArbiter / solarforecastarbiter-core

CRPS Bug #779

Example 1 - Observation inside the support of the forecast (Properscoring and SFA Agree)

Example 2 - Observation outside the support of the forecast (Properscoring and SFA Disagree)

Example 3 - Forecast support is small (SFA CRPS is near zero)

Example 1 - Observation inside the support of the forecast (Properscoring and SFA Agree)

Example 2 - Observation outside the support of the forecast (Properscoring and SFA Disagree)

Example 3 - Forecast support is small (SFA CRPS is near zero)