epiforecasts / scoringutils

Utilities for Scoring and Assessing Predictions
https://epiforecasts.io/scoringutils/

Create a wrapper around log score to warn about its use for integer-valued forecasts #360

Open nikosbosse opened 11 months ago

nikosbosse commented 11 months ago

In the old version, we didn't compute a log score for discrete sample-based forecasts. The reason is that the scoringRules implementation of the log score estimates a density from the samples, which is difficult for discrete forecasts. There could be other ways to do this: instead of estimating a density, you would assign an actual probability to every discrete value. I'm not sure how to do this (in some sense, it's the same as multiclass classification with a lot of classes?), and we don't currently have code for that.

Options seem to be

  1. compute log score for discrete forecasts anyway
  2. don't compute log score for discrete forecasts
  3. come up with some implementation
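
For option 3, one possible direction is to score against the empirical probability mass function of the samples rather than a kernel density estimate. The sketch below is only an illustration under that assumption: `logs_sample_discrete()` and its `alpha` smoothing argument are hypothetical names, not part of scoringutils or scoringRules, and the smoothing is deliberately crude.

```r
# Hypothetical helper, not part of scoringutils or scoringRules.
# Log score for one integer-valued sample forecast, using the empirical
# probability mass function instead of a kernel density estimate.
logs_sample_discrete <- function(observed, predicted, alpha = 0) {
  stopifnot(length(observed) == 1, is.numeric(predicted), alpha >= 0)
  n <- length(predicted)
  # Support: every value seen in the sample, plus the observation itself
  support <- sort(unique(c(predicted, observed)))
  counts <- vapply(support, function(v) sum(predicted == v), numeric(1))
  # Optional add-alpha smoothing so an unsampled observation is not scored Inf
  probs <- (counts + alpha) / (n + alpha * length(support))
  -log(probs[support == observed])
}

samples <- c(0, 1, 1, 2, 2, 2, 3, 3, 4, 5)
logs_sample_discrete(observed = 2, predicted = samples)  # -log(3 / 10)
logs_sample_discrete(observed = 7, predicted = samples)  # Inf, 7 was never sampled
logs_sample_discrete(observed = 7, predicted = samples, alpha = 0.5)  # finite
```

Without smoothing, any observation that was never sampled gets an infinite score, so the result depends heavily on the number of samples. That is part of why this is not a drop-in replacement.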

nikosbosse commented 10 months ago

@sbfnk Do you have thoughts on this?

seabbs commented 10 months ago

Make it something people can do themselves if they want to and are aware of the potential issues? I.e. it's not in the default list of metrics, but we mention that you can compute it if you're willing to accept the risk?

nikosbosse commented 10 months ago

Hm, currently we're not distinguishing between integer and continuous forecasts in our methods, i.e. there is only one score.forecasts_sample.

seabbs commented 10 months ago

I think we can push this to a future release as a nice-to-have?

sbfnk commented 10 months ago

Is the underlying issue that the representation of the probability distribution (samples, quantiles, analytical, etc.) and the type of outcome (continuous, binary, integer) are orthogonal, and both need to be known before deciding how to score, but we sometimes make the decision based on only one of these pieces of information?

nikosbosse commented 10 months ago

@sbfnk I think you're right that there is such an issue in this case (and maybe in a few others, e.g. when constructing PIT histograms, or in the bias metric). Usually, I think this can be handled by the function itself, i.e. the function is called based on the representation of the probability distribution and then acts based on whether the forecast is continuous or discrete (the binary case is handled automatically, as we're treating binary as a different representation).

In this specific instance, however, the function we are using is one from scoringRules.

We could create a wrapper around scoringRules::logs_sample() which checks whether the forecasts are discrete and produces a warning in that case.

@seabbs we could push this to a future release if we're happy with users computing a log score which is only moderately appropriate. I could also live with that.

seabbs commented 10 months ago

> We could create a wrapper around scoringRules::logs_sample() which checks whether the forecasts are discrete and produces a warning in that case.

This seems like a good idea as an MVP.
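
A minimal sketch of what such a wrapper might look like, assuming that "discrete" is detected by checking whether all finite predicted values are whole numbers. The name `logs_sample_checked()`, the `score_fun` argument and the warning text are hypothetical. Only `scoringRules::logs_sample(y, dat, ...)` is an existing function.

```r
# Hypothetical wrapper; name, arguments and warning text are assumptions.
# `score_fun` defaults to scoringRules::logs_sample and is evaluated lazily,
# so scoringRules is only needed when the default is actually used.
logs_sample_checked <- function(observed, predicted, ...,
                                score_fun = scoringRules::logs_sample) {
  # Treat the forecast as discrete if every finite sample is a whole number
  values <- predicted[is.finite(predicted)]
  is_discrete <- length(values) > 0 && all(values == round(values))
  if (is_discrete) {
    warning(
      "Forecasts appear to be integer-valued. The log score relies on ",
      "kernel density estimation and may not be appropriate here.",
      call. = FALSE
    )
  }
  score_fun(y = observed, dat = predicted, ...)
}
```

With this shape, the score is still computed for integer-valued forecasts and users are only warned, which matches the MVP discussed above rather than a proper discrete log score.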