Open nikosbosse opened 1 year ago
@sbfnk Do you have thoughts on this?
Make it something people can do themselves if they want and are aware of the potential issues? I.e., it's not in the default list, but it's mentioned that you could do it if you like risk?
Hm, currently we're not distinguishing between integer and continuous in our methods, i.e. there is only one score.forecasts_sample
I think we can push this to a future nice-to-have release?
Is there an underlying issue that representation of probability distributions (sample, quantiles, analytical etc.) and types of outcomes (continuous, binary, integer) are orthogonal and both need to be known before deciding how to score, but we sometimes make a decision based only on one of these pieces of information?
I think @sbfnk you're right there is such an issue in this case (and maybe a few others, e.g. when constructing PIT histograms, or in the bias metric). Usually, I think this can be handled by the function itself. I.e. the function is called based on the representation of the probability distribution and then acts based on continuous/discrete (the binary case is handled automatically as we're treating binary as a different representation)
In this specific instance, however, the function we are using is one from scoringRules.
We could create a wrapper around scoringRules::logs_sample() which checks whether the forecasts are discrete and produces a warning in that case.
@seabbs we could push this to a future release if we're happy with users computing a log score which is only moderately appropriate. I could also live with that.
> We could create a wrapper around scoringRules::logs_sample() which checks whether the forecasts are discrete and produces a warning in that case.
This seems like a good idea as an MVP.
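For illustration, the check-and-warn wrapper idea could look roughly like the sketch below. It is written in Python rather than R only to show the pattern, it is not scoringutils or scoringRules code, and the names `is_integer_valued` and `log_score_with_discrete_check` are hypothetical. The actual scoring function is passed in as an argument, so nothing is assumed here about how the log score itself is computed.

```python
import warnings
from typing import Callable, Sequence


def is_integer_valued(samples: Sequence[float], tol: float = 1e-9) -> bool:
    """Return True if every sample is within `tol` of a whole number.

    Assumes all samples are finite numbers (hypothetical helper).
    """
    return all(abs(x - round(x)) <= tol for x in samples)


def log_score_with_discrete_check(
    observed: float,
    samples: Sequence[float],
    log_score: Callable[[float, Sequence[float]], float],
) -> float:
    """Warn if the samples look discrete, then delegate to `log_score`.

    `log_score` stands in for whatever density-based implementation is
    being wrapped; this wrapper does not change the score itself.
    """
    if is_integer_valued(samples):
        warnings.warn(
            "Samples appear to be integer-valued; a density-based log "
            "score may not be appropriate for discrete forecasts.",
            UserWarning,
            stacklevel=2,
        )
    return log_score(observed, samples)
```

One caveat with this kind of check: continuous forecasts that happen to be rounded to whole numbers would also trigger the warning, so it can only flag a possible problem, not prove the forecast is discrete.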
In the old version, we don't compute a log score for discrete sample-based forecasts. The reason for that is that the scoringRules implementation of the log score estimates a density, which is difficult for discrete forecasts. Naturally, there could be different ways, so instead of estimating a density, you would have an actual probability assigned to every discrete value. I'm not sure how to do this (in some sense, it's the same as multiclass classification with a lot of classes?). And we don't currently have code for that. Options seem to be