epiforecasts / scoringutils

Utilities for Scoring and Assessing Predictions
https://epiforecasts.io/scoringutils/

Naming forecasts of categorical outcomes #607

Closed nikosbosse closed 8 months ago

nikosbosse commented 8 months ago

The Hubverse elicits categorical forecasts. To integrate scoringutils with their tools, we should have a dedicated class for that and scoring rules for evaluating categorical forecasts. What should this class be named?

In the context of the following overview, we're interested in "soft multiclass prediction". The predicted value is a probability and the outcome is a factor.
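As a minimal sketch of what scoring such a forecast could look like, the snippet below computes the log score (the negative log of the probability assigned to the observed category) for one forecast over unordered categories. The log score is one common proper scoring rule; the thread does not say which rules would be used. It is written in Python purely for illustration (scoringutils itself is an R package), and the category names, probabilities and the function name `log_score` are all assumptions.

```python
import math

# Hypothetical example: one forecast over three unordered categories.
# The labels and probabilities are made up for illustration.
predicted = {"low": 0.2, "medium": 0.5, "high": 0.3}
observed = "medium"


def log_score(predicted, observed):
    """Negative log of the probability assigned to the observed category."""
    total = sum(predicted.values())
    if not math.isclose(total, 1.0):
        raise ValueError(f"probabilities must sum to 1, got {total}")
    return -math.log(predicted[observed])


score = log_score(predicted, observed)
print(round(score, 3))  # -log(0.5), i.e. 0.693
```

Lower values are better: a forecast that puts more probability on the observed category gets a smaller score.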

[Image: overview of prediction types]

This website makes the following distinction between ordered and unordered categories:

[Image: ordered vs. unordered categories]

Here are some suggestions for the name of the class:

Further thoughts:

seabbs commented 8 months ago

It seems fairly clear that forecast_binary, forecast_ordinal and forecast_nominal make sense grouped together under the categorical banner (but as we have no meta classes yet, we don't need forecast_categorical?).

Merging binary and nominal forecasts in one class with one set of scoring rules seems at least possible. I tend to think it's not necessarily desirable though.

Agree we should keep these apart even if under the hood they share infra. Will be confusing for users.

ordered categorical predictions

Aside from the different scoring, this could be done with the suggested variable structure of forecast_multiclass, right? But you would need an additional ordering variable?

nikosbosse commented 8 months ago

Naming the class

OK, based on your comments (thanks!) here and on #608, I see the following:

There is a hierarchy:

"multiclass forecasts" seem to be the same as "nominal forecasts" to me, is that right? Maybe there is a small difference in the sense that nominal forecasts comprise binary forecasts, but multiclass forecasts don't comprise binary forecasts and are instead on the same level?

Since what we want at the moment is scoring nominal/multiclass forecasts, I think we should name the class either

We could then add a forecast_ordinal in the future if we ever wanted to score forecasts for ordered categories (which we could then represent by ordered factors - we should make it clear that for now, we're expecting unordered factor levels).

Naming the input columns/variables

As discussed in #608, there will be 3 input columns:

I started to like predicted_label as a name. Having predicted_ here makes the relation to the prediction clear. label I think works well both with "label for a class or category" as well as "label of the factor level for which a prediction was made". Alternative ideas:

Again pinging @sbfnk and @nickreich in case you want to weigh in

seabbs commented 8 months ago

Can multiclass forecasts include ordinal classification? If yes, then I don't think it makes sense to use it here, but it could again be used as a metaclass for nominal and ordinal?

I think I would marginally prefer forecast_nominal but I don't mind forecast_multiclass if others strongly prefer.

I started to like predicted_label as a name. Having predicted_ here makes the relation to the prediction clear. label I think works well both with "label for a class or category" as well as "label of the factor level for which a prediction was made".

Seems like a good choice.
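For illustration, a long-format layout with a predicted_label column might look like the sketch below, with one row per forecast and candidate label. Only `predicted_label` is a name from this thread; `forecast_id`, `observed` and `predicted` are placeholder names assumed for the example, and the values are made up. The snippet is Python and is not the scoringutils input format itself.

```python
import math
from collections import defaultdict

# Hypothetical long-format rows: one row per forecast and candidate label.
# Only `predicted_label` comes from the discussion; the other column names
# are placeholders for this sketch.
rows = [
    {"forecast_id": 1, "observed": "medium", "predicted_label": "low", "predicted": 0.2},
    {"forecast_id": 1, "observed": "medium", "predicted_label": "medium", "predicted": 0.5},
    {"forecast_id": 1, "observed": "medium", "predicted_label": "high", "predicted": 0.3},
    {"forecast_id": 2, "observed": "low", "predicted_label": "low", "predicted": 0.6},
    {"forecast_id": 2, "observed": "low", "predicted_label": "medium", "predicted": 0.3},
    {"forecast_id": 2, "observed": "low", "predicted_label": "high", "predicted": 0.1},
]


def probabilities_sum_to_one(rows):
    """For each forecast, check that the label probabilities sum to 1."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["forecast_id"]] += row["predicted"]
    return {fid: math.isclose(total, 1.0) for fid, total in totals.items()}


valid = probabilities_sum_to_one(rows)
```

In this layout the observed value repeats across the rows of a forecast, while predicted_label identifies which category each probability belongs to.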

(ChatGPT, given "can multiclass forecasts include ordinal forecasts", thinks ordinal forecasts are a subclass of multiclass forecasts)

nikosbosse commented 8 months ago

I also like forecast_nominal. OK, then I think we'll call the class forecast_nominal and the additional column predicted_label, so the input format would be