Naming forecasts of categorical outcomes

nikosbosse commented 8 months ago

The Hubverse elicits categorical forecasts. To integrate scoringutils with their tools, we should have a dedicated class for that and scoring rules for evaluating categorical forecasts. What should this class be named?

In the context of the following overview, we're interested in "soft multiclass prediction". The predicted value is a probability and the outcome is a factor.

This website makes the following distinction between ordered and unordered categories:

Here are some suggestions for the name of the class:

forecast_nominal
forecast_categorical
forecast_multiclass

Further thoughts:

Merging binary and nominal forecasts in one class with one set of scoring rules seems at least possible. I tend to think it's not necessarily desirable though.
We might want to include the following in future releases, so it is worth taking them into account now:
- hard classification (i.e. predicting the outcome label directly instead of providing a probability)
- ordered categorical predictions
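
To make the soft/hard distinction concrete, here is a minimal sketch in Python (used only as a neutral illustration; scoringutils itself is an R package). The labels, the probabilities and the helper `to_hard_label` are all hypothetical and not part of any proposed API. A soft forecast assigns a probability to every category, whereas a hard classification keeps only a single predicted label.

```python
# Illustrative values only: the labels and probabilities are made up.
soft_forecast = {"low": 0.2, "medium": 0.5, "high": 0.3}


def to_hard_label(probs):
    """Collapse a soft forecast into a hard one by picking the most likely label.

    Hypothetical helper, not a scoringutils function.
    """
    return max(probs, key=probs.get)


hard_forecast = to_hard_label(soft_forecast)
print(hard_forecast)
```

Going from soft to hard discards the probabilities, which is one reason the two kinds of forecast would likely need different scoring rules.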

seabbs commented 8 months ago

It seems fairly clear that forecast_binary, forecast_ordinal and forecast_nominal make sense grouped together under the categorical banner (but as yet there are no meta classes, so we don't need forecast_categorical?).

> Merging binary and nominal forecasts in one class with one set of scoring rules seems at least possible. I tend to think it's not necessarily desirable though.

Agree we should keep these apart even if they share infrastructure under the hood. Merging them would be confusing for users.

> ordered categorical predictions

Aside from different scoring, this could be done with the suggested variable structure of forecast_multiclass, right? But you would need an additional ordering variable?

nikosbosse commented 8 months ago

Naming the class

OK, based on your comments (thanks!) here and on #608, I see the following:

There is a hierarchy:

categorical forecasts
- ordinal forecasts
- nominal forecasts
- binary forecasts (special case of nominal)

"multiclass forecasts" seem to be the same as "nominal forecasts" to me, is that right? Maybe there is a small difference in the sense that nominal forecasts comprise binary forecasts, but multiclass forecasts don't comprise binary forecasts and are instead on the same level?

Since what we want at the moment is scoring nominal/multiclass forecasts, I think we should name the class either

forecast_nominal or
forecast_multiclass

We could then add a forecast_ordinal in the future if we ever wanted to score forecasts for ordered categories (which we could then represent by ordered factors - we should make it clear that for now, we're expecting unordered factor levels).

Naming the input columns/variables

As discussed in #608, there will be 3 input columns:

predicted: numeric
observed: factor
to_be_named: factor, denoting the category for which a prediction was made.

I started to like predicted_label as a name. Having predicted_ here makes the relation to the prediction clear. label I think works well both with "label for a class or category" as well as "label of the factor level for which a prediction was made". Alternative ideas:

predicted_class
predicted_category
predicted_outcome

Again pinging @sbfnk and @nickreich in case you want to weigh in

seabbs commented 8 months ago

Can multiclass forecasts include ordinal classification? If yes, then I don't think it makes sense to use it for this class, but it could again be used as a metaclass for nominal and ordinal?

I think I would marginally prefer forecast_nominal but I don't mind forecast_multiclass if others strongly prefer.

> I started to like predicted_label as a name. Having predicted_ here makes the relation to the prediction clear. label I think works well both with "label for a class or category" as well as "label of the factor level for which a prediction was made".

Seems like a good choice.

(ChatGPT, given "can multiclass forecasts include ordinal forecasts", thinks ordinal forecasts are a subclass of multiclass forecasts.)

nikosbosse commented 8 months ago

I also like forecast_nominal. OK, then I think we'll call the class forecast_nominal and the additional column predicted_label, so the input format would be:

predicted: numeric
observed: factor
predicted_label: factor, denoting the category for which a prediction was made.
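
As a rough illustration of this format, the sketch below lays out one forecast in long form, with one row per category, and runs a few sanity checks. It is written in Python purely as a language-neutral example (scoringutils is an R package). The `forecast_id` column, the `check_nominal_rows` helper and the example values are hypothetical. The checks themselves are assumptions about what a nominal forecast should satisfy (probabilities in [0, 1] that sum to one per forecast), not rules stated in this thread.

```python
import math
from collections import defaultdict

# Hypothetical data: one forecast over three unordered categories.
# "forecast_id" is an assumed column identifying a single forecast.
rows = [
    {"forecast_id": 1, "predicted_label": "low", "predicted": 0.2, "observed": "medium"},
    {"forecast_id": 1, "predicted_label": "medium", "predicted": 0.5, "observed": "medium"},
    {"forecast_id": 1, "predicted_label": "high", "predicted": 0.3, "observed": "medium"},
]


def check_nominal_rows(rows, unit="forecast_id", tol=1e-8):
    """Run basic consistency checks on long-format nominal forecasts.

    Hypothetical helper: raises ValueError on the first problem found,
    returns True if every check passes.
    """
    totals = defaultdict(float)  # sum of probabilities per forecast
    labels = defaultdict(set)    # labels already seen per forecast
    observed = {}                # observed outcome per forecast
    for row in rows:
        key = row[unit]
        prob = row["predicted"]
        label = row["predicted_label"]
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"probability out of range in forecast {key}")
        if label in labels[key]:
            raise ValueError(f"duplicate label {label!r} in forecast {key}")
        labels[key].add(label)
        totals[key] += prob
        # The observed value should be identical on every row of a forecast.
        if observed.setdefault(key, row["observed"]) != row["observed"]:
            raise ValueError(f"inconsistent observed value in forecast {key}")
    for key, total in totals.items():
        if not math.isclose(total, 1.0, abs_tol=tol):
            raise ValueError(f"probabilities in forecast {key} sum to {total}")
    return True


print(check_nominal_rows(rows))
```

Note that observed repeats on every row of a forecast, while predicted_label varies across the categories being predicted.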

epiforecasts / scoringutils

Naming forecasts of categorical outcomes #607
