epiforecasts / scoringutils

Utilities for Scoring and Assessing Predictions
https://epiforecasts.io/scoringutils/

Unhelpful errors for `summarize_scores` when supplying non-`scores` objects #804

Closed damonbayer closed 2 months ago

damonbayer commented 2 months ago

It is documented that summarize_scores expects an object of class scores. If one instead supplies a data.frame or tibble, summarize_scores attempts to run and produces an unhelpful error. Converting that data.frame or tibble into a data.table first avoids the error, even though the input is still not a scores object.

This issue came up for me when writing scores to a tabular text file and reading them back in as tibbles to be summarized later. I was using the CRAN version of scoringutils, but have written the issue up for the development version.

library(scoringutils)
#> Note: scoringutils is currently undergoing major development changes (with an update planned for the first quarter of 2024). We would very much appreciate your opinions and feedback on what should be included in this major update: https://github.com/epiforecasts/scoringutils/discussions/333
library(tibble)
library(data.table)

ex_scores <- example_quantile %>% as_forecast() %>% score()
#> ℹ Some rows containing NA values may be removed. This is fine if not
#>   unexpected.
summarize_scores(ex_scores, by = c("model", "target_type"))
#>                    model target_type         wis overprediction underprediction
#>                   <char>      <char>       <num>          <num>           <num>
#> 1: EuroCOVIDhub-ensemble       Cases 17943.82383   10043.121943     4237.177310
#> 2: EuroCOVIDhub-baseline       Cases 28483.57465   14096.100883    10284.972826
#> 3:  epiforecasts-EpiNow2       Cases 20831.55662   11906.823030     3260.355639
#> 4: EuroCOVIDhub-ensemble      Deaths    41.42249       7.138247        4.103261
#> 5: EuroCOVIDhub-baseline      Deaths   159.40387      65.899117        2.098505
#> 6:       UMass-MechBayes      Deaths    52.65195       8.978601       16.800951
#> 7:  epiforecasts-EpiNow2      Deaths    66.64282      18.892583       15.893314
#>    dispersion        bias interval_coverage_50 interval_coverage_90
#>         <num>       <num>                <num>                <num>
#> 1: 3663.52458 -0.05640625            0.3906250            0.8046875
#> 2: 4102.50094  0.09796875            0.3281250            0.8203125
#> 3: 5664.37795 -0.07890625            0.4687500            0.7890625
#> 4:   30.18099  0.07265625            0.8750000            1.0000000
#> 5:   91.40625  0.33906250            0.6640625            1.0000000
#> 6:   26.87239 -0.02234375            0.4609375            0.8750000
#> 7:   31.85692 -0.00512605            0.4201681            0.9075630
#>    interval_coverage_deviation   ae_median
#>                          <num>       <num>
#> 1:                 -0.10230114 24101.07031
#> 2:                 -0.11721591 38473.60156
#> 3:                 -0.06963068 27923.81250
#> 4:                  0.20380682    53.13281
#> 5:                  0.12142045   233.25781
#> 6:                 -0.02488636    78.47656
#> 7:                 -0.04520244   104.74790

Try with a data.frame:

ex_scores_df <- as.data.frame(ex_scores)
summarize_scores(ex_scores_df, by = c("model", "target_type"))
#> Error in `[.data.frame`(scores, , lapply(.SD, fun, ...), by = c(by), .SDcols = colnames(scores) %like% : unused arguments (by = c(by), .SDcols = colnames(scores) %like% paste(metrics, collapse = "|"))

Convert the data.frame to a data.table:

ex_scores_df_dt <- as.data.table(ex_scores_df)
summarize_scores(ex_scores_df_dt, by = c("model", "target_type"))
#>                    model target_type         wis overprediction underprediction
#>                   <char>      <char>       <num>          <num>           <num>
#> 1: EuroCOVIDhub-ensemble       Cases 17943.82383   10043.121943     4237.177310
#> 2: EuroCOVIDhub-baseline       Cases 28483.57465   14096.100883    10284.972826
#> 3:  epiforecasts-EpiNow2       Cases 20831.55662   11906.823030     3260.355639
#> 4: EuroCOVIDhub-ensemble      Deaths    41.42249       7.138247        4.103261
#> 5: EuroCOVIDhub-baseline      Deaths   159.40387      65.899117        2.098505
#> 6:       UMass-MechBayes      Deaths    52.65195       8.978601       16.800951
#> 7:  epiforecasts-EpiNow2      Deaths    66.64282      18.892583       15.893314
#>    dispersion        bias interval_coverage_50 interval_coverage_90
#>         <num>       <num>                <num>                <num>
#> 1: 3663.52458 -0.05640625            0.3906250            0.8046875
#> 2: 4102.50094  0.09796875            0.3281250            0.8203125
#> 3: 5664.37795 -0.07890625            0.4687500            0.7890625
#> 4:   30.18099  0.07265625            0.8750000            1.0000000
#> 5:   91.40625  0.33906250            0.6640625            1.0000000
#> 6:   26.87239 -0.02234375            0.4609375            0.8750000
#> 7:   31.85692 -0.00512605            0.4201681            0.9075630
#>    interval_coverage_deviation   ae_median
#>                          <num>       <num>
#> 1:                 -0.10230114 24101.07031
#> 2:                 -0.11721591 38473.60156
#> 3:                 -0.06963068 27923.81250
#> 4:                  0.20380682    53.13281
#> 5:                  0.12142045   233.25781
#> 6:                 -0.02488636    78.47656
#> 7:                 -0.04520244   104.74790

Try with a tibble:

ex_scores_tbl <- as_tibble(ex_scores)
summarize_scores(ex_scores_tbl, by = c("model", "target_type"))
#> Error in `scores[, lapply(.SD, fun, ...), by = c(by), .SDcols = colnames(scores) %like%
#>     paste(metrics, collapse = "|")]`:
#> ! Can't subset columns with `lapply(.SD, fun, ...)`.
#> ✖ `lapply(.SD, fun, ...)` must be logical, numeric, or character, not an empty list.

Convert the tibble to a data.table:

ex_scores_tbl_dt <- as.data.table(ex_scores_tbl)
summarize_scores(ex_scores_tbl_dt, by = c("model", "target_type"))
#>                    model target_type         wis overprediction underprediction
#>                   <char>      <char>       <num>          <num>           <num>
#> 1: EuroCOVIDhub-ensemble       Cases 17943.82383   10043.121943     4237.177310
#> 2: EuroCOVIDhub-baseline       Cases 28483.57465   14096.100883    10284.972826
#> 3:  epiforecasts-EpiNow2       Cases 20831.55662   11906.823030     3260.355639
#> 4: EuroCOVIDhub-ensemble      Deaths    41.42249       7.138247        4.103261
#> 5: EuroCOVIDhub-baseline      Deaths   159.40387      65.899117        2.098505
#> 6:       UMass-MechBayes      Deaths    52.65195       8.978601       16.800951
#> 7:  epiforecasts-EpiNow2      Deaths    66.64282      18.892583       15.893314
#>    dispersion        bias interval_coverage_50 interval_coverage_90
#>         <num>       <num>                <num>                <num>
#> 1: 3663.52458 -0.05640625            0.3906250            0.8046875
#> 2: 4102.50094  0.09796875            0.3281250            0.8203125
#> 3: 5664.37795 -0.07890625            0.4687500            0.7890625
#> 4:   30.18099  0.07265625            0.8750000            1.0000000
#> 5:   91.40625  0.33906250            0.6640625            1.0000000
#> 6:   26.87239 -0.02234375            0.4609375            0.8750000
#> 7:   31.85692 -0.00512605            0.4201681            0.9075630
#>    interval_coverage_deviation   ae_median
#>                          <num>       <num>
#> 1:                 -0.10230114 24101.07031
#> 2:                 -0.11721591 38473.60156
#> 3:                 -0.06963068 27923.81250
#> 4:                  0.20380682    53.13281
#> 5:                  0.12142045   233.25781
#> 6:                 -0.02488636    78.47656
#> 7:                 -0.04520244   104.74790

Created on 2024-05-03 with reprex v2.1.0

nikosbosse commented 2 months ago

Thank you for opening this issue, @damonbayer! This is really helpful.

I see two potential ways of addressing this: 1) converting the data.frame into a data.table inside summarise_scores(), or 2) strictly enforcing a scores object as input.

Some context: The scores object is essentially just a data.table with an additional metrics attribute (which is just a vector of column names). The reason we need this metrics attribute is that (as opposed to the current CRAN version) users can now freely choose the names given to the metrics. summarise_scores() therefore can't rely on a set of known metric names to distinguish between a metric and a column that defines the forecast unit. The metrics attribute solves that in a somewhat clunky way, unfortunately (the only alternative I can imagine is prefixing every column name with something like metric_).
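To make the role of the metrics attribute concrete, here is a minimal sketch using only data.table. The table toy_scores and the column name my_metric are made up for illustration, and the summary step only mimics the general idea (average the metric columns by group); it is not the actual scoringutils implementation.

```r
library(data.table)

# Hypothetical toy table: two forecast-unit columns plus one metric column.
toy_scores <- data.table(
  model = c("a", "a", "b", "b"),
  target_type = c("Cases", "Deaths", "Cases", "Deaths"),
  my_metric = c(1, 2, 3, 4)
)

# Record which columns are metrics. setattr() sets the attribute by reference.
setattr(toy_scores, "metrics", "my_metric")

# Anything that is not a metric is treated as part of the forecast unit.
metrics <- attr(toy_scores, "metrics")
forecast_unit <- setdiff(colnames(toy_scores), metrics)

# Average only the metric columns, grouped by model.
toy_summary <- toy_scores[, lapply(.SD, mean), by = "model", .SDcols = metrics]
```

Because the metric names are read from the attribute rather than from a fixed list, the metric column can be called anything.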

In #805, I opted for option 1) for user convenience. The updated function

The benefit is that something like the following still works (since the metrics attribute apparently is preserved):

library(scoringutils)
ex_scores <- example_quantile %>% as_forecast() %>% score()
ex_scores_df <- as.data.frame(ex_scores)
summarize_scores(ex_scores_df, by = c("model", "target_type"))

The above would be more annoying if we went with option 2) and strictly enforced a scores object as input.

I imagine your use case (writing to a file and summarising afterwards) won't work so easily. Compare, e.g.

ex_scores_df2 <- as.data.frame(as.matrix(ex_scores))
summarize_scores(ex_scores_df2, by = c("model", "target_type"))

which doesn't work because the metrics attribute is missing.

Unfortunately, the only option at the moment is to set the attribute again manually via attr(scores, "metrics") <- ....
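For the write-to-file workflow, one way to do this is to store the metric names before writing and reattach them after reading. The sketch below uses a made-up toy table and plain data.table I/O (fwrite() and fread()). It only shows the attribute handling, and assumes nothing about how summarize_scores() treats the result.

```r
library(data.table)

# Hypothetical toy scores; `my_metric` is an illustrative column name.
toy_scores <- data.table(model = c("a", "b"), my_metric = c(1.5, 3.5))
setattr(toy_scores, "metrics", "my_metric")

# Remember the metric names before the table leaves R.
metric_names <- attr(toy_scores, "metrics")

path <- tempfile(fileext = ".csv")
fwrite(toy_scores, path)

# A text file cannot carry R attributes, so the attribute is gone on re-read.
scores_from_disk <- fread(path)
attribute_lost <- is.null(attr(scores_from_disk, "metrics"))

# Reattach the stored metric names.
setattr(scores_from_disk, "metrics", metric_names)
```

If the summary runs in a separate session, the metric names would themselves need to be saved somewhere, for example in a small sidecar file next to the scores.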

I'm very curious to hear your thoughts on this and whether you think there would be a way to improve user experience.

damonbayer commented 2 months ago

Thanks for working on this!

I'm not able to check since I'm away from a computer. What happens when the metrics attribute is missing and you try to use summarize_scores? It would be helpful if it errored, mentioned that the attribute was missing, and told the user how to add the attribute.

nikosbosse commented 2 months ago

Yes, if the attribute is missing you'll get an error pointing you to get_metrics(). I updated the docs there to make it clearer what's happening and what the user is expected to do. Let me know in case that doesn't make sense to you.
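The kind of check discussed above could look roughly like the following. This is a hypothetical helper written for illustration: the name assert_has_metrics and the wording of the messages are assumptions, not the code or error text that scoringutils actually uses.

```r
# Hypothetical helper, not part of scoringutils.
assert_has_metrics <- function(scores) {
  metrics <- attr(scores, "metrics")
  if (is.null(metrics)) {
    # Say what is missing and how to fix it.
    stop(
      "`scores` has no `metrics` attribute. ",
      "Set it with `attr(scores, \"metrics\") <- <metric column names>`.",
      call. = FALSE
    )
  }
  # Also catch an attribute that names columns which do not exist.
  missing_cols <- setdiff(metrics, colnames(scores))
  if (length(missing_cols) > 0) {
    stop(
      "Metric columns not found in `scores`: ",
      paste(missing_cols, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(metrics)
}

# A plain data.frame without the attribute triggers the first error.
plain_df <- data.frame(model = "a", my_metric = 1)
result <- tryCatch(
  assert_has_metrics(plain_df),
  error = function(e) conditionMessage(e)
)
```

Failing early with a message that names the attribute avoids the opaque `[.data.frame` and tibble subsetting errors shown in the original report.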