epiforecasts / scoringutils

Utilities for Scoring and Assessing Predictions
https://epiforecasts.io/scoringutils/

Extension for scoring multivariate forecasts #288

Open nikosbosse opened 1 year ago

nikosbosse commented 1 year ago

scoringutils currently treats time series as independent when scoring them. When you score forecasts for different locations, the time series are correlated in a way that we don't take into account.

Examples from this paper: https://www.biorxiv.org/content/10.1101/104000v1.full

(Copied from #358) An implementation that would allow us to do multivariate scoring would likely need something like this:

sbfnk commented 1 year ago

This would be most straightforward to do with the sample representation (as we currently don't support quantile representations for multivariate distributions). However, it would require a change in design to identify the sample ID (for each model) with a single unit of forecasting rather than a single row in the data frame that's being scored.

nikosbosse commented 1 year ago

I'm not sure I understand the change about the sample ID. Currently, if the unit of a single forecast is, for example, model, target and date, then we have something like this:

sample model date
1 A 2023-01-01
2 A 2023-01-01
3 A 2023-01-01
1 B 2023-01-01
2 B 2023-01-01
3 B 2023-01-01
1 A 2023-01-02
2 A 2023-01-02
3 A 2023-01-02
1 B 2023-01-02
2 B 2023-01-02
3 B 2023-01-02
From this you could automatically create something like:

sample model date grouping id
1 A 2023-01-01 1
2 A 2023-01-01 1
3 A 2023-01-01 1
1 B 2023-01-01 2
2 B 2023-01-01 2
3 B 2023-01-01 2
1 A 2023-01-02 3
2 A 2023-01-02 3
3 A 2023-01-02 3
1 B 2023-01-02 4
2 B 2023-01-02 4
3 B 2023-01-02 4
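The automatic id creation above could be sketched in base R; the column name grouping_id is just illustrative, not existing scoringutils API:

```r
# Sketch in base R (not scoringutils API): derive a grouping id shared by all
# samples that belong to the same (model, date) combination.
df <- data.frame(
  sample = rep(1:3, times = 4),
  model  = rep(rep(c("A", "B"), each = 3), times = 2),
  date   = rep(c("2023-01-01", "2023-01-02"), each = 6)
)
# interaction() enumerates the observed (model, date) combinations;
# drop = TRUE keeps only combinations that actually occur
df$grouping_id <- as.integer(interaction(df$model, df$date, drop = TRUE))
df$grouping_id
```

This reproduces the ids in the table above: 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4.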

But that's not exactly what you mean, is it?

sbfnk commented 1 year ago

What I mean is that we're currently treating

sample model date geography
1 A 2023-01-01 Saturn
1 A 2023-01-01 Jupiter

as two units of forecasting scored separately, whereas I would suggest that anything with the same sample given by one model should be treated as a single forecasting unit and scored with multivariate methods where available.

Related discussion happening here by the way: https://github.com/Infectious-Disease-Modeling-Hubs/schemas/issues/48#issuecomment-1632814838

nikosbosse commented 1 year ago

Ah, but you'd have to impose that structure manually, right? If, for example, there were different forecast dates as well, then you'd have to tell it somehow to treat the locations as one id (as those forecasts were created together and might be correlated), but forecast dates separately (as those forecasts were made at different points in time). So something like this:

sample model forecast date Planet grouping id
1 A 2023-01-01 Saturn 1
2 A 2023-01-01 Saturn 2
1 A 2023-01-01 Jupiter 1
2 A 2023-01-01 Jupiter 2
1 B 2023-01-01 Saturn 3
2 B 2023-01-01 Saturn 4
1 B 2023-01-01 Jupiter 3
2 B 2023-01-01 Jupiter 4
1 A 2023-01-02 Saturn 5
2 A 2023-01-02 Saturn 6
1 A 2023-01-02 Jupiter 5
2 A 2023-01-02 Jupiter 6
1 B 2023-01-02 Saturn 7
2 B 2023-01-02 Saturn 8
1 B 2023-01-02 Jupiter 7
2 B 2023-01-02 Jupiter 8

Or would you also ignore forecast dates and always lump the samples together? And taking samples with corresponding id together would assume that the different locations are modelled jointly, right?

The easiest thing I could imagine would be a function that asks you to specify that grouping and then creates an id column based on it, which you could then use for scoring.

sbfnk commented 1 year ago

So perhaps something like a group_by option for score that does multivariate scoring for common sample ids across them? In the case of your example, instead of score(df) you'd do score(df, group_by = "Planet"), and rows that differ only by Planet, with all else equal, would be combined into a single unit of forecasting.

sbfnk commented 1 year ago

There might be better names for this, e.g. score(df, multivariate = "Planet") might be clearer.

seabbs commented 1 year ago

Like it. I agree that specifying the grouping makes sense. For naming, we could stick with by as used in summarise_scores() (which was always justified based on data.table).

nikosbosse commented 1 year ago

We might want to have a separate function like score_multivariate() (depending on how many metrics would be different). It might be a bit confusing to have an extra by argument in score() which is only used in some specific instances. We could also have a function set_multivariate_grouping(), or an equivalent with a less terrible name, which creates a grouping variable in a step before scoring (analogously to set_forecast_unit()). You'd then have something like

data |> 
  set_multivariate_grouping(c("Planet", "forecast_date")) |> 
  score_multivariate() # or just score()
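A minimal sketch of what such a function could do internally; neither set_multivariate_grouping() nor the grouping_id column exists in scoringutils, so both are assumptions:

```r
# Hypothetical sketch (not scoringutils API): create one grouping id per
# unique combination of the columns in `by`.
set_multivariate_grouping <- function(data, by) {
  data$grouping_id <- as.integer(interaction(data[by], drop = TRUE))
  data
}

# Data matching the Planet example above
df <- data.frame(
  sample        = rep(1:2, times = 8),
  model         = rep(rep(c("A", "B"), each = 4), times = 2),
  forecast_date = rep(c("2023-01-01", "2023-01-02"), each = 8),
  planet        = rep(c("Saturn", "Saturn", "Jupiter", "Jupiter"), times = 4)
)
# Group rows sharing sample, model and forecast_date, so that the planets
# become the dimensions of one multivariate forecast
df <- set_multivariate_grouping(df, by = c("sample", "model", "forecast_date"))
```

With this grouping, df$grouping_id reproduces the ids in the table above (1, 2, 1, 2, 3, 4, 3, 4, ...), i.e. the two planets share an id while samples and forecast dates separate them.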
nikosbosse commented 11 months ago

related to #358

nikosbosse commented 11 months ago

Copied from the closed issue #358:

Seb created a reprex based on matrices; the next step would then be to look at a data.frame version of that.

library("scoringRules")
example(es_sample)
#> 
#> es_smp> d <- 10  # number of dimensions
#> 
#> es_smp> m <- 50  # number of samples from multivariate forecast distribution
#> 
#> es_smp> # parameters for multivariate normal example
#> es_smp> mu0 <- rep(0, d)
#> 
#> es_smp> mu <- rep(1, d)
#> 
#> es_smp> S0 <- S <- diag(d)
#> 
#> es_smp> S0[S0==0] <- 0.2
#> 
#> es_smp> S[S==0] <- 0.1
#> 
#> es_smp> # generate samples from multivariate normal distributions
#> es_smp> obs <- drop(mu0 + rnorm(d) %*% chol(S0))
#> 
#> es_smp> fc_sample <- replicate(m, drop(mu + rnorm(d) %*% chol(S)))
#> 
#> es_smp> # compute Energy Score
#> es_smp> es_sample(y = obs, dat = fc_sample)
#> [1] 3.700608
#> 
#> es_smp> # in the univariate case, Energy Score and CRPS are the same
#> es_smp> # illustration: Evaluate forecast sample for the first variable
#> es_smp> es_sample(y = obs[1], dat = fc_sample[1, , drop = FALSE])
#> [1] 0.3226259
#> 
#> es_smp> crps_sample(y = obs[1], dat = fc_sample[1, ])
#> [1] 0.3226259
#> 
#> es_smp> # illustration of observation weights for Energy Score
#> es_smp> # example: equal weights for first half of draws; zero weights for other draws
#> es_smp> w <- rep(c(1, 0), each = .5*m)/(.5*m)
#> 
#> es_smp> es_sample(y = obs, dat = fc_sample, w = w)
#> [1] 3.847658
#> 
#> es_smp> # weighting matrix for variogram score
#> es_smp> # note that, unlike for w, weights in w_vs refer to dimensions 
#> es_smp> # (rows of dat) rather than draws (cols of dat)
#> es_smp> w_vs <- outer(1:d, 1:d, function(x, y) .5^abs(x-y))
#> 
#> es_smp> vs_sample(y = obs, dat = fc_sample) 
#> [1] 12.15482
#> 
#> es_smp> vs_sample(y = obs, dat = fc_sample, w_vs = w_vs) 
#> [1] 1.852571
#> 
#> es_smp> vs_sample(y = obs, dat = fc_sample, w_vs = w_vs, p = 1)
#> [1] 7.993775
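A data.frame version would mainly need to reshape long-format sample data into the d x m matrix that scoringRules::es_sample() expects (rows = dimensions, columns = samples). A base-R sketch, with made-up column names that are not scoringutils API:

```r
# Sketch (column names are assumptions): reshape a long forecast data frame
# into the d x m matrix layout used by scoringRules::es_sample() / vs_sample().
d <- 3  # dimensions, e.g. locations modelled jointly
m <- 4  # samples per forecast
long <- expand.grid(location = paste0("loc", 1:d), sample = 1:m)
long$predicted <- rnorm(nrow(long))
# expand.grid varies `location` fastest, so the values are already in
# column-major order: one row per location, one column per sample
fc_matrix <- matrix(long$predicted, nrow = d, ncol = m,
                    dimnames = list(levels(long$location), NULL))
# fc_matrix could now be passed as `dat` to scoringRules::es_sample(),
# together with a length-d vector of observations as `y`
dim(fc_matrix)
```

The ordering step is the part scoringutils would have to get right internally: each grouping id would yield one such matrix, with the grouping columns defining the rows (dimensions) and the sample id defining the columns.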
seabbs commented 3 months ago

Do we still want to create a multivariate class, or to have multivariate scoring rules that can be used in, e.g., scores.forecast_sample?

Grouping would then be set at the level of an individual score (so you would set it with purrr::partial in the current framework). On the face of it, it looks like you could do this here: https://github.com/epiforecasts/scoringutils/blob/4d9ecf757c7fa76e0f95fbc86083aaf439bde5f2/R/score.R#L170

This would mean you could have, say, multiple versions of the energy score with different groupings at once.

seabbs commented 3 months ago

Hmm, looking at it again, maybe it wouldn't be so easy.