This would be most straightforward to do with the sample representation (as we currently don't support quantile representations for multivariate distributions). However, it would require a design change: the sample ID (for each model) would need to identify a single unit of forecasting rather than a single row in the data frame being scored.
I'm not sure I understand the change about the sample ID. Currently, if the unit of a single forecast is, for example, model, target and date, then we have something like this:
sample | model | date |
---|---|---|
1 | A | 2023-01-01 |
2 | A | 2023-01-01 |
3 | A | 2023-01-01 |
1 | B | 2023-01-01 |
2 | B | 2023-01-01 |
3 | B | 2023-01-01 |
1 | A | 2023-01-02 |
2 | A | 2023-01-02 |
3 | A | 2023-01-02 |
1 | B | 2023-01-02 |
2 | B | 2023-01-02 |
3 | B | 2023-01-02 |
From this you could automatically create something like the table below (a short code sketch of how that id could be derived follows it):

sample | model | date | grouping id |
---|---|---|---|
1 | A | 2023-01-01 | 1
2 | A | 2023-01-01 | 1
3 | A | 2023-01-01 | 1
1 | B | 2023-01-01 | 2
2 | B | 2023-01-01 | 2
3 | B | 2023-01-01 | 2
1 | A | 2023-01-02 | 3
2 | A | 2023-01-02 | 3
3 | A | 2023-01-02 | 3
1 | B | 2023-01-02 | 4
2 | B | 2023-01-02 | 4
3 | B | 2023-01-02 | 4
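For illustration, a minimal sketch of how that id could be derived, assuming the data sits in a data.table with the columns above (nothing scoringutils-specific here):

```r
library(data.table)

# toy data in the format of the first table: one row per sample
forecasts <- data.table(
  sample = rep(1:3, times = 4),
  model  = rep(rep(c("A", "B"), each = 3), times = 2),
  date   = rep(as.Date(c("2023-01-01", "2023-01-02")), each = 6)
)

# one id per combination of the forecast unit columns (here: model and date)
forecasts[, grouping_id := .GRP, by = .(model, date)]
```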
But that's not exactly what you mean, is it?
What I mean is that we're currently treating
sample | model | date | geography |
---|---|---|---|
1 | A | 2023-01-01 | Saturn |
1 | A | 2023-01-01 | Jupiter |
as two units of forecasting, scored separately, whereas I would suggest that rows with the same sample ID from a given model should be treated as a single forecasting unit and scored with multivariate methods where available.
Related discussion happening here by the way: https://github.com/Infectious-Disease-Modeling-Hubs/schemas/issues/48#issuecomment-1632814838
Ah, but you'd have to impose that structure manually, right? If, for example, there were different forecast dates as well, then you'd have to tell it somehow to treat the locations as one id (as those forecasts were created together and might be correlated), but the forecast dates separately (as those forecasts were made at different points in time). So something like this:
sample | model | forecast date | Planet | grouping id |
---|---|---|---|---|
1 | A | 2023-01-01 | Saturn | 1
2 | A | 2023-01-01 | Saturn | 2
1 | A | 2023-01-01 | Jupiter | 1
2 | A | 2023-01-01 | Jupiter | 2
1 | B | 2023-01-01 | Saturn | 3
2 | B | 2023-01-01 | Saturn | 4
1 | B | 2023-01-01 | Jupiter | 3
2 | B | 2023-01-01 | Jupiter | 4
1 | A | 2023-01-02 | Saturn | 5
2 | A | 2023-01-02 | Saturn | 6
1 | A | 2023-01-02 | Jupiter | 5
2 | A | 2023-01-02 | Jupiter | 6
1 | B | 2023-01-02 | Saturn | 7
2 | B | 2023-01-02 | Saturn | 8
1 | B | 2023-01-02 | Jupiter | 7
2 | B | 2023-01-02 | Jupiter | 8
Or would you also ignore forecast dates and always lump the samples together? And taking samples with corresponding id together would assume that the different locations are modelled jointly, right?
The easiest thing I could imagine would be a function that asks you to specify that grouping and then creates an id column based on it, which you could then use for scoring.
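A minimal sketch of such a function (the name `add_grouping_id()` and the data.table-based long format are assumptions for illustration, not existing scoringutils functionality):

```r
library(data.table)

# hypothetical helper: the user states which columns jointly identify one
# draw from the joint forecast distribution, and an id is derived from them
add_grouping_id <- function(data, by) {
  data <- as.data.table(data)
  data[, grouping_id := .GRP, by = by]
  data[]
}

# toy data in the long format of the table above
forecasts <- CJ(
  sample = 1:2, model = c("A", "B"),
  forecast_date = as.Date(c("2023-01-01", "2023-01-02")),
  planet = c("Jupiter", "Saturn")
)

# locations are lumped together; forecast dates (and samples) are kept apart
forecasts <- add_grouping_id(forecasts, by = c("model", "forecast_date", "sample"))
```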
So perhaps something like a `group_by` option for `score()` that does multivariate scoring across common sample ids? In the case of your example, instead of `score(df)` you'd do `score(df, group_by = "Planet")`, and any rows that differ only by Planet, with all else equal, would be combined into a single unit of forecasting. There might be better names for this, e.g. `score(df, multivariate = "Planet")` might be clearer.
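To make that concrete, here is a rough sketch of what combining rows that differ only by Planet might look like under the hood, using `scoringRules::es_sample()`. The toy data and column names are made up, and this is not existing scoringutils behaviour:

```r
library(data.table)
library(scoringRules)

# toy long-format data: one row per sample and planet, single model and date
set.seed(1)
df <- CJ(sample = 1:50, planet = c("Jupiter", "Saturn"))
df[, `:=`(model = "A", forecast_date = as.Date("2023-01-01"))]
df[, predicted := rnorm(.N)]
df[, observed := fifelse(planet == "Jupiter", 1.2, -0.4)]

# rows that differ only by `planet` form one unit of forecasting:
# rows of `dat` are planets (dimensions), columns are samples
scores <- df[, {
  dat <- do.call(rbind, split(predicted, planet))  # d x m matrix
  y   <- sapply(split(observed, planet), unique)   # observation, length d
  .(energy_score = es_sample(y = y, dat = dat))
}, by = .(model, forecast_date)]

scores
```

With a `multivariate = "Planet"` style argument, `score()` would essentially have to do this reshaping internally for every group defined by the remaining forecast unit columns.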
Like it. I agree that specifying the grouping makes sense. For naming we could stick with `by`, as used in `summarise_scores()` (which was always justified based on `data.table`).
We might want to have a separate function like `score_multivariate()` (depending on how many metrics would be different). It might be a bit confusing to have an extra `by` argument in `score()` that is only used in some specific instances.
We could also have a function `set_multivariate_grouping()`, or an equivalent with a less terrible name, which creates a grouping variable in a step before scoring (analogously to `set_forecast_unit()`). You'd then have something like

```r
data |>
  set_multivariate_grouping(c("Planet", "forecast_date")) |>
  score_multivariate() # or just score()
```
related to #358
Copied from the closed issue #358:
Seb created a reprex based on matrices; the next step would then be to look at a data.frame version of that.
library("scoringRules")
example(es_sample)
#>
#> es_smp> d <- 10 # number of dimensions
#>
#> es_smp> m <- 50 # number of samples from multivariate forecast distribution
#>
#> es_smp> # parameters for multivariate normal example
#> es_smp> mu0 <- rep(0, d)
#>
#> es_smp> mu <- rep(1, d)
#>
#> es_smp> S0 <- S <- diag(d)
#>
#> es_smp> S0[S0==0] <- 0.2
#>
#> es_smp> S[S==0] <- 0.1
#>
#> es_smp> # generate samples from multivariate normal distributions
#> es_smp> obs <- drop(mu0 + rnorm(d) %*% chol(S0))
#>
#> es_smp> fc_sample <- replicate(m, drop(mu + rnorm(d) %*% chol(S)))
#>
#> es_smp> # compute Energy Score
#> es_smp> es_sample(y = obs, dat = fc_sample)
#> [1] 3.700608
#>
#> es_smp> # in the univariate case, Energy Score and CRPS are the same
#> es_smp> # illustration: Evaluate forecast sample for the first variable
#> es_smp> es_sample(y = obs[1], dat = fc_sample[1, , drop = FALSE])
#> [1] 0.3226259
#>
#> es_smp> crps_sample(y = obs[1], dat = fc_sample[1, ])
#> [1] 0.3226259
#>
#> es_smp> # illustration of observation weights for Energy Score
#> es_smp> # example: equal weights for first half of draws; zero weights for other draws
#> es_smp> w <- rep(c(1, 0), each = .5*m)/(.5*m)
#>
#> es_smp> es_sample(y = obs, dat = fc_sample, w = w)
#> [1] 3.847658
#>
#> es_smp> # weighting matrix for variogram score
#> es_smp> # note that, unlike for w, weights in w_vs refer to dimensions
#> es_smp> # (rows of dat) rather than draws (cols of dat)
#> es_smp> w_vs <- outer(1:d, 1:d, function(x, y) .5^abs(x-y))
#>
#> es_smp> vs_sample(y = obs, dat = fc_sample)
#> [1] 12.15482
#>
#> es_smp> vs_sample(y = obs, dat = fc_sample, w_vs = w_vs)
#> [1] 1.852571
#>
#> es_smp> vs_sample(y = obs, dat = fc_sample, w_vs = w_vs, p = 1)
#> [1] 7.993775
```
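Picking up the "data.frame version" mentioned above, one possible sketch is below. The long format and the column names (`dimension`, `sample`, `predicted`, `observed`) are assumptions rather than a settled scoringutils interface:

```r
library("scoringRules")
library("data.table")

# same multivariate normal setup as above, rearranged into a long data.frame
d <- 10   # number of dimensions (e.g. locations)
m <- 50   # number of samples
mu0 <- rep(0, d); mu <- rep(1, d)
S0 <- S <- diag(d); S0[S0 == 0] <- 0.2; S[S == 0] <- 0.1
obs <- drop(mu0 + rnorm(d) %*% chol(S0))
fc_sample <- replicate(m, drop(mu + rnorm(d) %*% chol(S)))

# long format: one row per dimension and sample
long <- data.table(
  dimension = rep(seq_len(d), times = m),
  sample    = rep(seq_len(m), each = d),
  predicted = as.vector(fc_sample),
  observed  = rep(obs, times = m)
)

# reshape back to the d x m matrix es_sample() expects and score
long[, {
  dat <- matrix(predicted, nrow = uniqueN(dimension))  # columns are samples
  y   <- observed[seq_len(uniqueN(dimension))]         # one value per dimension
  .(energy_score = es_sample(y = y, dat = dat))
}]
```

Since the data frame is just a reshaped copy of the matrices, the result matches `es_sample(y = obs, dat = fc_sample)` for the same draws.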
Do we still want to create a multivariate class, or to have multivariate scoring rules that can be used in e.g. `scores.forecast_sample`?
Grouping would then be on a score level (so you would set it with `purrr::partial` in the current framework). On the face of it, it looks like you could do this here: https://github.com/epiforecasts/scoringutils/blob/4d9ecf757c7fa76e0f95fbc86083aaf439bde5f2/R/score.R#L170
This would mean you could have, say, multiple versions of the energy score with different groupings at once.
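As an illustration of that idea (purely a sketch: `energy_score_grouped()` is a made-up placeholder, not an existing function):

```r
library(purrr)

# placeholder standing in for a sample-based metric that takes a `by`
# argument naming the columns that form the dimensions of one forecast
energy_score_grouped <- function(observed, predicted, by) {
  stop("placeholder only; a real version would reshape and call es_sample()")
}

# the grouping is baked into the metric, so several variants can coexist
es_by_planet <- partial(energy_score_grouped, by = "planet")
es_by_planet_and_date <- partial(
  energy_score_grouped, by = c("planet", "forecast_date")
)
```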
Hmm, looking at it again, maybe it wouldn't be so easy.
scoringutils currently treats time series as independent when scoring them. When you score forecasts for different locations, the time series are correlated in a way that we don't take into account.
Examples from this paper: https://www.biorxiv.org/content/10.1101/104000v1.full
(Copied from #358) An implementation that would allow us to do multivariate scoring would likely need something like a `score.multivariate` method.
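As a very rough sketch of what the class-based route could look like: a constructor that tags the data with a multivariate class, so that a `score()` method could dispatch on it. Nothing here exists in scoringutils; the constructor name and required columns are assumptions:

```r
library(data.table)

# hypothetical constructor for a multivariate forecast class
as_forecast_multivariate <- function(data, dimension_cols) {
  data <- as.data.table(data)
  required <- c("observed", "predicted", "sample", dimension_cols)
  stopifnot(all(required %in% names(data)))
  # remember which columns span the dimensions of one joint forecast
  setattr(data, "dimension_cols", dimension_cols)
  setattr(data, "class", c("forecast_multivariate", class(data)))
  data
}
```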