In any case, since you are scoring both the ensemble and the new model from scratch, presumably this doesn't affect the conclusions in the paper about relative comparisons.
It looks like there are 63 anomalies for incident cases in the forecast period. Maybe we could remove these with some code like the below? I'm not sure where this is best placed in the code, but I was looking at the merge_forecasts_with_truth() function.
# Remove anomalies flagged by the hub
library("data.table")
anomalies <- fread("https://raw.githubusercontent.com/covid19-forecast-hub-europe/covid19-forecast-hub-europe/main/data-truth/anomalies/anomalies.csv")
anomalies <- anomalies[target_variable == "inc case"]
# anti-join: keep only rows whose location and target week are not flagged
forecasts_with_truth <- forecasts_with_truth[!anomalies,
  on = c("location", "location_name", "target_end_date")
]
Thanks, this is a good point. I couldn't work out what current practice on this was.
It doesn't, though I imagine it may drive some of the extreme forecast differences (likely, given the error model). I would be keen to update to account for the forecast anomalies and to add a comment in the methods and discussion raising this point.
I think perhaps it should have its own function and also its own data extraction to get the anomalies from the hub.
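For example, something like the below (the function and argument names here are just placeholders, not anything that exists in the repo yet):

get_anomalies <- function(target = "inc case") {
  url <- paste0(
    "https://raw.githubusercontent.com/covid19-forecast-hub-europe/",
    "covid19-forecast-hub-europe/main/data-truth/anomalies/anomalies.csv"
  )
  anomalies <- data.table::fread(url)
  anomalies[target_variable == target]
}

drop_anomalies <- function(forecasts_with_truth, anomalies) {
  # anti-join: keep only rows not flagged as anomalous
  forecasts_with_truth[!anomalies,
    on = c("location", "location_name", "target_end_date")
  ]
}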
To clarify: are you just removing them when they occur in the forecast horizon, or also removing forecasts for which anomalies occur on, for example, the day of the forecast?
This appears to be a pretty big chunk of the available forecasts for some places (e.g. Lithuania):
12: 2022-01-22 LT Lithuania large data revision
13: 2022-01-29 LT Lithuania large data revision
14: 2022-02-05 LT Lithuania large data revision
15: 2022-02-12 LT Lithuania large data revision
16: 2022-02-19 LT Lithuania large data revision
17: 2022-02-26 LT Lithuania large data revision
18: 2022-03-05 LT Lithuania large data revision
19: 2022-03-12 LT Lithuania large data revision
20: 2022-03-19 LT Lithuania large data revision
21: 2022-03-26 LT Lithuania large data revision
22: 2022-04-02 LT Lithuania large data revision
23: 2022-04-09 LT Lithuania large data revision
24: 2022-04-16 LT Lithuania large data revision
25: 2022-04-23 LT Lithuania large data revision
26: 2022-04-30 LT Lithuania large data revision
27: 2022-05-07 LT Lithuania large data revision
28: 2022-05-14 LT Lithuania large data revision
29: 2022-05-21 LT Lithuania large data revision
30: 2022-05-28 LT Lithuania large data revision
31: 2022-06-04 LT Lithuania large data revision
32: 2022-06-11 LT Lithuania large data revision
33: 2022-06-18 LT Lithuania large data revision
34: 2022-06-25 LT Lithuania large data revision
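For a quick sense of how concentrated these are across locations, something like the below (using the anomalies table loaded in the snippet above) counts the flagged weeks per location:

# flagged weeks per location, most affected first
anomalies[, .N, by = .(location, location_name)][order(-N)]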
I've updated to drop anomalies from scoring (and only scoring so far) here: https://github.com/epiforecasts/simplified-forecaster-evaluation/commit/623c8994ed19fc2edb1c0da9fa002c4a35e90e01
Will read through the paper and check the interpretation of the results stands. Work to do to close this is:
To clarify: are you just removing them when they occur in the forecast horizon, or also removing forecasts for which anomalies occur on, for example, the day of the forecast?
Both - we're removing any truth data which has anomalies and any forecasts made when there was an anomaly https://github.com/covid19-forecast-hub-europe/covid19-forecast-hub-europe/blob/39823425e9ea5d66c3dc0e7a55fa7ba5433d7df2/code/evaluation/load_and_score_models.r#L30
I agree that it would be worth adding this to the documentation.
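For concreteness, a sketch of the second part (dropping forecasts made when there was an anomaly), on top of the truth-data filter shown earlier; the join keys and the Monday-submission assumption are mine rather than the hub's exact logic:

library("data.table")
# assumes `anomalies` (filtered to "inc case") and `forecasts_with_truth`
# as in the earlier snippet

# the last data week is assumed to end two days before the forecast date
# (i.e. Monday submissions against data through the preceding Saturday)
forecasts_with_truth[, last_data_date := forecast_date - 2]

# anti-join: drop forecasts whose last data week was flagged as anomalous
forecasts_with_truth <- forecasts_with_truth[!anomalies,
  on = c("location", last_data_date = "target_end_date")
]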
This appears to be a pretty big chunk of the available forecasts for some places (e.g. Lithuania)
Yes, because the whole data set was revised on 30 June. For example:
library("readr")
library("dplyr")
library("ggplot2")
source <- "JHU"
x <- "Cases"
owner <- "epiforecasts"
repo <- "covid19-forecast-hub-europe"
path <- paste(
"data-truth", source,
paste0("truth_", source, "-Incident ", x, ".csv"), sep = "/"
)
shas <- list(
revision = "a0851dc3ca5fbc631207afe03d24c694c8e51461",
original = "76375ce9c868e0231a2db1dfcb55f29c51050888"
)
data <- lapply(shas, function(sha) {
readr::read_csv(
URLencode(
paste(
"https://raw.githubusercontent.com", owner, repo,
sha, path, sep = "/"
)
),
show_col_types = FALSE
)
}) |>
bind_rows(.id = "status") |>
filter(location == "LT") |>
mutate(week = lubridate::floor_date(date, "week", 7)) |>
group_by(week, status) |>
summarise(value = sum(value), .groups = "drop") |>
ungroup() |>
filter(week < max(week))
ggplot(data, aes(x = week, y = value, colour = status)) +
geom_line() +
scale_colour_brewer("", palette = "Set1") +
xlab("Date") + ylab("Cases") +
theme_bw() +
scale_y_log10()
You could avoid some of this issue by using truth data and anomalies from close to the end date of the study period.
Both - we're removing any truth data which has anomalies and any forecasts made when there was an anomaly
You could avoid some of this issue by using truth data and anomalies from close to the end date of the study period.
As the study cut-off is the 19th of July, I am in effect already doing this. I've locked the data used to be from the 1st of September. On a related note, is there anything in the literature you've seen about how to deal with revised epi data when evaluating models? Perhaps we should write a short note with the European hub forecast data as an example (could be a good collab project).
Related commit locking the data extraction (note: not the forecasts, as these are assumed to be fixed at the date of submission): https://github.com/epiforecasts/simplified-forecaster-evaluation/commit/dcc8319b52732cd921101631b8ecc20d296f619d
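As an illustration (not what the linked commit does), one way to pin the anomalies file, and likewise the truth data, to its state on the lock date is to look up the last commit to the file before that date via the GitHub API; the lock date of 2022-09-01 is an assumption here:

library("jsonlite")
library("data.table")

owner <- "covid19-forecast-hub-europe"
repo <- "covid19-forecast-hub-europe"
file_path <- "data-truth/anomalies/anomalies.csv"
lock_date <- "2022-09-01"  # assumed lock date ("1st of September")

# last commit touching the file before the lock date (API returns newest first)
commits <- fromJSON(paste0(
  "https://api.github.com/repos/", owner, "/", repo, "/commits",
  "?path=", file_path, "&until=", lock_date, "T00:00:00Z&per_page=1"
))
sha <- commits$sha[1]

# read the file as it was at that commit
anomalies <- fread(paste(
  "https://raw.githubusercontent.com", owner, repo, sha, file_path, sep = "/"
))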
As the study cut-off is the 19th of July, I am in effect already doing this. I've locked the data used to be from the 1st of September.
Just to be clear: in that case it would make sense to also use the anomalies file from 1 September (though not sure it makes much of a difference).
On a related note, is there anything in the literature you've seen about how to deal with revised epi data when evaluating models? Perhaps we should write a short note with the European hub forecast data as an example (could be a good collab project).
No, and I agree that it may be a bit of a gap worth addressing.
Just to be clear: in that case it would make sense to also use the anomalies file from 1 September (though not sure it makes much of a difference).
This is what I am doing. Everything from Sept 1st vs forecasts, which are assumed to be fixed by the date of submission.
I am seeing something like 7.5% of forecasts being excluded and 10% of forecast dates by location having some kind of exclusion (i.e. for at least one horizon). Does that sound about right?
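For reference, a rough sketch of how those two proportions could be computed, assuming a hypothetical logical column `excluded` marking the rows dropped by the anomaly filter:

library("data.table")

# share of individual forecast rows excluded
pct_rows <- forecasts_with_truth[, 100 * mean(excluded)]

# share of location / forecast-date pairs with at least one excluded horizon
pct_dates <- forecasts_with_truth[
  , .(any_excluded = any(excluded)), by = .(location, forecast_date)
][, 100 * mean(any_excluded)]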
https://github.com/epiforecasts/simplified-forecaster-evaluation/commit/a8a45f33ff679d984281196a6c3dda41b1c87bdd adds a discussion of anomalous observations + mention in limitations and further work.
Note: looking at this more, I think there is a very small bug in the anomaly-handling code. Some locations have forecasts made on different dates, so trying to work out the last forecast week by taking away two days doesn't work for all of them.
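One possible fix, assuming the hub's weekly target_end_dates fall on Saturdays, would be to map each forecast date to the most recent Saturday on or before it rather than subtracting a fixed two days:

library("lubridate")

# most recent Saturday on or before each forecast date
last_data_saturday <- function(forecast_date) {
  # with week_start = 7, Sunday = 1 ... Saturday = 7,
  # so wday %% 7 is the number of days since the last Saturday
  forecast_date - (wday(forecast_date, week_start = 7) %% 7)
}

last_data_saturday(as.Date(c("2022-01-24", "2022-01-26")))
# a Monday and a Wednesday both map to Saturday "2022-01-22"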
This isn't the case - we permanently remove targets from evaluation if they see any data revision at any time that moves the value by >5%. These are (poorly) documented here (with source code).