epiforecasts / EpiSoon

Forecasting the effective reproduction number over short timescales
https://epiforecasts.io/EpiSoon/

Error for compare_timeseries: Element 1 must be a vector, not NULL #20

Closed · nikosbosse closed this issue 4 years ago

nikosbosse commented 4 years ago

obs_rts_aus <- readRDS("obs_rts_aus.rds")
obs_cases_aus <- readRDS("obs_cases_aus.rds")

models <- list("AR 1" = function(ss, y) {bsts::AddAr(ss, y = y, lags = 1)})

forecast_eval <- compare_timeseries(obs_rts = obs_rts_aus,
                                    obs_cases = obs_cases_aus,
                                    serial_interval = EpiNow::covid_serial_intervals,
                                    models = models,
                                    horizon = 7,
                                    samples = 20)

nikosbosse commented 4 years ago

Further down the line:

a <- obs_rts_aus %>%
  dplyr::filter(timeseries == "australia") %>%
  dplyr::select(-timeseries)
b <- obs_cases_aus %>%
  dplyr::filter(timeseries == "australia") %>%
  dplyr::select(-timeseries)
EpiSoon::compare_models(obs_rts = a, 
               obs_cases = b, 
               serial_interval = EpiNow::covid_serial_intervals,
               models = models, 
               horizon = 7, samples = 10)

--> same error as above

EpiSoon::evaluate_model(obs_rts = a, 
                        obs_cases = b, 
                        model = models[[1]], 
                        serial_interval = EpiNow::covid_serial_intervals)

Error in check_sample2(input) : Incompatible input objects. Expected input for (y, dat): vector[1:n], matrix[1:n, 1:m]. Given input for (y, dat): vector[1:8], matrix[1:7, 1:10].
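For context, check_sample2() appears to be scoringRules' input validation (an assumption on my part), which expects one row of forecast samples per observed value. A hypothetical minimal reproduction of the shape mismatch:

## Hypothetical minimal reproduction, assuming the scoring ultimately calls
## scoringRules::crps_sample(y, dat): y needs one entry per row of dat.
library(scoringRules)

y <- rnorm(8)                        # 8 observed values
dat <- matrix(rnorm(7 * 10), 7, 10)  # only 7 forecast dates, 10 samples each

crps_sample(y = y, dat = dat)
## Error in check_sample2(input) : Incompatible input objects. ...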

nikosbosse commented 4 years ago

I also found evaluate_model() to be very slow once (timeout-ish) and cancelled it manually.

nikosbosse commented 4 years ago

Within evaluate_model(), the error appears in:

 ## Score for each forecast
  score_cases <-
    purrr::map2_dfr(case_predictions, obs_cases,
                    function(sample, obs) {
                      dplyr::group_split(sample, forecast_date) %>%
                        setNames(unique(sample$forecast_date)) %>%
                        purrr::map_dfr(~ dplyr::select(., -forecast_date) %>%
                                         score_case_forecast(obs), .id = "forecast_date")
                    }, .id = "sample")
seabbs commented 4 years ago

Hmmm - can you send me your example data on Slack?

Also you are lucky this isn't Stack Overflow :smile:

nikosbosse commented 4 years ago

Also: this works fine

EpiSoon::evaluate_model(obs_rts = a[1:10, ], 
                        obs_cases = b[1:10, ], 
                        model = models[[1]], 
                        serial_interval = EpiNow::covid_serial_intervals, 
                        horizon = 7, samples = 10)
seabbs commented 4 years ago

So the issue is that scoring expects the observed and forecast data to be the same length, and here they are not. There should be handling for this, but apparently it is not working well enough.

Here is the internal check.

  observations <- observations %>%
    dplyr::filter(
      date >= min(fit_samples$date),
      date <= max(fit_samples$date)
    )
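Note this filter only trims the ends of the observed series. If a date is missing inside fit_samples, the matching observation is still kept and the two objects end up with different lengths. A toy illustration with made-up dates (not the actual data):

obs_dates  <- as.Date("2020-04-01") + 0:7  # 8 observed dates
pred_dates <- obs_dates[-4]                # one interior date missing from the fit

## The min/max filter keeps all 8 observations ...
sum(obs_dates >= min(pred_dates) & obs_dates <= max(pred_dates))  # 8
## ... while only 7 prediction dates exist: vector[1:8] vs matrix[1:7, 1:m].
length(pred_dates)                                                # 7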

To debug this effectively, step into score_forecast() with the debugger. If that doesn't help, debug the overall function and step through each list item to see which one is causing the problem.
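For reference, stepping in can be done with base R's debugging helpers (a sketch, assuming score_forecast is exported from EpiSoon):

## Break into score_forecast() on its next call only:
debugonce(EpiSoon::score_forecast)

## Or jump straight into the failing frame after the error fires:
options(error = recover)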

nikosbosse commented 4 years ago

Edit: probably not true. Don't know what happened; the problem disappeared again.

Hm, maybe the culprit is this part of evaluate_model():

  ## Limit case predictions to observed data
  case_predictions <- case_predictions %>%
    dplyr::group_split(obs_sample) %>%
    purrr::map(~ dplyr::select(., -obs_sample)) %>%
    purrr::map2(obs_cases,
                ~ dplyr::filter(.x, date <= max(.y$date)))

That outputs an empty data.frame. Will check what's going on.
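One quick check for which split comes back empty, assuming case_predictions is still the per-sample list of data frames at this point:

## Row counts per sample; a 0 marks the element the filter emptied out.
purrr::map_int(case_predictions, nrow)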

nikosbosse commented 4 years ago

Soooo. I found more things. The problem occurs in this part of evaluate_model():

  ## Score for each forecast
  score_cases <-
    purrr::map2_dfr(case_predictions, obs_cases,
                    function(sample, obs) {
                      dplyr::group_split(sample, forecast_date) %>%
                        setNames(unique(sample$forecast_date)) %>%
                        purrr::map_dfr(~ dplyr::select(., -forecast_date) %>%
                                         score_forecast(obs), .id = "forecast_date")
                    }, .id = "sample")

I therefore wrote a loop that should do the same thing as the map call above:

t <- dplyr::group_split(case_predictions[[1]], forecast_date) %>%
  setNames(unique(case_predictions[[1]]$forecast_date))

for (i in seq_along(t)) {
  t[[i]] %>%
    dplyr::select(-forecast_date) %>%
    score_forecast(obs_cases[[1]])
  cat("success for", i, "\n")
}

The error appears at the 18th forecast date. While all the dates are included in the observed cases, for some weird reason dates are missing from the case predictions.

ds <- t[[18]]$date %>% unique() ## here is a gap in the dates
obs_cases[[1]]$date %>% unique()
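The gap can also be listed directly (small sketch; setdiff() drops the Date class, hence the round-trip through as.Date()):

## Dates that are observed but have no prediction:
missing_dates <- setdiff(unique(obs_cases[[1]]$date), ds)
as.Date(missing_dates, origin = "1970-01-01")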

While we could make the code run by replacing the <= in score_forecast() with %in% when filtering the dates, I think there must be a logic error in predict_cases() that produces this gap.
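For reference, the workaround would amount to changing the filtering step quoted earlier along these lines (a sketch of the suggested change, not the fix that was eventually merged):

## Instead of keeping every observation inside the forecast window ...
##   dplyr::filter(date >= min(fit_samples$date), date <= max(fit_samples$date))
## ... keep only the dates that actually have predictions:
observations <- observations %>%
  dplyr::filter(date %in% unique(fit_samples$date))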

seabbs commented 4 years ago

Fixed with updated implementation.