Closed nikosbosse closed 4 years ago
Further down the line:
a <- obs_rts_aus %>%
dplyr::filter(timeseries == "australia") %>%
dplyr::select(-timeseries)
b <- obs_cases_aus %>%
dplyr::filter(timeseries == "australia") %>%
dplyr::select(-timeseries)
EpiSoon::compare_models(obs_rts = a,
obs_cases = b,
serial_interval = EpiNow::covid_serial_intervals,
models = models,
horizon = 7, samples = 10)
--> same error as above
EpiSoon::evaluate_model(obs_rts = a,
obs_cases = b,
model = models[[1]],
serial_interval = EpiNow::covid_serial_intervals)
Error in check_sample2(input) : Incompatible input objects. Expected input for (y, dat): vector[1:n], matrix[1:n, 1:m]. Given input for (y, dat): vector[1:8], matrix[1:7, 1:10].
also experienced evaluate_model to be very slow once (timeout-ish) and cancelled manually.
Within evaluate_model(), the error appears in
## Score for each forecast
score_cases <-
purrr::map2_dfr(case_predictions, obs_cases,
function(sample, obs) {
dplyr::group_split(sample, forecast_date) %>%
setNames(unique(sample$forecast_date)) %>%
purrr::map_dfr(~ dplyr::select(., -forecast_date) %>%
score_case_forecast(obs), .id = "forecast_date")
}, .id = "sample")
Hmmm - can you send me your example data on slack.
Also you are lucky this isn't stack overflow :smile:
Also: this works fine
EpiSoon::evaluate_model(obs_rts = a[1:10, ],
obs_cases = b[1:10, ],
model = models[[1]],
serial_interval = EpiNow::covid_serial_intervals,
horizon = 7, samples = 10)
So the issue is that scoring expects the observed and forecast data to be of the same length and here it is not. There should be handling of this but apparently it is not working well enough.
Here is the internal check.
observations <- observations %>%
dplyr::filter(
date >= min(fit_samples$date),
date <= max(fit_samples$date)
)
To debug this effectively debug into score_forecast
. If that isn't helping debug into the overall function and step through each list item to see which one is causing the problem.
Edit: probably not true. Don't know what happened, problem disappeared again
hm maybe the culprit is this part in evaluate_model:
## Limit case predictions to observed data
case_predictions <- case_predictions %>%
dplyr::group_split(obs_sample) %>%
purrr::map(~ dplyr::select(., -obs_sample)) %>%
purrr::map2(obs_cases,
~ dplyr::filter(.x, date <= max(.y$date)))
that outputs an empty data.frame. will check what's going on.
Soooo. I found more things. The problem occurs in this part in evaluate_model():
## Score for each forecast
score_cases <-
purrr::map2_dfr(case_predictions, obs_cases,
function(sample, obs) {
dplyr::group_split(sample, forecast_date) %>%
setNames(unique(sample$forecast_date)) %>%
purrr::map_dfr(~ dplyr::select(., -forecast_date) %>%
score_forecast(obs), .id = "forecast_date")
}, .id = "sample")
I therefore made a loop that supposedly does the same thing as the above map function:
t <- dplyr::group_split(case_predictions[[1]], forecast_date) %>%
setNames(unique(case_predictions[[1]]$forecast_date))
for (i in 1:length(t)) {
t[[i]] %>%
dplyr::select(-forecast_date) %>%
score_forecast(obs_cases[[1]])
cat("success for", i, "\n")
}
The error appears for the 18th time point. While all the dates are included in the observed cases, for some weird reason in the case predictions, there are dates missing.
ds <- t[[18]]$date %>% unique ## here is a gap in the dates
obs_cases[[1]]$date %>% unique
While we could make the code run by replacing the <= in score_forecast() by %in% for filtering the dates, I think there must be a logical error in the predict_cases function that produces this gap.
Fixed with updated implementation.
obs_rts_aus <- readRDS("obs_rts_aus.rds") obs_cases_aus <- readRDS("obs_cases_aus.rds") models <- list("AR 1" = function(ss, y){bsts::AddAr(ss, y = y, lags = 1)})
forecast_eval <- compare_timeseries(obs_rts = obs_rts_aus, obs_cases = obs_cases_aus, serial_interval = EpiNow::covid_serial_intervals, models = models, horizon = 7, samples = 20)