Provide better error/recovery when test lags don't exist ("zero-length inputs [...]") #333

Open brookslogan opened 1 month ago

brookslogan commented 1 month ago

Same error message as #128. This looks like it is caused by a gap in data availability between seasons: the requested forecast date has data for some lags but not all. Ideally we would give an informative message during test lag preparation, rather than balk later in residual quantiles.
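To make the failure mode concrete, here is a minimal base-R sketch of the kind of check that could run during test lag preparation. It is not epipredict code: `check_test_lags()`, its arguments, the error wording, and the toy dates are all hypothetical, chosen only to show a forecast date that has data for some lags but not for one that falls in the off-season gap.

```r
# Hypothetical helper (not part of epipredict): report which requested lags
# have no observation when counted back from the forecast date.
check_test_lags <- function(time_values, forecast_date, lags) {
  lagged_dates <- forecast_date - lags
  missing_lags <- lags[!(lagged_dates %in% time_values)]
  if (length(missing_lags) > 0L) {
    stop(
      "No test data for lag(s) ", paste(missing_lags, collapse = ", "),
      " at forecast date ", format(forecast_date), "."
    )
  }
  invisible(TRUE)
}

# Toy weekly series with an off-season gap (made-up dates, not FluSurv data).
time_values <- c(
  seq(as.Date("2019-03-03"), as.Date("2019-04-28"), by = "week"),
  seq(as.Date("2019-09-29"), as.Date("2019-10-13"), by = "week")
)

# Lags 0, 7 and 14 are all observed, so this passes silently.
check_test_lags(time_values, as.Date("2019-10-13"), c(0, 7, 14))

# Lag 21 lands in the gap; capture the error message instead of stopping.
lag_message <- tryCatch(
  check_test_lags(time_values, as.Date("2019-10-13"), c(0, 7, 14, 21)),
  error = function(e) conditionMessage(e)
)
print(lag_message)
```

A real check would presumably need to account for per-predictor lags and latency, which this sketch ignores.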

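library(dplyr)      # %>%, select(), mutate(), starts_with()
library(purrr)      # map()
library(epidatr)    # epirange()
library(epiprocess) # as_epi_archive(), epix_as_of()
library(epipredict) # arx_args_list()

flusurv_analysis_issue <- as.Date("2019-08-01") %>%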
  MMWRweek::MMWRweek() %>%
  {.$MMWRyear * 100L + .$MMWRweek}

flusurv_issue_data <-
  pub_flusurv(
    locations = "network_all",
    issues = epirange(123401, flusurv_analysis_issue)
  )
flusurv_archive <- flusurv_issue_data %>%
  select(geo_value = location,
         time_value = epiweek,
         version = release_date,
         starts_with("rate_")) %>%
  as_epi_archive(compactify = TRUE)

archive <- flusurv_archive

forecast_dates <- seq(min(archive$DT$version) + 120L, archive$versions_end,
                      by = "6 weeks")

horizons <- c(0, 7, 14, 21, 28) # relative to forecast_date

example_forecaster <- function(snapshot_edf, forecast_date) {
  shared_reporting_latency <- as.integer(forecast_date - max(snapshot_edf$time_value))
  horizons %>%
    map(function(horizon) {
      snapshot_edf %>%
        arx_forecaster(
          outcome = "rate_overall",
          predictors = "rate_overall",
          args_list = arx_args_list(
            # (this is incomplete; latency often varies significantly by covariate and can't be ignored, so we also need lag adjustment.)
            ahead = shared_reporting_latency + horizon,
            ## ahead = horizon,
            quantile_levels = c(0.1, 0.5, 0.9)
          )) %>%
        .$predictions %>%
        mutate(forecast_date = forecast_date,
               target_date = forecast_date + horizon)
    }) %>%
    bind_rows()
}

latest_edf <- archive %>% epix_as_of(.$versions_end)

unfaithful_forecasts <- latest_edf %>%
  # pretend we get observations about today, on today, with no revisions
  mutate(version = time_value) %>%
  as_epi_archive(versions_end = max(forecast_dates)) %>%
  epix_slide(
    # pretend version releases are on forecast dates
    ref_time_values = forecast_dates,
    before = 365000L, # 1000-year time window --> don't filter out any `time_value`s
    ~ example_forecaster(.x, .ref_time_value),
    names_sep = NULL
  ) %>%

brookslogan commented 1 month ago

The difference between this and #332 is that this is dealing with a lagset being bad for test data, while #332 is about a shiftset being bad for training data.

dajmcdon commented 2 weeks ago

Can you make a simple example without the slide?