cmu-delphi / epipredict

Tools for building predictive models in epidemiology.
https://cmu-delphi.github.io/epipredict/

`get_test_data` returns `NA`s if there are `NA`s in the most recent data #267

Open: dsweber2 opened this issue 8 months ago

dsweber2 commented 8 months ago

This seems like a bug. An example of what I mean:

library(epipredict)
library(dplyr)
library(recipes) # step_mutate() comes from recipes

jhu <- filter(
  case_death_rate_subset,
  time_value >= "2021-06-04",
  time_value <= "2021-12-31",
  geo_value %in% c("ca", "fl", "tx", "ny", "nj")
)
# Build the recipe on the columns that actually exist in case_death_rate_subset
r <- epi_recipe(jhu) %>%
  ## Occasionally, data reporting errors / corrections result in negative
  ## cases / deaths
  step_mutate(case_rate = pmax(case_rate, 0), death_rate = pmax(death_rate, 0)) %>%
  step_epi_lag(case_rate, death_rate, lag = c(0, 7)) %>%
  step_epi_ahead(death_rate, ahead = 7, role = "outcome") %>%
  step_epi_naomit()
geo_values <- unique(jhu$geo_value)
# Two extra days at the end of the data where case_rate is missing everywhere
one_day_nas <- tibble(
  geo_value = geo_values,
  time_value = as.Date("2022-01-01"),
  case_rate = NA_real_,
  death_rate = runif(length(geo_values))
)
second_day_nas <- one_day_nas %>%
  mutate(time_value = as.Date("2022-01-02"))
jhu_nad <- jhu %>%
  as_tibble() %>%
  bind_rows(one_day_nas, second_day_nas) %>%
  as_epi_df()
attributes(jhu_nad)$metadata$as_of <- max(jhu_nad$time_value) + 3
# The returned test data ends on the NA days, so the predictors are NA
get_test_data(r, jhu_nad)

The workflow where I hit this is unfortunately buried in the guts of the exploration tooling. `arx_forecaster` mostly does the right thing, though it treats the latest date in the data as the last day with data, even when that day's values are all `NA`.
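To make the distinction above concrete: the "last day with data" arguably should be the latest date on which the predictors are non-`NA`, not simply the latest `time_value`. Below is a minimal dplyr sketch that computes this per location. The helper name `last_complete_date` and the toy data are hypothetical illustrations, not part of epipredict, and this is not necessarily how `get_test_data` should be fixed.

```r
library(dplyr)

# Hypothetical helper (not part of epipredict): for each geo_value, the latest
# time_value on which every column in `cols` is non-NA.
last_complete_date <- function(df, cols) {
  df %>%
    filter(if_all(all_of(cols), ~ !is.na(.x))) %>%
    group_by(geo_value) %>%
    summarise(last_complete = max(time_value), .groups = "drop")
}

# Toy data: the final day for each location has a missing case_rate
toy <- tibble(
  geo_value = c("ca", "ca", "ca", "ny", "ny"),
  time_value = as.Date("2022-01-01") + c(0, 1, 2, 0, 1),
  case_rate = c(1, 2, NA, 3, NA),
  death_rate = c(1, 1, 1, 1, 1)
)
res <- last_complete_date(toy, c("case_rate", "death_rate"))
print(res)
```

Here `max(time_value)` would report 2022-01-03 for `ca`, while the helper reports 2022-01-02, which is the gap this issue describes.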

dsweber2 commented 7 months ago

kind of related to #106

dsweber2 commented 1 month ago

@dshemetov how would this interact with the work you've been doing on `forecast()` and `get_test_data`? Should we hold off on this until that's done?

dshemetov commented 1 month ago

I'm hoping that the work there ends up resolving this, so let's just make sure to follow up on this after that's done.