cmu-delphi / exploration-tooling

tools for evaluating and exploring forecasters

dropping rows with NA's in relevant columns only #105

Closed dsweber2 closed 5 months ago

dsweber2 commented 6 months ago

Our forecasters were underperforming because, when merging the chng and hhs datasets, we removed every row that had an NA in any column. This PR changes that so NAs are dropped only inside a forecaster, and only for the columns actually used in forecasting. For an example of the problematic data, compare

library(targets)
library(dplyr)
library(epiprocess)

# hhs on its own: latest time_value available as of 2021-11-29
hhs_archive_alone <- tar_read("hhs_archive_data_2022") %>%
  select(geo_value, time_value, value, issue) %>%
  rename("hhs" := value) %>%
  rename(version = issue) %>%
  as_epi_archive(
    geo_type = "state",
    time_type = "day",
    compactify = TRUE
  )
max(hhs_archive_alone$as_of(as.Date("2021-11-29"))$time_value)

which is the last hhs data, while

chng_archive_alone <- tar_read("chng_archive_data_2022") %>%
  select(geo_value, time_value, value, issue) %>%
  rename("chng" := value) %>%
  rename(version = issue) %>%
  as_epi_archive(
    geo_type = "state",
    time_type = "day",
    compactify = TRUE
  )
max(chng_archive_alone$as_of(as.Date("2021-11-29"))$time_value)

which is the last chng data

chng_archive_alone$as_of(as.Date("2021-11-29")) %>% arrange(time_value) %>% tail

joined matches chng:

both <- tar_read(joined_archive_data_2022)
max(both$as_of(as.Date("2021-11-29"))$time_value)
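The difference between the old and new behaviour can be sketched on toy data. This is only an illustration in base R, not the code in this PR: the data frame `joined` and the helper `drop_na_in_used_cols()` are made up for the example, and the real forecasters work on epi_archive snapshots rather than plain data frames.

```r
# Toy stand-in for a joined snapshot (made-up values): chng lags hhs by two days.
joined <- data.frame(
  geo_value = "ca",
  time_value = as.Date("2021-11-24") + 0:4,
  hhs = c(10, 12, 11, 13, 14),
  chng = c(0.5, 0.6, 0.7, NA, NA)
)

# Hypothetical helper: drop rows with NA only in the columns a forecaster uses.
drop_na_in_used_cols <- function(data, used_cols) {
  keep <- complete.cases(data[, used_cols, drop = FALSE])
  data[keep, , drop = FALSE]
}

# Old behaviour: any NA removes the row, so the latest hhs days are lost.
dropped_everywhere <- joined[complete.cases(joined), , drop = FALSE]

# New behaviour: an hhs-only forecaster keeps the latest hhs days.
hhs_only <- drop_na_in_used_cols(joined, "hhs")

# A forecaster that uses both signals is still limited by chng.
both_signals <- drop_na_in_used_cols(joined, c("hhs", "chng"))

max(dropped_everywhere$time_value)
max(hhs_only$time_value)
max(both_signals$time_value)
```

With tidyr available, `tidyr::drop_na(joined, dplyr::all_of(used_cols))` would do the same row filtering.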

closes #93

dsweber2 commented 6 months ago

I'm going to run just the data targets in production to make sure this does the right thing too.

dsweber2 commented 6 months ago

So after renv::install(c("targets@1.3.2", "tarchetypes@0.7.9")); renv::snapshot(), this does seem to be working.

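I also added the changes needed for DEBUG_MODE=true to actually get browser() working. There doesn't seem to be a way around running tar_make(name, callr_function = NULL), since browser() only works when the pipeline runs in the current R session rather than in a separate callr process.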
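A rough sketch of how the flag and the tar_make() call could fit together, for anyone trying this locally. How the repo actually reads DEBUG_MODE is not shown in this thread, so `is_debug_mode()` and `run_targets()` are hypothetical helpers and the environment-variable handling is an assumption; only tar_make() and its `names` and `callr_function` arguments come from targets itself.

```r
# Hypothetical helper: treat DEBUG_MODE=true (any capitalisation) as "on".
is_debug_mode <- function() {
  identical(tolower(Sys.getenv("DEBUG_MODE", unset = "false")), "true")
}

# Hypothetical wrapper around targets::tar_make().
# In debug mode the pipeline runs in the current session (callr_function = NULL),
# so a browser() call inside a target can open an interactive prompt.
run_targets <- function(target_names) {
  if (is_debug_mode()) {
    targets::tar_make(
      names = tidyselect::any_of(target_names),
      callr_function = NULL
    )
  } else {
    targets::tar_make(names = tidyselect::any_of(target_names))
  }
}
```

Usage would be along the lines of Sys.setenv(DEBUG_MODE = "true") followed by run_targets("joined_archive_data_2022") from an interactive session in the project directory.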

dsweber2 commented 6 months ago

We definitely need to figure out a better way to deal with dates for slide_forecaster, though I'm not sure what exactly.