cmu-delphi / epipredict

Tools for building predictive models in epidemiology.
https://cmu-delphi.github.io/epipredict/

Unexpected error with `flatline_forecaster()` w/ `quantile_by_key`; unexpected "successes" w/ invalid cols #229

Open brookslogan opened 1 year ago

brookslogan commented 1 year ago
library(epipredict)
#> Loading required package: epiprocess
#> 
#> Attaching package: 'epiprocess'
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> Loading required package: parsnip
# Grouping residual quantiles by geo_value results in an error:
fcst1 <- flatline_forecaster(
  case_death_rate_subset, "death_rate",
  flatline_args_list(quantile_by_key = "geo_value")
)
#> New names:
#> • `geo_value` -> `geo_value...1`
#> • `geo_value` -> `geo_value...2`
#> Error in `dplyr::group_by()`:
#> ! Must group by variables found in `.data`.
#> ✖ Column `geo_value` is not found.
#> Backtrace:
#>      ▆
#>   1. ├─epipredict::flatline_forecaster(...)
#>   2. │ ├─... %>% dplyr::select(-time_value) at cmu-delphi-epipredict-2dd9e70/R/flatline_forecaster.R:73:2
#>   3. │ ├─base::suppressWarnings(predict(wf, new_data = latest))
#>   4. │ │ └─base::withCallingHandlers(...)
#>   5. │ ├─stats::predict(wf, new_data = latest)
#>   6. │ └─epipredict:::predict.epi_workflow(wf, new_data = latest)
#>   7. │   ├─epipredict::apply_frosting(object, components, new_data, ...) at cmu-delphi-epipredict-2dd9e70/R/epi_workflow.R:163:2
#>   8. │   └─epipredict:::apply_frosting.epi_workflow(...) at cmu-delphi-epipredict-2dd9e70/R/frosting.R:209:2
#>   9. │     ├─epipredict::slather(la, components, workflow, new_data) at cmu-delphi-epipredict-2dd9e70/R/frosting.R:265:6
#>  10. │     └─epipredict:::slather.layer_residual_quantiles(...) at cmu-delphi-epipredict-2dd9e70/R/layers.R:135:2
#>  11. │       └─dplyr::bind_cols(key_cols, r) %>% ... at cmu-delphi-epipredict-2dd9e70/R/layer_residual_quantiles.R:107:8
#>  12. ├─dplyr::select(., -time_value)
#>  13. ├─tibble::as_tibble(.)
#>  14. ├─dplyr::group_by(., !!!rlang::syms(common))
#>  15. └─dplyr:::group_by.data.frame(., !!!rlang::syms(common))
#>  16.   └─dplyr::group_by_prepare(.data, ..., .add = .add, error_call = current_env())
#>  17.     └─rlang::abort(bullets, call = error_call)
# Invalid columns are silently accepted (and ignored):
fcst2 <- flatline_forecaster(
  case_death_rate_subset, "death_rate",
  flatline_args_list(quantile_by_key = "nonexistent_column")
)
# Combining quantile_reg() with quantile_by_key is likely nonsensical, but it is accepted:
fcst3 <- arx_forecaster(
  case_death_rate_subset, "death_rate", c("death_rate"),
  trainer = quantile_reg(),
  args_list = arx_args_list(quantile_by_key = "geo_value")
)
#> Warning: The forecast_date is less than the most recent update date of the
#> data: forecast_date = 2021-12-31 while data is from 2022-05-31.
fcst4 <- arx_forecaster(
  case_death_rate_subset, "death_rate", c("death_rate"),
  trainer = quantile_reg(),
  args_list = arx_args_list(quantile_by_key = "nonexistent_column")
)
#> Warning: The forecast_date is less than the most recent update date of the
#> data: forecast_date = 2021-12-31 while data is from 2022-05-31.
# With the default trainer, this completes successfully:
fcst5 <- arx_forecaster(
  case_death_rate_subset, "death_rate", c("death_rate"),
  args_list = arx_args_list(quantile_by_key = "geo_value")
)
#> Warning: Some grouping keys are not in data.frame returned by the
#> The forecast_date is less than the most recent update date of the data: forecast_date = 2021-12-31 while data is from 2022-05-31.
# But so does this, despite the nonexistent column:
fcst6 <- arx_forecaster(
  case_death_rate_subset, "death_rate", c("death_rate"),
  args_list = arx_args_list(quantile_by_key = "nonexistent_column")
)
#> Warning: Requested residual grouping key(s) {excess} are unavailable
#> The forecast_date is less than the most recent update date of the data: forecast_date = 2021-12-31 while data is from 2022-05-31.

Created on 2023-08-29 with reprex v2.0.2
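The `New names:` message just before the first error suggests that `slather.layer_residual_quantiles()` binds together two tables that both contain `geo_value` (the backtrace points at `dplyr::bind_cols(key_cols, r)`), after which `group_by()` can no longer find a column by that name. The mechanism can be reproduced with dplyr alone. This is a minimal sketch: the tables `key_cols` and `r` below are hypothetical stand-ins named after the variables in the backtrace, not the data epipredict actually builds.

```r
library(dplyr)

# Hypothetical stand-ins (not epipredict internals): a table of key columns
# and a residual table that both contain `geo_value`.
key_cols <- tibble(geo_value = c("ca", "ca", "ny", "ny"))
r <- tibble(
  geo_value = c("ca", "ca", "ny", "ny"),
  .resid = c(0.1, -0.2, 0.3, -0.1)
)

# bind_cols() uses .name_repair = "unique" by default, so the duplicated
# `geo_value` columns are renamed and a "New names:" message is emitted.
combined <- bind_cols(key_cols, r)
print(names(combined))

# Grouping by the original key name now fails: no column is called `geo_value`.
group_error <- tryCatch(
  {
    group_by(combined, geo_value)
    NULL
  },
  error = function(e) e
)
print(conditionMessage(group_error))
```

If this is the cause, dropping the duplicated key from one side before binding (or joining on the key instead of binding columns) would avoid the rename.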

brookslogan commented 1 year ago

I think my usage of quantile_by_key coupled with quantile_reg() is probably nonsensical. Is there a way to do grouped modeling within workflows, or does the grouping need to be done externally?

brookslogan commented 1 year ago

Rounded out the tests above by comparing `arx_forecaster()` with a `quantile_reg()` trainer against `arx_forecaster()` with its default trainer.
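In the reprex, `flatline_forecaster()` silently accepts `quantile_by_key = "nonexistent_column"`, while `arx_forecaster()` only warns. One way to make such typos fail fast would be to compare the requested keys against the data's column names before fitting. The sketch below illustrates that idea only: `check_quantile_by_key()` is a hypothetical helper, not an existing epipredict function, and `toy` is a small stand-in for `case_death_rate_subset`.

```r
# Hypothetical helper, not part of epipredict: stop with an error when any
# requested quantile_by_key column is absent from the training data.
check_quantile_by_key <- function(quantile_by_key, data) {
  missing_cols <- setdiff(quantile_by_key, names(data))
  if (length(missing_cols) > 0) {
    stop(
      "`quantile_by_key` column(s) not found in the data: ",
      paste(missing_cols, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(quantile_by_key)
}

# Small stand-in data frame (hypothetical, not case_death_rate_subset).
toy <- data.frame(
  geo_value = c("ca", "ny"),
  time_value = as.Date(c("2021-12-31", "2021-12-31")),
  death_rate = c(0.5, 0.7)
)

# A valid key passes silently; an invalid one produces an informative error.
check_quantile_by_key("geo_value", toy)
bad_msg <- tryCatch(
  check_quantile_by_key("nonexistent_column", toy),
  error = function(e) conditionMessage(e)
)
print(bad_msg)
```

Raising an error rather than a warning treats a misspelled key as a user mistake; a softer design could keep the current warning but still name the missing columns explicitly.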