cmu-delphi / epipredict

Tools for building predictive models in epidemiology.
https://cmu-delphi.github.io/epipredict/

Eliminate/explicate differences in training windowing between flatline and arx forecasters #321

Open brookslogan opened 2 months ago

brookslogan commented 2 months ago

#290 highlighted that a training window size close to the ahead value can trip up the flatline forecaster. It also shows that the flatline forecaster uses far fewer than n_training instances per epikey whenever ahead is within an order of magnitude of n_training. This is not the case for arx_forecaster:

library(epipredict)
#> Loading required package: epiprocess
#> 
#> Attaching package: 'epiprocess'
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> Loading required package: parsnip
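# Instrument slather(): when the residual-quantiles layer runs, temporarily
# trace dplyr::summarize() to report how many non-NA residuals it receives.
trace(slather, quote({
  if (inherits(object, "layer_residual_quantiles")) {
    trace(dplyr::summarize, quote({
      cat("Number of non-NA residuals:\n")
      print(.data %>% tidyr::drop_na(.resid) %>% nrow())
    }))
  }
}), quote(untrace(dplyr::summarize)))  # exit hook: remove the summarize() trace again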
#> Tracing function "slather" in package "epipredict"
#> [1] "slather"
case_death_rate_subset %>% flatline_forecaster("case_rate", flatline_args_list(ahead = 28L, n_training = 29L))
#> [...]
#> Number of non-NA residuals:
#> [1] 56
#> [...]
case_death_rate_subset %>% arx_forecaster("case_rate", "case_rate", args_list = arx_args_list(ahead = 28L, n_training = 29L))
#> [...]
#> Number of non-NA residuals:
#> [1] 1624
#> [...]

Created on 2024-04-19 with reprex v2.0.2

However, ?flatline_args_list does not explain this difference:

n_training: Integer. An upper limit for the number of rows per key that
          are used for training (in the time unit of the 'epi_df').

and the message that slather.layer_residual_quantiles emits when the output residuals are NA describes a condition that is specific to the flatline forecaster (and is off by one even for flatline_forecaster):

! Residual quantiles could not be calculated due to missing residuals.
ℹ This may be due to `n_train` < `ahead` in your <epi_recipe>.
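The residual counts in the reprex are consistent with the two forecasters applying the training window and the NA omission in opposite orders. The following is a minimal base R sketch of that reading, not epipredict's actual code: the helper `count_usable_rows()` and its arguments are hypothetical, and the figure of 56 epikeys is inferred from the counts (56 = 56 × 1 and 1624 = 56 × 29).

```r
# Toy illustration for a single epikey; NOT epipredict's implementation.
# `count_usable_rows()` and its arguments are hypothetical names.
count_usable_rows <- function(n_obs, ahead, n_training, drop_na_first) {
  y <- seq_len(n_obs)                 # one observation per time step
  # The target for row t is the value `ahead` steps later; indexing past the
  # end of `y` yields NA, so the last `ahead` rows have no target.
  target <- y[seq_len(n_obs) + ahead]
  rows <- seq_len(n_obs)
  if (drop_na_first) {
    rows <- rows[!is.na(target[rows])]  # omit NA targets first ...
    rows <- tail(rows, n_training)      # ... then keep the last n_training rows
  } else {
    rows <- tail(rows, n_training)      # keep the last n_training rows first ...
    rows <- rows[!is.na(target[rows])]  # ... then omit NA targets
  }
  length(rows)
}

n_keys <- 56  # assumption: number of epikeys implied by the counts in the reprex

# Window first, then drop NAs: only n_training - ahead rows survive per key.
n_keys * count_usable_rows(100, ahead = 28, n_training = 29, drop_na_first = FALSE)

# Drop NAs first, then window: the full n_training rows survive per key.
n_keys * count_usable_rows(100, ahead = 28, n_training = 29, drop_na_first = TRUE)

# With n_training equal to ahead, windowing first leaves nothing at all.
count_usable_rows(100, ahead = 28, n_training = 28, drop_na_first = FALSE)
```

Under these assumptions, windowing first leaves n_training - ahead usable rows per key, so the residuals vanish as soon as n_training <= ahead, not only when n_training < ahead.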

Approach 1: eliminate these differences. Make n_training meaningful for flatline_forecaster by omitting NAs before applying the training window, as arx_forecaster does. Remove the mention of the inequality above from the layer_residual_quantiles error message, since it will no longer be an issue.

Approach 2: explain the difference in ?flatline_args_list, and have the residual quantiles error message say that n_train <= ahead (not n_train < ahead) is the issue, and that it applies to flatline_forecaster specifically.
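As a rough illustration of the corrected condition in Approach 2, the check could look something like the sketch below. `check_flatline_window()` is a hypothetical helper written in base R, not an existing epipredict function, and the message wording is only a suggestion.

```r
# Hypothetical helper, not part of epipredict. It assumes the flatline
# forecaster keeps about n_training - ahead usable residuals per key, so
# none remain once n_training <= ahead.
check_flatline_window <- function(n_training, ahead) {
  if (n_training <= ahead) {
    stop(
      "Residual quantiles could not be calculated due to missing residuals.\n",
      "This may be due to `n_training` <= `ahead` in `flatline_forecaster()`.",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

check_flatline_window(n_training = 29L, ahead = 28L)   # passes silently
# check_flatline_window(n_training = 28L, ahead = 28L) # would signal the error
```

Using `<=` rather than `<` covers the case n_training == ahead, which the current message misses.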