cmu-delphi / epiprocess

Tools for basic signal processing in epidemiology
https://cmu-delphi.github.io/epiprocess/

Consider moving `epi_shift()` from `epipredict` here, exporting, maybe generalizing #457

Open brookslogan opened 6 months ago

brookslogan commented 6 months ago

This may make it easier to work with lags (correctly) with fable or other non-epipredict forecasters. I'm not sure whether fable's `AR()` works properly if you just have e.g. `lag(exog, 7L)` in the formula and there are gaps in the data.
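To make the concern concrete, here is a minimal base-R sketch of how a row-based lag and a time-aware lag can disagree once a date is missing. The data and the helper names `row_lag()` and `time_lag()` are hypothetical, made up for illustration; this is not how `epi_shift()` is implemented.

```r
# Hypothetical daily series with one missing date (an implicit gap).
dates <- as.Date("2024-01-01") + c(0:9, 11:14)  # 2024-01-11 is absent
exog <- seq_along(dates)

# Row-based lag: shifts by position and ignores the dates entirely.
row_lag <- function(x, k) c(rep(NA, k), head(x, -k))

# Time-aware lag: looks up the value observed exactly k days earlier,
# returning NA when that date is not present.
time_lag <- function(x, time, k) x[match(time - k, time)]

res <- data.frame(
  date = dates,
  exog = exog,
  row_lag7 = row_lag(exog, 7L),
  time_lag7 = time_lag(exog, dates, 7L)
)
res
```

From 2024-01-12 onward, `row_lag7` picks up the value from 8 days earlier rather than 7, while `time_lag7` stays aligned with the calendar.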

@dajmcdon does this sound like a good idea?

dajmcdon commented 6 months ago

Certainly plausible. Are you thinking of exporting it as a method? Maybe we should adjust the name in that case (with the larger goal of eventually removing the `epi_` prefixes throughout the package)?

On your question, fable does not handle gaps:

```r
library(fable)
library(tsibble)
x <- rnorm(100)
x[15] <- NA
dat <- tsibble(x = x, t = 1:100, index = t)
model(dat, AR(x ~ order(3)))
#> Warning: 1 error encountered for AR(x ~ order(3))
#> [1] NA/NaN/Inf in foreign function call (arg 1)
#> # A mable: 1 x 1
#>   `AR(x ~ order(3))`
#>              <model>
#> 1       <NULL model>
model(dat[-15,], AR(x ~ order(3)))
#> Warning: 1 error encountered for AR(x ~ order(3))
#> [1] .data contains implicit gaps in time. You should check your data and convert implicit gaps into explicit missing values using `tsibble::fill_gaps()` if required.
#> # A mable: 1 x 1
#>   `AR(x ~ order(3))`
#>              <model>
#> 1       <NULL model>
```

Under the hood, it basically just wraps `stats::ar.ols()`.
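For reference, the conversion that the second warning asks for can be sketched as follows, assuming the same kind of toy tsibble as above (the object names `gappy` and `filled` are made up here). It only turns the implicit gap into an explicit `NA` row; it does not make the `AR()` fit succeed, since the explicit-`NA` case is the first failure shown above.

```r
library(tsibble)

set.seed(1)
dat <- tsibble(x = rnorm(100), t = 1:100, index = t)

# Dropping the row for t = 15 leaves an implicit gap in the index.
gappy <- dat[-15, ]
has_gaps(gappy)

# fill_gaps() re-inserts the missing index value, with x set to NA.
filled <- fill_gaps(gappy)
filled[filled$t %in% 14:16, ]
```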

brookslogan commented 6 months ago

Created on 2024-06-05 with reprex v2.0.2

brookslogan commented 6 months ago

I realize I didn't test the right thing above. The original issue I was thinking of was raised by @richardzhuang0412, where you want to use lags of an exogenous variable as predictors.

dajmcdon commented 6 months ago

Maybe I'm confused by "handle". That warning above is fatal: it doesn't fit anything and returns a NULL model object. This is the case for both explicit and implicit gaps. With implicit gaps, it tells you to fill them; with explicit gaps, it fails at fit time due to the NA. Is that OK for this purpose?

brookslogan commented 5 months ago

Yep, that all sounds fine. "Handle" = generate error is OK. So the use case for epi_shift() isn't fable after all. It'd probably be when we want to use some non-fable non-epipredict thing and want to "properly" take lags accounting for the time_value [when it's not already done for us as in fable & epipredict]. (Of course, [to be deployable] this thing needs to be able to accept NAs in the training data and accept/avoid them in the test data. I'd imagine non-epipredict-based but still regression-based approaches would be one example.)
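A rough sketch of that use case, assuming a plain data frame keyed by `geo_value` and `time_value` and an ordinary `lm()` fit. The helper `shift_by_time()` and the simulated data are hypothetical and only illustrate the idea of shifting by time rather than by row; they are not the `epipredict` implementation.

```r
# Hypothetical panel: two locations, daily data, one missing day for "ca".
set.seed(42)
df <- expand.grid(
  geo_value = c("ca", "ny"),
  time_value = as.Date("2024-01-01") + 0:29,
  stringsAsFactors = FALSE
)
df$exog <- rnorm(nrow(df))
df$y <- rnorm(nrow(df))
df <- df[!(df$geo_value == "ca" & df$time_value == as.Date("2024-01-10")), ]

# Hypothetical helper: lag `col` by `k` days within each geo_value by
# shifting the time key and joining back, so gaps turn into NA instead of
# silently misaligning the lag.
shift_by_time <- function(data, col, k) {
  shifted <- data[c("geo_value", "time_value", col)]
  shifted$time_value <- shifted$time_value + k
  names(shifted)[3] <- paste0(col, "_lag", k)
  merge(data, shifted, by = c("geo_value", "time_value"), all.x = TRUE)
}

train <- shift_by_time(df, "exog", 7L)

# With R's default na.action (na.omit), lm() drops rows with NA predictors.
fit <- lm(y ~ exog_lag7, data = train)
nobs(fit)
```

On the test side, `predict()` on an `lm` fit uses `na.action = na.pass` by default, so rows whose lagged predictor is missing come back as `NA` predictions rather than raising an error.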