Closed dsweber2 closed 8 months ago
first, thanks for the detailed commentary!
Implement a step_rolling_mean (or some sort of step-like thing) to add grouped rolling mean columns & assign a requested role, defaulting to "predictor". (Might include a convenience feature to lag these as well.) Implement a step_rolling_sd to add grouped rolling-sds-around-rolling-mean columns & assign a requested role, defaulting to "predictor".
I do think this is an eventual goal. As with the latency adjustment the turnaround time on adding things to epipredict is longer than writing them like this. If they perform well we should do as you're saying.
cache_metadata can probably be avoided through use of dplyr_reconstruct or epi_slide.
Only reason I defined this function was because select
wasn't actually supported for epi_df
's and epi_archive
s. Glad to see there are some built-in alternatives.
re-implemented with epislide and added the stuff I marked as resolved.
Now I'm wondering; for days that are missing/NA, there are 3 main options I can see.
epi_slide
does by default, which is use them to determine the points used in the averagecomplete
with NAs or check for an nrow of 7 else output NA. Latter approach wouldn't impact the number of input&output rows and would automatically produce NAs for at least the first 6 times, similar to mutate_scattered_ts
, so that might be nice. NA seems like the safe-ish-est easy option, and gaps probably aren't that common(?) so this might be preferred?
step_gap_fill
or something like that if someone wants e.g. 0s?)epipredict
? Maybe gradually port over there after testing here? [Re-reading above --- testing the model performance first also sounds like a plan.][Seems like a lot of this we may want to proceed & test before trying to refactor/tweak. I still need to finish reading some files & try to actually sanity check more concrete things.]
For compatibility with dev10...
Ok, so I made the imputation thing a standalone issue in the icebox. For now, I'm just using default epi_slide
behavior, which fills with NA
and then removes the NA
fills in the returned result.
Would it be any easier/quicker to develop the steps if they don't have to immediately be put into epipredict? Maybe gradually port over there after testing here? [Re-reading above --- testing the model performance first also sounds like a plan.]
Definitely. This is basically ready to be tested on real data, modulo anything you catch that isn't currently enumerated in the tests, whereas I don't think we'd have this operation for another month in epipredict, between the coordination problems and dealing with the complications that come from extra framework requirements.
For now, I'm just using default epi_slide behavior, which fills with NA and then removes the NA fills in the returned result.
Warning: we want that to be the epi_slide
default, but it's not yet.
library(dplyr)
library(epiprocess)
tibble(geo_value = 1, time_value = c(1, 4,5,6), value = time_value) %>%
as_epi_df() %>%
epi_slide(mv = mean(value), before = 1)
#> An `epi_df` object, 4 x 4 with metadata:
#> * geo_type = hhs
#> * time_type = custom
#> * as_of = 2023-12-22 07:58:58
#>
#> # A tibble: 4 × 4
#> geo_value time_value value mv
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 1 1
#> 2 1 4 4 4
#> 3 1 5 5 4.5
#> 4 1 6 6 5.5
Closes #74. This adds a new forecaster that creates a 7 day average column and a 28 day moving standard deviation column, and uses those as predictors. @brookslogan would like to hear your feedback about implementation if you've got time.