cmu-delphi / epipredict

Tools for building predictive models in epidemiology.
https://cmu-delphi.github.io/epipredict/
Other
8 stars 9 forks source link

`fit.epi_workflow` silently drops `epi_workflow` class when built or fit on a non-`epi_df` or non-`epi_recipe` in most cases #363

Open brookslogan opened 2 months ago

brookslogan commented 2 months ago

I assume this will cause issues down the line when users are unexpectedly being missing out on slathering, or even earlier in the operation any step_epi*.

library(recipes)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> 
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#> 
#>     step
library(epipredict)
#> Loading required package: epiprocess
#> Registered S3 method overwritten by 'tsibble':
#>   method               from 
#>   as_tibble.grouped_df dplyr
#> 
#> Attaching package: 'epiprocess'
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> Loading required package: parsnip
#> Registered S3 method overwritten by 'epipredict':
#>   method            from   
#>   print.step_naomit recipes
library(testthat)
#> 
#> Attaching package: 'testthat'
#> The following object is masked from 'package:dplyr':
#> 
#>     matches

tbl <- tibble::tibble(
  geo_value = 1,
  time_value = 1:100,
  x = 1:100,
  y = x + rnorm(100L)
)
edf <- as_epi_df(tbl)

rec_tbl <- recipe(y ~ x, data = tbl)
rec_edf <- recipe(y ~ x, data = edf)
erec_tbl <- epi_recipe(y ~ x, data = tbl) # warning, good; not testing further; why not error though?
#> Warning: epi_recipe has been called with a non-epi_df object, returning a
#> regular recipe. Various step_epi_* functions will not work.
erec_edf <- epi_recipe(y ~ x, data = edf)

ewf_rec_tbl <- epi_workflow(rec_tbl, linear_reg())
ewf_rec_edf <- epi_workflow(rec_edf, linear_reg())
ewf_erec_edf <- epi_workflow(erec_edf, linear_reg())

# above are all epi_workflows:

expect_s3_class(ewf_rec_tbl, "epi_workflow")
expect_s3_class(ewf_rec_edf, "epi_workflow")
expect_s3_class(ewf_erec_edf, "epi_workflow")

# but fitting drops the class or generates errors in many cases:

ewf_rec_tbl %>% fit(tbl) %>% class()
#> [1] "workflow"
ewf_rec_tbl %>% fit(edf) %>% class()
#> [1] "workflow"
ewf_rec_edf %>% fit(tbl) %>% class()
#> [1] "workflow"
ewf_rec_edf %>% fit(edf) %>% class()
#> [1] "workflow"
ewf_erec_edf %>% fit(tbl) %>% class()
#> Error in if (new_meta != old_meta) {: argument is of length zero
ewf_erec_edf %>% fit(edf) %>% class()
#> [1] "epi_workflow" "workflow"

Created on 2024-07-18 with reprex v2.1.1

brookslogan commented 2 months ago

I assume the "correct" response here is to either output an epi_workflow or a clear error message. Not sure which cases should be which.

brookslogan commented 1 month ago

I expect that this will cause some sort of breakage if using geo-grouped epi_slide to do naive backtesting, as .x there won't be an epi_df. Geo-grouped epix_slides should be okay, as they use keep = TRUE so they should feed in epi_dfs, but probably have some later annoyances from duplicate geo columns... I'm currently trying to remove the duplicate column complaining in epiprocess.