cmu-delphi / epipredict

Tools for building predictive models in epidemiology.
https://cmu-delphi.github.io/epipredict/
Other
8 stars 9 forks source link

Djm/resids hotfix #294

Closed dajmcdon closed 7 months ago

dajmcdon commented 7 months ago

Checklist

Please:

Change explanations for reviewer

If grab_residuals() returns a data frame with some keys, then the by_key argument would cause an error. This is be cause, for example, if geo_value was a key (it always is), and the r object had the columns c("geo_value", ".resid"), then bind_cols(key_cols, r) renames the two geo_value columns to be distinct, and subsequent group_by operations failed.

Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch

dsweber2 commented 7 months ago

Interesting, a little surprised I haven't actually run into this problem. If we're not doing grouping by geo_value by default, when would we want to do this? Is this a problem that comes up naturally for multi-aheads, e.g. smooth_quantile_regression?

So I tried the test without the change and it seems to have passed; doing a browser(), it looks like grab_residuals only returns a .resid column. My guess is that you had a more complicated situation where grab_residuals actually had too many columns that made this unit test from?

dajmcdon commented 7 months ago

Here's the failure:

library(epipredict)
#> Loading required package: epiprocess
#> 
#> Attaching package: 'epiprocess'
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> Loading required package: parsnip
flat1 <- flatline_forecaster(case_death_rate_subset, "death_rate")
flat2 <- flatline_forecaster(
  case_death_rate_subset, "death_rate",
  args_list = flatline_args_list(quantile_by_key = "geo_value")
)
#> New names:
#> • `geo_value` -> `geo_value...1`
#> • `geo_value` -> `geo_value...2`
#> Error in `dplyr::group_by()`:
#> ! Must group by variables found in `.data`.
#> ✖ Column `geo_value` is not found.

Created on 2024-02-20 with reprex v2.1.0

And on rebuilding with this fix:

library(epipredict)
#> Loading required package: epiprocess
#> 
#> Attaching package: 'epiprocess'
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> Loading required package: parsnip
flat1 <- flatline_forecaster(case_death_rate_subset, "death_rate")
flat2 <- flatline_forecaster(
  case_death_rate_subset, "death_rate",
  args_list = flatline_args_list(quantile_by_key = "geo_value")
)

Created on 2024-02-20 with reprex v2.1.0

dajmcdon commented 7 months ago

It's because of the grab_residuals() function returning a tibble with 2 columns (geo_value and .resid) for the flatline engine. This doesn't (necessarily) happen with other engines.

dsweber2 commented 7 months ago

That seems simple enough to include as the unit test maybe?

I tried the straightforward wf <- epi_workflow(r, parsnip::linear_reg() %>% parsnip::set_engine("flatline")) %>% fit(jhu) to get the test to fail on the old version, but that gives the confusing error of

Error in `dplyr::arrange()`:
ℹ In argument: `..1 = time_value`.
Caused by error:
! object 'time_value' not found
dsweber2 commented 7 months ago

oh, I pushed the change for the styler, which seems to have been the only thing that was breaking the CI

dajmcdon commented 7 months ago

@dsweber2 Am I good to merge this?

dsweber2 commented 7 months ago

oh, I suppose I should've said that in addition to approving. Would you prefer I just merge if I approve when I'm tagged?

But yes, lgtm!

dajmcdon commented 7 months ago

Ah, no problem. I think better for the PR creator to merge once satisfied, but I lost track of whether we were both satisfied!