Djm/resids hotfix - Githubissues

dajmcdon commented 7 months ago

Checklist

Please:

[x] Make sure this PR is against "dev", not "main".
[x] Request a review from one of the current epipredict main reviewers: dajmcdon.
[x] Makes sure to bump the version number in DESCRIPTION and NEWS.md. Always increment the patch version number (the third number), unless you are making a release PR from dev to main, in which case increment the minor version number (the second number).
[x] Describe changes made in NEWS.md, making sure breaking changes (backwards-incompatible changes to the documented interface) are noted. Collect the changes under the next release number (e.g. if you are on 0.7.2, then write your changes under the 0.8 heading).

Change explanations for reviewer

If grab_residuals() returns a data frame with some keys, then the by_key argument would cause an error. This is be cause, for example, if geo_value was a key (it always is), and the r object had the columns c("geo_value", ".resid"), then bind_cols(key_cols, r) renames the two geo_value columns to be distinct, and subsequent group_by operations failed.

Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch

Resolves #{issue number}

dsweber2 commented 7 months ago

Interesting, a little surprised I haven't actually run into this problem. If we're not doing grouping by geo_value by default, when would we want to do this? Is this a problem that comes up naturally for multi-aheads, e.g. smooth_quantile_regression?

So I tried the test without the change and it seems to have passed; doing a browser(), it looks like grab_residuals only returns a .resid column. My guess is that you had a more complicated situation where grab_residuals actually had too many columns that made this unit test from?

dajmcdon commented 7 months ago

Here's the failure:

library(epipredict)
#> Loading required package: epiprocess
#> 
#> Attaching package: 'epiprocess'
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> Loading required package: parsnip
flat1 <- flatline_forecaster(case_death_rate_subset, "death_rate")
flat2 <- flatline_forecaster(
  case_death_rate_subset, "death_rate",
  args_list = flatline_args_list(quantile_by_key = "geo_value")
)
#> New names:
#> • `geo_value` -> `geo_value...1`
#> • `geo_value` -> `geo_value...2`
#> Error in `dplyr::group_by()`:
#> ! Must group by variables found in `.data`.
#> ✖ Column `geo_value` is not found.

^{Created on 2024-02-20 with reprex v2.1.0}

And on rebuilding with this fix:

library(epipredict)
#> Loading required package: epiprocess
#> 
#> Attaching package: 'epiprocess'
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> Loading required package: parsnip
flat1 <- flatline_forecaster(case_death_rate_subset, "death_rate")
flat2 <- flatline_forecaster(
  case_death_rate_subset, "death_rate",
  args_list = flatline_args_list(quantile_by_key = "geo_value")
)

^{Created on 2024-02-20 with reprex v2.1.0}

dajmcdon commented 7 months ago

It's because of the grab_residuals() function returning a tibble with 2 columns (geo_value and .resid) for the flatline engine. This doesn't (necessarily) happen with other engines.

dsweber2 commented 7 months ago

That seems simple enough to include as the unit test maybe?

I tried the straightforward wf <- epi_workflow(r, parsnip::linear_reg() %>% parsnip::set_engine("flatline")) %>% fit(jhu) to get the test to fail on the old version, but that gives the confusing error of

Error in `dplyr::arrange()`:
ℹ In argument: `..1 = time_value`.
Caused by error:
! object 'time_value' not found

dsweber2 commented 7 months ago

oh, I pushed the change for the styler, which seems to have been the only thing that was breaking the CI

dajmcdon commented 7 months ago

@dsweber2 Am I good to merge this?

dsweber2 commented 7 months ago

oh, I suppose I should've said that in addition to approving. Would you prefer I just merge if I approve when I'm tagged?

But yes, lgtm!

dajmcdon commented 7 months ago

Ah, no problem. I think better for the PR creator to merge once satisfied, but I lost track of whether we were both satisfied!

cmu-delphi / epipredict

Djm/resids hotfix #294

Checklist

Change explanations for reviewer

Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch