Closed dajmcdon closed 7 months ago
Interesting, a little surprised I haven't actually run into this problem. If we're not doing grouping by geo_value
by default, when would we want to do this? Is this a problem that comes up naturally for multi-aheads, e.g. smooth_quantile_regression
?
So I tried the test without the change and it seems to have passed; doing a browser()
, it looks like grab_residuals
only returns a .resid
column. My guess is that you had a more complicated situation where grab_residuals
actually had too many columns that made this unit test from?
Here's the failure:
library(epipredict)
#> Loading required package: epiprocess
#>
#> Attaching package: 'epiprocess'
#> The following object is masked from 'package:stats':
#>
#> filter
#> Loading required package: parsnip
flat1 <- flatline_forecaster(case_death_rate_subset, "death_rate")
flat2 <- flatline_forecaster(
case_death_rate_subset, "death_rate",
args_list = flatline_args_list(quantile_by_key = "geo_value")
)
#> New names:
#> • `geo_value` -> `geo_value...1`
#> • `geo_value` -> `geo_value...2`
#> Error in `dplyr::group_by()`:
#> ! Must group by variables found in `.data`.
#> ✖ Column `geo_value` is not found.
Created on 2024-02-20 with reprex v2.1.0
And on rebuilding with this fix:
library(epipredict)
#> Loading required package: epiprocess
#>
#> Attaching package: 'epiprocess'
#> The following object is masked from 'package:stats':
#>
#> filter
#> Loading required package: parsnip
flat1 <- flatline_forecaster(case_death_rate_subset, "death_rate")
flat2 <- flatline_forecaster(
case_death_rate_subset, "death_rate",
args_list = flatline_args_list(quantile_by_key = "geo_value")
)
Created on 2024-02-20 with reprex v2.1.0
It's because of the grab_residuals()
function returning a tibble with 2 columns (geo_value
and .resid
) for the flatline engine. This doesn't (necessarily) happen with other engines.
That seems simple enough to include as the unit test maybe?
I tried the straightforward wf <- epi_workflow(r, parsnip::linear_reg() %>% parsnip::set_engine("flatline")) %>% fit(jhu)
to get the test to fail on the old version, but that gives the confusing error of
Error in `dplyr::arrange()`:
ℹ In argument: `..1 = time_value`.
Caused by error:
! object 'time_value' not found
oh, I pushed the change for the styler, which seems to have been the only thing that was breaking the CI
@dsweber2 Am I good to merge this?
oh, I suppose I should've said that in addition to approving. Would you prefer I just merge if I approve when I'm tagged?
But yes, lgtm!
Ah, no problem. I think better for the PR creator to merge once satisfied, but I lost track of whether we were both satisfied!
Checklist
Please:
DESCRIPTION
andNEWS.md
. Always increment the patch version number (the third number), unless you are making a release PR from dev to main, in which case increment the minor version number (the second number).Change explanations for reviewer
If
grab_residuals()
returns a data frame with some keys, then theby_key
argument would cause an error. This is be cause, for example, ifgeo_value
was a key (it always is), and ther
object had the columnsc("geo_value", ".resid")
, thenbind_cols(key_cols, r)
renames the two geo_value columns to be distinct, and subsequent group_by operations failed.Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch