Closed DavisVaughan closed 2 years ago
Hi @DavisVaughan,
Thanks for the heads up on the recipe changes, for explaining the change, and for identifying the affected code in MachineShop. I really appreciate you taking the time to do that and should have some time starting this Wednesday to look into the issue.
I was just about to release a new version of the package, so this is good timing, and will aim to get it working with the upcoming recipes release soon.
Regards.
I don't seem to be getting an error when using MachineShop with the latest development version (0.1.15.9000) of recipes and my R installation. Do you see any of my package versions below that are different from yours?
> library(MachineShop)
> example(ModeledInput)
MdldIn> ## Modeled model frame
MdldIn> mod_mf <- ModeledInput(sale_amount ~ ., data = ICHomes, model = GLMModel)
MdldIn> fit(mod_mf)
Call: stats::glm(formula = formula, family = family, data = data, weights = weights,
control = control)
Coefficients:
(Intercept) sale_year sale_month
-1.742e+07 2.513e+03 -7.918e+02
built styleCondo construction1 Story Brick
3.029e+02 -1.774e+04 -1.176e+04
construction1 Story Condo construction1 Story Frame construction2 Story Brick
-6.658e+04 -6.416e+04 1.131e+05
construction2 Story Condo construction2 Story Frame constructionSplit Foyer Frame
9.514e+03 2.323e+04 -5.714e+04
constructionSplit Level Frame base_size add_size
-3.302e+04 1.635e+02 1.433e+02
garage1_size garage2_size lot_size
5.151e+01 6.838e+01 1.232e+00
bedrooms basementYes acYes
-1.557e+03 2.387e+04 -2.600e+03
atticYes lon lat
4.763e+04 5.430e+03 2.949e+05
Degrees of Freedom: 752 Total (i.e. Null); 729 Residual
Null Deviance: 6.323e+12
Residual Deviance: 1.76e+12 AIC: 18430
MdldIn> ## Modeled recipe
MdldIn> library(recipes)
MdldIn> rec <- recipe(sale_amount ~ ., data = ICHomes)
MdldIn> mod_rec <- ModeledInput(rec, model = GLMModel)
MdldIn> fit(mod_rec)
Call: stats::glm(formula = formula, family = family, data = data, weights = weights,
control = control)
Coefficients:
(Intercept) sale_year sale_month
-1.742e+07 2.513e+03 -7.918e+02
built styleCondo construction1 Story Brick
3.029e+02 -1.774e+04 -1.176e+04
construction1 Story Condo construction1 Story Frame construction2 Story Brick
-6.658e+04 -6.416e+04 1.131e+05
construction2 Story Condo construction2 Story Frame constructionSplit Foyer Frame
9.514e+03 2.323e+04 -5.714e+04
constructionSplit Level Frame base_size add_size
-3.302e+04 1.635e+02 1.433e+02
garage1_size garage2_size lot_size
5.151e+01 6.838e+01 1.232e+00
bedrooms basementYes acYes
-1.557e+03 2.387e+04 -2.600e+03
atticYes lon lat
4.763e+04 5.430e+03 2.949e+05
Degrees of Freedom: 752 Total (i.e. Null); 729 Residual
Null Deviance: 6.323e+12
Residual Deviance: 1.76e+12 AIC: 18430
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] recipes_0.1.15.9000 dplyr_1.0.2 MachineShop_2.5.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.5 lubridate_1.7.9 mvtnorm_1.1-1 lattice_0.20-41 tidyr_1.1.2
[6] listenv_0.8.0 prettyunits_1.1.1 class_7.3-17 zoo_1.8-8 assertthat_0.2.1
[11] digest_0.6.27 ipred_0.9-9 foreach_1.5.1 parallelly_1.21.0 R6_2.5.0
[16] stats4_4.0.3 ggplot2_3.3.2 pillar_1.4.6 rlang_0.4.8 progress_1.2.2
[21] multcomp_1.4-14 rstudioapi_0.12 DiceDesign_1.8-1 furrr_0.2.1 kernlab_0.9-29
[26] rpart_4.1-15 Matrix_1.2-18 splines_4.0.3 gower_0.2.2 munsell_0.5.0
[31] compiler_4.0.3 pkgconfig_2.0.3 libcoin_1.0-6 globals_0.13.1 dials_0.0.9
[36] nnet_7.3-14 tidyselect_1.1.0 tibble_3.0.4 prodlim_2019.11.13 coin_1.3-1
[41] codetools_0.2-16 matrixStats_0.57.0 fansi_0.4.1 future_1.20.1 crayon_1.3.4
[46] withr_2.3.0 MASS_7.3-53 grid_4.0.3 polspline_1.1.19 gtable_0.3.0
[51] lifecycle_0.2.0 magrittr_1.5 scales_1.1.1 cli_2.1.0 rsample_0.0.8
[56] party_1.3-5 strucchange_1.5-2 timeDate_3043.102 ellipsis_0.3.1 generics_0.1.0
[61] vctrs_0.3.4 sandwich_3.0-0 TH.data_1.0-10 lava_1.6.8.1 iterators_1.0.13
[66] tools_4.0.3 glue_1.4.2 purrr_0.3.4 hms_0.5.3 abind_1.4-5
[71] parallel_4.0.3 survival_3.2-7 colorspace_1.4-1 modeltools_0.2-23
I think this PR may have actually fixed the issue, since now fully_trained()
will return FALSE
when there are zero steps and prep()
hasn't been run
https://github.com/tidymodels/recipes/pull/600
Nevertheless, I think I would still encourage always running through prep()
unconditionally
Thanks. I'll keep your recommendation about prep()
in mind as I think about the code.
I just ran reverse dependencies for recipes (which might go to CRAN next week) and had an error with the help example in ?ModeledInput
.
The error seems to occur in .fit.MLModel
; when juice()
is run on the data, there are no data points available since prep.ModelRecipe()
calls recipes::prep(retain = FALSE)
.
We fixed a bug (tidymodels/recipes#652) where the training data were kept (to some extent) when retain = FALSE
. I think that this is the change. I would suggest using retain = TRUE
so that you can use juice()
without an error.
Thanks for the heads-up on the retain
change in prep()
. I've uploaded a fix to the master branch of MachineShop for its affected function and will include it in the next CRAN release in sync with recipes.
Hi there! We've made a few changes in recipes to fully support modern tidyselect. You can see that PR here https://github.com/tidymodels/recipes/pull/595
In the process, we slightly tweaked
bake()
to more directly use an element of a prepped recipe called$last_term_info
. This element is only added to the recipe atprep()
time, and is something we nowleft_join()
against inbake()
.It seems that
prep.ModelRecipe()
can skip the call toprep()
if the recipe looks fully trained. https://github.com/brian-j-smith/MachineShop/blob/8777d6b19d453d1d0d9e96d76ed72319ae49cec1/R/ModelRecipe.R#L76-L85For better or worse, a recipe looks fully trained if there are 0 steps, which means that
prep.recipe()
won't be called in those cases - meaning that$last_term_info
won't get added onto the recipe. This causes this failure in our revdeps https://github.com/tidymodels/recipes/pull/595#issuecomment-722587579Is there any way you can just let
prep.recipe()
get called unconditionally inprep.ModelRecipe()
?As an FYI, we plan to submit recipes to CRAN fairly soon.