Currently, if data is already prepped, machine_learn re-preps it and replaces the original recipe stored in the prepped object. This is confusing because the recipe attached as an attribute of the returned model object is a new recipe, not the one created by the original prep_data call. An error message should notify the user that they should not use machine_learn on objects that are already prepped.
library(healthcareai)
#> healthcareai version 2.2.0
#> Please visit https://docs.healthcare.ai for full documentation and vignettes. Join the community at https://healthcare-ai.slack.com
library(tidyverse)
prepped_d <- prep_data(pima_diabetes, patient_id, outcome = diabetes)
#> Training new data prep recipe...
m <- machine_learn(prepped_d, patient_id, outcome = diabetes, models = "rf",
                   tune = FALSE)
#> Training new data prep recipe...
#> Removing the following 2 near-zero variance column(s). If you don't want to remove them, call prep_data with remove_near_zero_variance as a smaller numeric or FALSE.
#> weight_class_other and weight_class_missing
#> Variable(s) ignored in prep_data won't be used to tune models: patient_id
#>
#> diabetes looks categorical, so training classification algorithms.
#>
#> After data processing, models are being trained on 10 features with 768 observations.
#> Based on n_folds = 5 and hyperparameter settings, the following number of models will be trained: 5 rf's
#> Training at fixed values: Random Forest
#>
#> *** Models successfully trained. The model object contains the training data minus ignored ID columns. ***
#> *** If there was PHI in training data, normal PHI protocols apply to the model object. ***
testthat::expect_equal(attr(m, "recipe"), attr(prepped_d, "recipe"))
#> Error: attr(m, "recipe") not equal to attr(prepped_d, "recipe").
#> Attributes: < Component "factor_levels": Names: 1 string mismatch >
#> Attributes: < Component "factor_levels": Length mismatch: comparison on first 1 components >
#> Attributes: < Component "factor_levels": Component 1: Names: 2 string mismatches >
#> Attributes: < Component "factor_levels": Component 1: Attributes: < Component "dim": Mean relative difference: 2 > >
#> Attributes: < Component "factor_levels": Component 1: Attributes: < Component "dimnames": Component "": 2 string mismatches > >
#> Attributes: < Component "factor_levels": Component 1: Numeric: lengths (2, 6) differ >
#> Attributes: < Component "missingness": Names: 8 string mismatches >
#> Attributes: < Component "missingness": Numeric: lengths (14, 10) differ >
#> Component "var_info": Different number of rows
#> ...
Created on 2018-09-06 by the reprex package (v0.2.0).
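
A minimal sketch of the kind of guard being requested, in case it helps the discussion. It assumes that already-prepped data can be recognized by the "recipe" attribute that the reprex shows prep_data attaching. The helper name `stop_if_prepped` and the message wording are hypothetical, not part of healthcareai, and the stand-in objects exist only so the sketch runs without the package.

```r
# Hypothetical helper (not part of healthcareai): error out if `d` already
# carries a "recipe" attribute, i.e. it looks like output from prep_data().
stop_if_prepped <- function(d) {
  if (!is.null(attr(d, "recipe", exact = TRUE))) {
    stop(
      "Data appears to be already prepped (it has a 'recipe' attribute). ",
      "machine_learn would re-prep it and replace the original recipe; ",
      "pass unprepped data instead.",
      call. = FALSE
    )
  }
  invisible(d)
}

# Stand-in objects so the sketch runs without healthcareai installed.
raw_d <- data.frame(x = 1:3)
fake_prepped_d <- structure(raw_d, recipe = "placeholder recipe")

# Unprepped data passes through silently.
stop_if_prepped(raw_d)

# Prepped-looking data triggers the error; capture its message for inspection.
res <- tryCatch(
  stop_if_prepped(fake_prepped_d),
  error = function(e) conditionMessage(e)
)
print(res)
```

A check like this would presumably run at the top of machine_learn, before any data prep happens, so the original recipe is never overwritten. Whether to key on the attribute or on a class of the prepped object is a design choice for the maintainers.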