amices / mice

Multivariate Imputation by Chained Equations
https://amices.org/mice/
GNU General Public License v2.0

Use dot to include all the predictors in Logistic regression model after imputation #265

Open tiantiy opened 4 years ago

tiantiy commented 4 years ago

Hi there,

I have used mice to do the imputation. As our dataset has many variables, I tried to use a dot in the glm function to include all the predictors instead of typing out all their names. However, this produces an error.

Here I use the nhanes dataset as an example:

require(mice, warn.conflicts = FALSE)
set.seed(123)
nhanes$hyp <- as.factor(nhanes$hyp)

imputed_data <- mice::mice(nhanes, m = 5, method = "pmm", 
                           maxit = 10, seed = 12345, print = FALSE)
imputed_model <- with(imputed_data, 
                      glm(hyp ~ ., family = binomial(link = 'logit')))

Error in terms.formula(formula, data = data) : '.' in formula and no 'data' argument

Is there another quick way to include all the predictors in the dataset without typing out the names in the analysis model? Many thanks for any assistance.

stefvanbuuren commented 4 years ago

Thanks for bringing this to my attention.

No quick fix. I wrote with.mids() many years ago, hoping that it would work most of the time. It fared well, but here you report a clear limitation.

I would need to rethink the design and see what is possible with modern tools and recent texts, especially Hadley Wickham's Advanced R book. If there is somebody out there with a good understanding of the R evaluation model, please feel free to drop by. It might well be that the solution is a lot simpler than the current with.mids().

prockenschaub commented 4 years ago

Another quick option for the glm case is to mget the variables. with.mids() evaluates the glm expression within each completed data.frame, i.e. the columns of the data.frame are accessible in the evaluation environment as if they were variables. If you use data = mget(names(nhanes)) in your glm call, mget collects those variables again and passes them to glm's data argument as a list, which allows the formula to use the dot (.).
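The mechanism can be reproduced in base R without mice. The sketch below is only an illustration under assumptions: `toy`, `failing`, and `working` are hypothetical names, and `eval()` with a data.frame as `envir` stands in for what with.mids() is described as doing for each completed dataset; it is not the package's actual code.

```r
# Toy data; hypothetical, chosen only so that the logistic fit is not separable.
toy <- data.frame(
  y  = c(0, 0, 1, 1, 0, 1, 1, 0),
  x1 = c(1.2, 0.7, 3.1, 1.0, 2.9, 2.6, 0.9, 2.2),
  x2 = c(5, 3, 4, 6, 2, 5, 3, 4)
)

# Without a data argument, glm sees the columns as loose variables,
# so the dot in the formula has nothing to expand against.
failing <- quote(glm(y ~ ., family = binomial))
err <- tryCatch(
  eval(failing, envir = toy, enclos = environment()),
  error = function(e) conditionMessage(e)
)
print(err)

# mget() gathers the loose variables back into a named list, which glm
# accepts as its data argument, so the dot can be expanded.
working <- quote(glm(y ~ ., family = binomial, data = mget(names(toy))))
fit <- eval(working, envir = toy, enclos = environment())
print(names(coef(fit)))
```

Here `names(toy)` is looked up in the enclosing environment, while `mget()` looks up the columns in the evaluation environment itself, which is why the same trick works inside with().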

require(mice, warn.conflicts = FALSE)
#> Loading required package: mice
set.seed(123)
nhanes$hyp <- as.factor(nhanes$hyp)

imputed_data <- mice::mice(nhanes, m = 5, method = "pmm", 
                           maxit = 10, seed = 12345, print = FALSE)
imputed_model <- with(imputed_data, 
                      glm(hyp ~ ., family = binomial(link = 'logit'), data = mget(names(nhanes))))
imputed_model$analyses[[1]]
#> 
#> Call:  glm(formula = hyp ~ ., family = binomial(link = "logit"), data = mget(names(nhanes)))
#> 
#> Coefficients:
#> (Intercept)          age          bmi          chl  
#>   -31.45358      5.24968      0.89800     -0.02669  
#> 
#> Degrees of Freedom: 24 Total (i.e. Null);  21 Residual
#> Null Deviance:       21.98 
#> Residual Deviance: 13.06     AIC: 21.06

stefvanbuuren commented 4 years ago

Commit 4634094 simplifies with.mids() by calling eval_tidy() on a quosure. While this is a compact replacement for multiple lines of old code, it still gives the error "'.' in formula and no 'data' argument". This limitation is now noted in the documentation.

stefvanbuuren commented 3 years ago

Because of downstream issues, mice 3.12.2 reverts to the previous version of with.mids() that relies on base::eval().
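For readers who hit the same error, one possible way to sidestep with.mids() altogether is to fit the model on each completed dataset explicitly, so that glm receives a real data argument. This is a sketch that was not proposed in the thread: it assumes that `complete(..., action = "all")`, `as.mira()` and `pool()` behave as documented in current mice releases, and `nhanes_f`, `imp`, `completed`, `fits` and `pooled` are hypothetical names.

```r
library(mice)

# Work on a copy so the packaged nhanes data stays untouched.
nhanes_f <- nhanes
nhanes_f$hyp <- as.factor(nhanes_f$hyp)

imp <- mice(nhanes_f, m = 5, method = "pmm", maxit = 10,
            seed = 12345, printFlag = FALSE)

# A list with one completed data.frame per imputation.
completed <- complete(imp, action = "all")

# Each data.frame is passed as data, so the dot expands to all other columns.
fits <- lapply(completed, function(d) {
  glm(hyp ~ ., family = binomial(link = "logit"), data = d)
})

# Wrap the list of fits so the usual pooling step applies.
pooled <- pool(as.mira(fits))
summary(pooled)
```

The trade-off is a few more lines than with(), in exchange for not depending on how with.mids() evaluates its expression.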