Closed jsepin closed 3 months ago
The difference comes from the pre-processing done by the mice()
function. In particular, the remove.lindep()
function is run by default in mice to try to stabilize the imputation models. When trying to impute the data you are generating, many variables are dropped out of the imputation models before the mice.impute.
univariate imputation methods are called. You can check this by running the following:
# Set seed
set.seed(20240731)
# Generate data using your dgp function
data <- dgp()
# Run mice
mice_m <- mice(
data = data$amp,
m = 2, maxit = 2,
method = "lasso.select.norm",
ls.meth = "ridge",
printFlag = FALSE
)
# Check logged events
mice_m$loggedEvents
In the logged events, you should see that many variables have been "kicked out," meaning they never made it to the mice.impute.lasso.select.norm()
function. With the wrapper you wrote, the linear dependency checks are not performed, and the lasso is the only way you handle variable selection. To avoid this pre-processing in mice()
you can set the eps
argument in the mice()
call to 0. In your code, change the eval statement to:
eval(parse(
text = paste0(
"mice_m <- ", method_fact$wrapper[i],
"(data = data$amp, m = m, maxit = maxit, method = '",
method_fact$method[i],
"',ls.meth = 'ridge', printFlag = FALSE, eps = 0)"
)
))
I ran your code with this change, and the resulting plot is the following:
There are no more differences between your wrapper and the standard mice()
.
I think I should clarify that eps = 0
is recommended when running lasso-based imputation methods. I will make a pull request to update the documentation soon.
Let me know if this solves your issue entirely or if there is anything else I can help with.
@EdoardoCostantini thanks. No really a bug or problem. I will convert the issue to a Discussion.
Describe the bug
I wanted to conduct a simulation study with one missing variable using
mice.impute.lasso.select.norm
. However, this method consistently failed to select the correct variables when used within themice
, despite working correctly when used directly. To address this issue, I created my own simplified wrapper function (called itshort_mice
), which should be sufficient for this situation. I then compared its performance to themice
wrapper through a short simulation using several methods (not justlasso.select.norm
) and observed quite different results.To Reproduce
Nsim = 10 and m = 5 should be increased!
Created on 2024-05-28 with reprex v2.1.0
Session info
``` r sessioninfo::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 4.3.2 (2023-10-31 ucrt) #> os Windows 10 x64 (build 19045) #> system x86_64, mingw32 #> ui RTerm #> language (EN) #> collate German_Switzerland.utf8 #> ctype German_Switzerland.utf8 #> tz Europe/Zurich #> date 2024-05-28 #> pandoc 3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown) #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> package * version date (UTC) lib source #> backports 1.4.1 2021-12-13 [1] CRAN (R 4.3.1) #> boot 1.3-28.1 2022-11-22 [2] CRAN (R 4.3.2) #> broom 1.0.5 2023-06-09 [1] CRAN (R 4.3.2) #> cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.2) #> codetools 0.2-19 2023-02-01 [2] CRAN (R 4.3.2) #> colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.2) #> curl 5.1.0 2023-10-02 [1] CRAN (R 4.3.2) #> digest 0.6.35 2024-03-11 [1] CRAN (R 4.3.3) #> dplyr * 1.1.3 2023-09-03 [1] CRAN (R 4.3.2) #> evaluate 0.23 2023-11-01 [1] CRAN (R 4.3.2) #> fansi 1.0.6 2023-12-08 [1] CRAN (R 4.3.3) #> farver 2.1.1 2022-07-06 [1] CRAN (R 4.3.2) #> fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.2) #> forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.3.2) #> foreach 1.5.2 2022-02-02 [1] CRAN (R 4.3.2) #> fs 1.6.3 2023-07-20 [1] CRAN (R 4.3.2) #> generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.2) #> ggplot2 * 3.5.0 2024-02-23 [1] CRAN (R 4.3.3) #> glmnet 4.1-8 2023-08-22 [1] CRAN (R 4.3.2) #> glue 1.7.0 2024-01-09 [1] CRAN (R 4.3.3) #> gtable 0.3.4 2023-08-21 [1] CRAN (R 4.3.2) #> highr 0.10 2022-12-22 [1] CRAN (R 4.3.2) #> hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.2) #> htmltools 0.5.7 2023-11-03 [1] CRAN (R 4.3.2) #> iterators 1.0.14 2022-02-05 [1] CRAN (R 4.3.2) #> jomo 2.7-6 2023-04-15 [1] CRAN (R 4.3.2) #> knitr 1.45 2023-10-30 [1] CRAN (R 4.3.2) #> labeling 0.4.3 2023-08-29 [1] CRAN (R 4.3.1) #> lattice 0.21-9 2023-10-01 [2] CRAN (R 4.3.2) #> lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.3) #> lme4 1.1-35.1 2023-11-05 [1] CRAN (R 4.3.2) #> lubridate * 1.9.3 2023-09-27 [1] CRAN (R 4.3.2) #> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.2) #> MASS 7.3-60 2023-05-04 [2] CRAN (R 4.3.2) #> Matrix 1.6-1.1 2023-09-18 [2] CRAN (R 4.3.2) #> mice * 3.16.0 2023-06-05 [1] CRAN (R 4.3.2) #> minqa 1.2.6 2023-09-11 [1] CRAN (R 4.3.2) #> mitml 0.4-5 2023-03-08 [1] CRAN (R 4.3.2) #> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.3.2) #> mvtnorm * 1.2-3 2023-08-25 [1] CRAN (R 4.3.2) #> nlme 3.1-163 2023-08-09 [2] CRAN (R 4.3.2) #> nloptr 2.0.3 2022-05-26 [1] CRAN (R 4.3.2) #> nnet 7.3-19 2023-05-03 [2] CRAN (R 4.3.2) #> pan 1.9 2023-08-21 [1] CRAN (R 4.3.2) #> pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.2) #> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.2) #> purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.2) #> R.cache 0.16.0 2022-07-21 [1] CRAN (R 4.3.3) #> R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.3.3) #> R.oo 1.26.0 2024-01-24 [1] CRAN (R 4.3.3) #> R.utils 2.12.3 2023-11-18 [1] CRAN (R 4.3.3) #> R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.2) #> ranger 0.16.0 2023-11-12 [1] CRAN (R 4.3.2) #> Rcpp 1.0.12 2024-01-09 [1] CRAN (R 4.3.3) #> readr * 2.1.4 2023-02-10 [1] CRAN (R 4.3.2) #> reprex 2.1.0 2024-01-11 [1] CRAN (R 4.3.3) #> rlang 1.1.2 2023-11-04 [1] CRAN (R 4.3.2) #> rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.2) #> rpart 4.1.21 2023-10-09 [2] CRAN (R 4.3.2) #> rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.2) #> scales 1.3.0 2023-11-28 [1] CRAN (R 4.3.3) #> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.2) #> shape 1.4.6.1 2024-02-23 [1] CRAN (R 4.3.2) #> stringi 1.8.3 2023-12-11 [1] CRAN (R 4.3.2) #> stringr * 1.5.0 2022-12-02 [1] CRAN (R 4.3.2) #> styler 1.10.3 2024-04-07 [1] CRAN (R 4.3.3) #> survival 3.5-7 2023-08-14 [2] CRAN (R 4.3.2) #> tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.3.2) #> tidyr * 1.3.0 2023-01-24 [1] CRAN (R 4.3.2) #> tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.3.3) #> tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.3.2) #> timechange 0.2.0 2023-01-11 [1] CRAN (R 4.3.2) #> tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.2) #> utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.2) #> vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.3) #> withr 3.0.0 2024-01-16 [1] CRAN (R 4.3.3) #> xfun 0.41 2023-11-01 [1] CRAN (R 4.3.2) #> xml2 1.3.5 2023-07-06 [1] CRAN (R 4.3.2) #> yaml 2.3.7 2023-01-23 [1] CRAN (R 4.3.2) #> #> [1] C:/Users/SepinJ/AppData/Local/R/win-library/4.3 #> [2] C:/Program Files/R/R-4.3.2/library #> #> ────────────────────────────────────────────────────────────────────────────── ```Expected behavior
I would expect the outputs to be more similar.
Thank you very much for any clarification!