amices / mice

Multivariate Imputation by Chained Equations
https://amices.org/mice/
GNU General Public License v2.0

remove.lindep() causes mice.impute.lasso.norm() call to glmnet() to fail in some circumstances #577

Closed (137alpha closed this issue 1 year ago)

137alpha commented 1 year ago

Describe the bug

For some data sets, remove.lindep() appears to be too aggressive in pruning predictor variables.

The method mice.impute.lasso.norm() calls glmnet(), which fails when it is given only one input predictor, with the error message:

x should be a matrix with 2 or more columns

On some data sets, using method = "lasso.norm" causes fitting to fail with the above error message.
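The glmnet() behaviour described here can be checked in isolation. The following is a minimal sketch on simulated data (the variable names x_one, msg and fit are made up for illustration); it calls glmnet() directly and does not go through mice at all.

```r
library(glmnet)

set.seed(1)
n <- 50
y <- rnorm(n)
x_one <- matrix(rnorm(n), ncol = 1)  # a predictor matrix with a single column

# Capture the error message instead of stopping the script
msg <- tryCatch(
  {
    glmnet(x = x_one, y = y)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg

# With a second (simulated) column the same call fits without error
fit <- glmnet(x = cbind(x_one, rnorm(n)), y = y)
class(fit)
```

This only shows that the column check lives in glmnet() itself; whether remove.lindep() is what reduces the predictor set in a given data set has to be checked separately.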

Workaround

As suggested in other issues, calling mice() with eps = 0 disables remove.lindep(), after which fitting succeeds.
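For reference, a call with the workaround could look like the sketch below. The data set is simulated and hypothetical (it is not the data that triggered the failure and does not reproduce the bug); it only illustrates where eps = 0 goes in the call.

```r
library(mice, warn.conflicts = FALSE)
set.seed(123)

# Simulated data: two complete numeric predictors, one incomplete outcome
n <- 100
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- dat$x1 + rnorm(n)
dat$y[sample(n, 30)] <- NA

# eps = 0 is passed on to remove.lindep(), which then keeps all predictors
imp <- mice(dat, m = 1, maxit = 1, method = "lasso.norm",
            printFlag = FALSE, eps = 0)

# Check that no missing values remain in the imputed outcome
anyNA(complete(imp)$y)
```

Note that eps = 0 also switches off the protection against genuinely collinear predictors, so it is probably best treated as a diagnostic setting rather than a default.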

Request

In the absence of a fundamental fix, could you please flag this in the manual as a potential failure mode?

stefvanbuuren commented 1 year ago

Thanks for noting. Difficult to replicate without the data. Here's my attempt at reproducing the error message.

library(mice, warn.conflicts = FALSE)
set.seed(123)

n <- 100
y <- rnorm(n)
x <- rep(1, n)
y[sample(1:n, n * .3)] <- NA
ry <- !is.na(y)

# Test univariate imputation outside mice
imps <- mice.impute.lasso.norm(y, ry, as.matrix(x))
#> Error: from glmnet C++ code (error code 7777); All used predictors have zero variance
imps <- mice.impute.lasso.norm(y, ry, as.matrix(x)[, -1])
#> Error in glmnet(x, y, weights = weights, offset = offset, lambda = lambda, : x should be a matrix with 2 or more columns

# Test inside mice
input <- data.frame(y = y, x = 1)
imp <- mice(input, m = 1, maxit = 1, method = "lasso.norm", print = FALSE)
#> Error in glmnet(x, y, weights = weights, offset = offset, lambda = lambda, : x should be a matrix with 2 or more columns
imp <- mice(input, m = 1, maxit = 1, method = "lasso.norm", print = FALSE, eps = 0)
#> Error in glmnet(x, y, weights = weights, offset = offset, lambda = lambda, : x should be a matrix with 2 or more columns
imp <- mice(input, m = 1, maxit = 1, method = "lasso.norm", print = FALSE, eps = 0, remove.constant = FALSE)
#> Error: from glmnet C++ code (error code 7777); All used predictors have zero variance

Created on 2023-08-14 with reprex v2.0.2

What we can conclude from this:

I was unable to create a case where eps = 0 allowed mice.impute.lasso.norm() to succeed, so I am not sure whether remove.lindep() is the culprit here. My guess is that remove.lindep() cannot remove every variable on grounds of collinearity, so at least one variable should always remain. But I might be wrong.

If you can document a case where remove.lindep() silently produces a zero-column predictor dataset, please let me know. Would you be able to create a reprex of the behaviour you see?
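One way to see which predictors mice drops before any imputation model is fitted is to run only the setup step and inspect the logged events. The sketch below reuses the constant-predictor data from the reprex; it assumes that the loggedEvents component of the returned object records removals made during setup, and it says nothing about removals that remove.lindep() may make later, inside the iterations.

```r
library(mice, warn.conflicts = FALSE)
set.seed(123)

n <- 100
y <- rnorm(n)
y[sample(n, 30)] <- NA
input <- data.frame(y = y, x = 1)  # x is constant, as in the reprex

# maxit = 0 performs the setup only, so glmnet() is never called
ini <- mice(input, maxit = 0, method = "lasso.norm", printFlag = FALSE)

ini$loggedEvents     # variables removed during setup, with the reason
ini$predictorMatrix  # predictor matrix after those removals
```

If a predictor shows up here, it was removed by the setup checks (for example remove.constant) rather than by remove.lindep(), which would be consistent with eps = 0 alone not changing the error in the reprex.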

stefvanbuuren commented 1 year ago

This is likely to be data-dependent, but I have not been able to replicate it reliably. Without a reprex of the behaviour, I cannot address the issue.

Temporary fix: if you experience this problem, try setting eps = 0 in the call to mice().