amices / mice

Multivariate Imputation by Chained Equations
https://amices.org/mice/
GNU General Public License v2.0

Failure to impute a missing value in a balanced table #150

Closed: ellisp closed this issue 5 years ago

ellisp commented 5 years ago

This behaviour seems to be a bug:

library(mice)

data <- data.frame(
  x = c("a", "b", "a", "b"),
  y = c("A", "A", "B", "B"),
  z = c(8, NA, 8, 9)
)

data_imp <- mice(data)
mice::complete(data_imp, 1)

which returns

  x y  z
1 a A  8
2 b A NA
3 a B  8
4 b B  9

That is, there's no imputed value in any of the "complete" datasets for the second value of z.

Things that don't help include explicitly telling mice where the missing value is, or trying different imputation methods.

Things that do help include changing the third value of z to anything other than an 8 or 9.

ellisp commented 5 years ago

OK, I worked it out from reading your material here; sorry for logging the issue.

z isn't imputed because it is apparently collinear with x: in the three rows where z is observed, x = "a" always goes with z = 8 and x = "b" with z = 9. This behaviour can be overridden with

data_imp <- mice(data, where = is.na(data), print = FALSE, remove.collinear = FALSE)
mice::complete(data_imp, 1)
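The collinearity can be checked without mice. The base-R sketch below is not mice's own check: it assumes mice flags a variable whose absolute pairwise correlation with another variable reaches 0.999 (assumed here to match the default cutoff), codes x and y as integers the way data.matrix() does, and uses a hypothetical helper collinear_with() that rebuilds the issue's data with a chosen third value of z. It also hints at why only 8 or 9 trigger the problem: 8 makes the observed z track x, while 9 makes it track y.

```r
# Hypothetical helper (not part of mice): rebuild the issue's data with a
# chosen third value of z and report which of x, y the observed part of z
# is (near-)perfectly correlated with. The 0.999 cutoff is an assumption.
collinear_with <- function(z3, threshold = 0.999) {
  data <- data.frame(
    x = c("a", "b", "a", "b"),
    y = c("A", "A", "B", "B"),
    z = c(8, NA, z3, 9)
  )
  # data.matrix() converts character columns to factors, then to integer codes
  m <- data.matrix(data)
  # each correlation uses only the rows where both variables are observed
  r <- cor(m, use = "pairwise.complete.obs")
  hits <- abs(r["z", c("x", "y")]) >= threshold
  names(hits)[hits]
}

for (v in c(7, 8, 9, 10)) {
  cat("z[3] =", v, "-> collinear with:", collinear_with(v), "\n")
}
```

With these four values, only 8 (collinear with x) and 9 (collinear with y) should be reported, which matches the observation in the original report.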

stefvanbuuren commented 5 years ago

Thanks. Actually

data_imp <- mice(data, print = FALSE, remove.collinear = FALSE)

is enough to force imputation of collinear variables. Related to #48.

As I also write elsewhere:

Note that setting remove.constant = FALSE or remove.collinear = FALSE bypasses usual safety measures in mice, and could cause problems further down the road.

Even so, mice will still throw some warnings because of the collinearity. The recommended approach is to specify the predictorMatrix argument. In your case, we could stop x from being used as a predictor of z:

# Start from the default predictor matrix, then drop x as a predictor of z
pred <- make.predictorMatrix(data)
pred["z", "x"] <- 0
data_imp <- mice(data, print = FALSE, predictorMatrix = pred, remove.collinear = FALSE)
complete(data_imp, 1)
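The predictorMatrix convention behind this fix can be illustrated in plain R. The sketch builds a hypothetical stand-in for make.predictorMatrix(data), assuming that for this data set it returns a 3 by 3 matrix of ones with a zero diagonal, where rows are the variables to impute and columns are the candidate predictors (1 means "use this column"). predictors_of() is likewise a hypothetical helper, not a mice function.

```r
# Hypothetical stand-in for make.predictorMatrix(data): every variable
# predicts every other one, and no variable predicts itself.
vars <- c("x", "y", "z")
pred <- matrix(1, nrow = 3, ncol = 3, dimnames = list(vars, vars))
diag(pred) <- 0

# The fix from the thread: do not use x when imputing z
pred["z", "x"] <- 0

# Hypothetical helper: list the predictors still used for one target row
predictors_of <- function(pred, target) colnames(pred)[pred[target, ] == 1]

predictors_of(pred, "z")
predictors_of(pred, "x")
```

Zeroing one cell is more targeted than relying on remove.collinear = FALSE alone: z is still imputed, but only from y, so the perfect relation with x no longer enters its imputation model, while x itself keeps both y and z as predictors.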