amices / mice

Multivariate Imputation by Chained Equations
https://amices.org/mice/
GNU General Public License v2.0

monotone missingness: not all values imputed in first step of random draws #489

Closed sff06 closed 1 year ago

sff06 commented 2 years ago

I am following the suggestion in van Buuren (2018, p. 114f) to first impute the (few) single missing values that destroy the monotone missingness pattern by a simple random draw, before imputing the values that follow the monotone pattern. Applying the first step (single draws) with the code from the book, adapted to my data and variables, does not impute every one of those single values.

In case I have done something wrong, here is my code:

The data frame "datitems" contains all items with single missing values that destroy the monotone missingness pattern:

{
  where <- make.where(datitems_impmonot, "none")
  where[c(63,222), "BB_19R"] <- TRUE
  where[c(63,222), "BB_20"] <- TRUE
  where[c(63,222), "BB_21"] <- TRUE
  where[c(63,222), "BB_22R"] <- TRUE
  where[c(63,222), "BB_23"] <- TRUE
  where[c(63,222), "BB_24"] <- TRUE
  where[c(63,222), "BB_25R"] <- TRUE
  where[c(63,222), "BB_26"] <- TRUE
  where[c(63,222), "BB_27R"] <- TRUE
  where[c(63,222), "BB_28"] <- TRUE
  where[c(63,222), "BB_29R"] <- TRUE
  where[c(63,222), "BB_30"] <- TRUE
  where[c(63,222), "BB_31R"] <- TRUE
  where[c(63,222), "BB_32"] <- TRUE
  where[c(63,167,222), "BB_33R"] <- TRUE
  where[c(63,222), "BB_34"] <- TRUE
  where[c(63,222), "BB_35R"] <- TRUE
  where[c(63,222), "BB_36"] <- TRUE
  where[130, "PTMR16"] <- TRUE
  where[40, "a1_34"] <- TRUE
  where[,"BB_33R"]
}
datitems_imp <- mice(datitems, where=where, m = 1, method = "sample", seed = 21980, maxit = 1, print = FALSE)
datitems2 <- mice::complete(datitems_imp)

Checking, for instance,

datitems2$BB_19R[63]

still gives NA

Do you have any suggestions?

A workaround could be to use where <- make.where(datitems_impmonot, "missing"), so that every missing value is sampled, and then to transfer only the sampled values for the cells that destroy the monotone missingness pattern back into the original data set. Are there any objections to this workaround?
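For illustration, the "transfer only the selected cells" idea can be sketched in base R without mice. This is a minimal sketch under stated assumptions, not how mice implements method = "sample": the helper draw_into_cells(), the toy data frame, and the cells matrix are all hypothetical names made up for this example.

```r
# Hypothetical helper (not part of mice): fill only the selected missing cells
# with random draws from the observed values of the same column.
draw_into_cells <- function(data, cells, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  for (j in colnames(cells)) {
    target <- cells[, j] & is.na(data[[j]])      # selected cells that are missing
    observed <- data[[j]][!is.na(data[[j]])]     # donor pool for this column
    if (any(target) && length(observed) > 0) {
      # Sample positions rather than values, so a single observed number
      # is not expanded by sample() into the range 1:x.
      idx <- sample.int(length(observed), sum(target), replace = TRUE)
      data[[j]][target] <- observed[idx]
    }
  }
  data
}

# Toy data (made up): each column has two missing values.
toy <- data.frame(a = c(1, 2, NA, 4, NA), b = c(NA, 5, 6, NA, 7))

# Logical matrix in the spirit of a `where` matrix: TRUE marks cells to fill.
cells <- matrix(FALSE, nrow(toy), ncol(toy), dimnames = list(NULL, names(toy)))
cells[3, "a"] <- TRUE
cells[1, "b"] <- TRUE

filled <- draw_into_cells(toy, cells, seed = 1)
filled
```

Only the two marked cells are filled; the other missing values stay NA so that the remaining pattern can be handled in a second step.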

Thank you!

Best, Stanley

hanneoberman commented 1 year ago

Hi @sff06, could you please share a reprex for this issue?

sff06 commented 1 year ago

Thank you for your help! Here is a reprex:

reprex::reprex(session_info=TRUE,{
library(mice)
mydata = data.frame(
  BB_33R = c(2,2,2,1,5,4,3,1,1,
             4,3,1,4,1,2,2,5,4,1,1,2,1,3,3,1,5,3,1,
             1,4,2,NA,4,1,NA,5,1,2,4,1,3,1,NA,NA,3,
             1,1,NA,1,1,5,1,1,1,NA,5,2,7,1,3,2,NA,
             NA,NA,2,1,2,NA,1,3),
  BB_34 = c(6,1,6,7,7,4,4,5,6,
            4,4,7,7,7,5,2,7,7,7,3,6,5,4,5,4,4,5,4,
            5,7,5,NA,1,6,NA,2,5,3,5,5,5,5,NA,NA,5,
            4,6,NA,4,4,4,4,5,7,NA,4,5,4,4,5,7,NA,
            NA,NA,5,2,1,NA,6,6),
  BB_35R = c(3,5,2,1,5,2,3,4,4,
             4,3,1,3,1,2,2,5,3,1,1,3,1,4,2,1,7,3,1,
             1,3,2,NA,3,1,NA,6,1,3,3,3,3,3,NA,NA,3,
             2,1,NA,4,1,3,1,3,2,NA,7,5,3,4,3,4,NA,
             NA,NA,3,2,4,NA,2,4),
  BB_36 = c(2,1,1,5,5,3,4,5,2,
            5,2,2,5,1,1,1,3,3,4,1,2,1,2,2,1,1,4,3,
            3,5,3,NA,1,1,NA,2,1,4,7,2,4,1,NA,NA,2,
            2,3,NA,4,1,2,3,3,3,NA,1,5,2,1,2,5,NA,
            NA,NA,4,1,1,NA,2,4),
  PTMR16 = c(1,1,2,5,1,2,1,2,2,
             3,2,2,2,1,1,1,1,1,4,1,5,1,1,2,2,1,4,2,
             1,NA,3,NA,1,1,NA,1,1,2,3,1,2,4,NA,NA,
             NA,3,1,NA,1,1,1,1,1,1,NA,3,2,1,1,1,1,
             NA,NA,NA,1,1,1,NA,3,1),
  a1_34 = c(4,3,5,4,3,4,3,4,4,
            3,4,5,2,4,4,4,4,3,4,4,4,4,4,4,3,3,4,3,
            4,NA,4,NA,4,4,NA,3,3,4,1,NA,4,4,NA,NA,
            NA,4,5,NA,3,3,4,NA,4,4,NA,3,3,1,4,3,4,
            NA,3,NA,NA,3,2,NA,5,NA)
)
{
  where <- make.where(mydata, "none")
  where[63, "BB_34"] <- TRUE
  where[63, "BB_35R"] <- TRUE
  where[63, "BB_36"] <- TRUE
  where[40, "a1_34"] <- TRUE
  imp_monot <- mice(mydata, where = where, m = 1, method = "sample", seed = 21980, maxit = 1, print = FALSE)
}
datitems2 <- mice::complete(imp_monot)
datitems2[63,]
datitems2[40,]

# datitems2[40, 6] (a1_34) has a value, but datitems2[63, 2:4] is still NA

# compare with initial data:

mydata[63,]
mydata[40,]
})

thomvolker commented 1 year ago

This reprex does not show the output of your function calls. You can create a reprex by copying the code you want to run and then calling reprex::reprex() in R, which copies the code together with its output to your clipboard. You can then simply paste the result here.

However, I think your problem arises because you want to impute the first values that destroy the monotone pattern while using predictors that themselves have missing values. This is not possible: the NAs in those predictors make it impossible to compute plausible values for the variables that you want to impute first. So, if you want to use this approach, you have to impute the values that destroy your monotone missingness pattern using only predictors without missing values. You can do this by specifying a predictor matrix in which, in the row of each variable to be imputed, the entries for all predictors (columns) that have missing values are set to zero.
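The predictor-matrix suggestion above can be sketched in base R as follows. This is only an illustration on a made-up three-column data frame (toy, x1, x2 and y are hypothetical names), and it has not been run against the data in this thread. The matrix follows the mice convention of targets in rows and predictors in columns; with mice installed, make.predictorMatrix() should give the same starting matrix, and the result would be passed to mice() through its predictorMatrix argument.

```r
# Toy data (made up): x1 is fully observed, x2 and y are incomplete.
toy <- data.frame(
  x1 = c(1, 2, 3, 4),
  x2 = c(NA, 1, 2, 3),
  y  = c(5, NA, 7, 8)
)

# Full predictor matrix: rows are the variables to impute, columns are the
# predictors; a variable never predicts itself.
vars <- names(toy)
pred <- matrix(1, length(vars), length(vars), dimnames = list(vars, vars))
diag(pred) <- 0

# Switch off every predictor (column) that contains missing values, so that
# only fully observed variables remain as predictors.
incomplete <- vars[colSums(is.na(toy)) > 0]
pred[, incomplete] <- 0

pred
```

In the reprex from this thread every one of the six columns contains NAs, so this rule would switch off all predictors there; whether an all-zero predictor matrix gives the desired first-step draws is an assumption that would need to be checked.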

sff06 commented 1 year ago

library(mice)
#>
#> Attaching package: 'mice'
#> The following object is masked from 'package:stats':
#>
#>     filter
#> The following objects are masked from 'package:base':
#>
#>     cbind, rbind

mydata = data.frame(
  BB_33R = c(2,2,2,1,5,4,3,1,1,
             4,3,1,4,1,2,2,5,4,1,1,2,1,3,3,1,5,3,1,
             1,4,2,NA,4,1,NA,5,1,2,4,1,3,1,NA,NA,3,
             1,1,NA,1,1,5,1,1,1,NA,5,2,7,1,3,2,NA,
             NA,NA,2,1,2,NA,1,3),
  BB_34 = c(6,1,6,7,7,4,4,5,6,
            4,4,7,7,7,5,2,7,7,7,3,6,5,4,5,4,4,5,4,
            5,7,5,NA,1,6,NA,2,5,3,5,5,5,5,NA,NA,5,
            4,6,NA,4,4,4,4,5,7,NA,4,5,4,4,5,7,NA,
            NA,NA,5,2,1,NA,6,6),
  BB_35R = c(3,5,2,1,5,2,3,4,4,
             4,3,1,3,1,2,2,5,3,1,1,3,1,4,2,1,7,3,1,
             1,3,2,NA,3,1,NA,6,1,3,3,3,3,3,NA,NA,3,
             2,1,NA,4,1,3,1,3,2,NA,7,5,3,4,3,4,NA,
             NA,NA,3,2,4,NA,2,4),
  BB_36 = c(2,1,1,5,5,3,4,5,2,
            5,2,2,5,1,1,1,3,3,4,1,2,1,2,2,1,1,4,3,
            3,5,3,NA,1,1,NA,2,1,4,7,2,4,1,NA,NA,2,
            2,3,NA,4,1,2,3,3,3,NA,1,5,2,1,2,5,NA,
            NA,NA,4,1,1,NA,2,4),
  PTMR16 = c(1,1,2,5,1,2,1,2,2,
             3,2,2,2,1,1,1,1,1,4,1,5,1,1,2,2,1,4,2,
             1,NA,3,NA,1,1,NA,1,1,2,3,1,2,4,NA,NA,
             NA,3,1,NA,1,1,1,1,1,1,NA,3,2,1,1,1,1,
             NA,NA,NA,1,1,1,NA,3,1),
  a1_34 = c(4,3,5,4,3,4,3,4,4,
            3,4,5,2,4,4,4,4,3,4,4,4,4,4,4,3,3,4,3,
            4,NA,4,NA,4,4,NA,3,3,4,1,NA,4,4,NA,NA,
            NA,4,5,NA,3,3,4,NA,4,4,NA,3,3,1,4,3,4,
            NA,3,NA,NA,3,2,NA,5,NA)
)
{
  where <- make.where(mydata, "none")
  where[63, "BB_34"] <- TRUE
  where[63, "BB_35R"] <- TRUE
  where[63, "BB_36"] <- TRUE
  where[40, "a1_34"] <- TRUE
  imp_monot <- mice(mydata, where = where, m = 1, method = "sample", seed = 21980, maxit = 1, print = FALSE)
}
datitems2 <- mice::complete(imp_monot)
datitems2[63,]
#>    BB_33R BB_34 BB_35R BB_36 PTMR16 a1_34
#> 63     NA    NA     NA    NA     NA     3

datitems2[40,]
#>    BB_33R BB_34 BB_35R BB_36 PTMR16 a1_34
#> 40      1     5      3     2      1     5

datitems2[40, 6] (a1_34) has a value, but datitems2[63, 2:4] (BB_34, BB_35R, BB_36) is still NA

compare with initial data:

mydata[63,]
#>    BB_33R BB_34 BB_35R BB_36 PTMR16 a1_34
#> 63     NA    NA     NA    NA     NA     3

mydata[40,]
#>    BB_33R BB_34 BB_35R BB_36 PTMR16 a1_34
#> 40      1     5      3     2      1    NA
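The difference between rows 40 and 63 can be checked with a small base-R diagnostic. This is a sketch resting on an assumption taken from the explanation earlier in the thread: a requested cell is skipped whenever another variable in the same row is missing and is not itself requested. The toy data and the names requested and blocked are hypothetical, and the rule is a simplification, not mice's actual code.

```r
# Toy data (made up): row 1 mimics row 40 (only the target cell is missing),
# row 2 mimics row 63 (other variables in the row are missing as well).
toy <- data.frame(
  v1 = c(1, NA, 3),
  v2 = c(2, NA, 1),
  v3 = c(NA, 3, 2)
)

# Cells requested for imputation, in the spirit of a `where` matrix.
requested <- matrix(FALSE, nrow(toy), ncol(toy), dimnames = list(NULL, names(toy)))
requested[1, "v3"] <- TRUE
requested[2, "v2"] <- TRUE

# A requested cell is flagged as blocked when any other variable in its row
# is missing and is not itself requested.
blocked <- requested
for (j in colnames(requested)) {
  others <- setdiff(colnames(requested), j)
  unresolved <- is.na(toy[, others, drop = FALSE]) & !requested[, others, drop = FALSE]
  blocked[, j] <- requested[, j] & rowSums(unresolved) > 0
}

# Row and column positions of the requested cells that would be skipped.
which(blocked, arr.ind = TRUE)
```

Applied to the reprex, the same rule would flag the three requested cells in row 63, where BB_33R and PTMR16 are missing and not requested, and would leave the a1_34 cell in row 40 unflagged.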

gerkovink commented 1 year ago

Closing as this is expected behaviour as indicated by @thomvolker