I would expect the imputed values of x to be the same when the same predictor variables are used, regardless of whether other variables are imputed or not, but that is not the case, as reproduced here:
library(data.table)
library(robustlmm)
library(mice)
library(miceadds)
library(magrittr)
library(dplyr)
library(tidyr)
set.seed(1)
# Data ------------------------------------
dt1 <- data.table(id = rep(1:10, each = 3),
                  group = rep(1:2, each = 15),
                  time = rep(1:3, 10),
                  sex = rep(sample(c("F", "M"), 10, replace = TRUE), each = 3),
                  x = rnorm(30),
                  y = rnorm(30),
                  z = rnorm(30))
# Set x and y to NA at time 2 for 4 random ids, then at time 3 for an independently drawn set of 4 ids
dt1[id %in% sample(1:10, 4) & time == 2, `:=`(x = NA, y = NA)][
    id %in% sample(1:10, 4) & time == 3, `:=`(x = NA, y = NA)]
# Carry the last observed y forward within each id, so y has no missing values in dt2
dt2 <- dt1 %>% group_by(id) %>% fill(y) %>% ungroup() %>% as.data.table()
# MI 1 ------------------------------------
pm1 <- make.predictorMatrix(dt1)
pm1['x', c('y', 'z')] <- 0        # exclude y and z as predictors of x
pm1[c('x', 'y'), 'id'] <- -2      # id is the cluster variable for x and y
imp1 <- mice(dt1, predictorMatrix = pm1, method = "2l.pmm", seed = 1, m = 2,
             printFlag = FALSE, maxit = 20)
# Message: "boundary (singular) fit: see ?isSingular". I don't know how to interpret
# this (it doesn't occur with my real data)
View(complete(imp1, 'long'))
# MI 2 ------------------------------------
pm2 <- make.predictorMatrix(dt2)
pm2['x', c('y', 'z')] <- 0        # exclude y and z as predictors of x
pm2['x', 'id'] <- -2              # id is the cluster variable for x
imp2 <- mice(dt2, predictorMatrix = pm2, method = "2l.pmm", seed = 1, m = 2,
             printFlag = FALSE, maxit = 20, remove.constant = FALSE)
# imp2$loggedEvents reports sex as constant (I don't know why), so I set
# remove.constant = FALSE to keep that variable (this doesn't occur with my real data)
View(complete(imp2, 'long'))
In imp1:
- group, time and sex are used to predict x (with id as the cluster variable)
- group, time, sex, x and z are used to predict y (with id as the cluster variable)

In imp2:
- group, time and sex are used to predict x (with id as the cluster variable)
- y is complete, so it is not imputed
Given this, why are the imputed values of x different between imp1 and imp2? Is this the expected behavior?
Thank you!
PS: I also posted this question on Stack Overflow (before I remembered to post it here). Should I delete that post to avoid cross-posting, or simply add a link to this thread there?