Closed stefvanbuuren closed 7 years ago
Hi Tobias, thanks for your mail.
The as.mids() function calls the mice() function to get an initial mids object, which is then post-processed by as.mids(). Your problem is caused by the default behaviour of mice, which removes any collinear variables at start-up, so your ini$imp$newvar variables get a NULL value and any imputations they hold are lost.
As this behaviour is confusing in the context of as.mids(), I will make some changes to the as.mids() and check.data() functions that will allow us to bypass removal of collinear variables at start-up.
You should be able to run your code in mice 2.37.
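For readers landing here, the workflow discussed in this thread can be sketched roughly as follows. This is a minimal illustration on the nhanes example data, assuming a mice version in which as.mids() keeps derived variables (2.37 or later, per the reply above); the variable name sumscore is made up for the example.

```r
library(mice)

# Impute the example data (nhanes ships with mice)
imp <- mice(nhanes, m = 5, printFlag = FALSE, seed = 123)

# Stack the original data (.imp == 0) and the m completed data sets
long <- complete(imp, action = "long", include = TRUE)

# Derive the new variable row by row; in the original rows it stays NA
# wherever bmi or chl is missing
long$sumscore <- long$bmi + long$chl

# Convert back to a mids object so that with() and pool() can be used
imp2 <- as.mids(long)

fit <- with(imp2, lm(sumscore ~ age))
pooled <- pool(fit)
summary(pooled)
```

Keeping include = TRUE matters here, because as.mids() needs the original rows to know which cells were missing.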
Hi Stef,
To echo Tobias’s original post, thanks very much for creating MICE. I am still developing my R skills (so please excuse any stupid questions) and am using your programme to carry out MI on a large repeated-measures data set looking at outcomes for chronic pain patients at a UK hospital.
All of the (108) variables have missing data at various points, and so it presents challenges that Tobias may not have faced…
I am able to run the MI process successfully on the raw data using:
View(slimmed_down_working_data_file_for_R)
library(mice)
library(VIM)
library(lattice)
library(ggplot2)
options(max.print = 100000)
md.pattern(slimmed_down_working_data_file_for_R)
imputation <- mice(slimmed_down_working_data_file_for_R, m = 1, maxit = 50, meth = 'pmm', seed = 500)
But then I run into difficulty…
I need to create a number of subscales from the MICE output that can then be used for further pooled analysis.
I understand that I am not able to do this directly from the pooled data set (the MICE output) and instead need to create one or more R objects that can be manipulated by the necessary functions (e.g. in Tobias’s case he converted to a long format).
However, because I applied MICE to a large data set that I loaded and ‘viewed’, rather than ‘#Generating data’ in the console as Tobias did, my individual raw variables are not present in R as separate objects that could be used to compute a subscale score.
In short:
Is it possible to calculate the mean score for a group of variables from the MICE output, and then have this score created as a new variable? If so, what is the best approach?
If I manage this successfully, do I then need to convert back to a MIDS object (using the ‘imput.short <- as.mids’) before carrying out any further analysis?
Many thanks in advance, Alistair
Hi Alistair,
The issue you raise is straightforward to solve with passive imputation in mice. Please see the following example code based on the nhanes example data set from package mice:
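library(mice)
set.seed(123)
# add an empty placeholder column for the derived variable
new <- NA
nhanes3 <- cbind(nhanes, new)
# dry run (no iterations) to obtain the default methods
ini <- mice(nhanes3, maxit = 0)
meth <- ini$meth
# set new to passive imputation
meth["new"] <- "~ I(bmi + chl)"
imp <- mice(nhanes3, meth = meth)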
This example calculates new as the sum of bmi and chl. If you calculate new at the end of each iteration (that is, if you visit it last in each iteration) with passive imputation, new will always be the sum of the observed and/or imputed information. This solution yields exactly what you are looking for.
> head(complete(imp))
  age  bmi hyp chl   new
1   1 26.3   1 118 144.3
2   2 22.7   1 187 209.7
3   1 30.1   1 187 217.1
4   3 25.5   2 204 229.5
5   1 20.4   1 113 133.4
6   3 22.7   1 184 206.7
If you'd like to know more about the specifics and caveats of passive imputation, please have a look at the corresponding vignette in the miceVignettes repository.
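For a subscale mean rather than a sum, the same pattern might look like the sketch below. The column name subscale and the choice of bmi and chl as "items" are placeholders for illustration only, and zeroing the subscale column of the predictor matrix is an assumption about what is usually wanted (it keeps the derived score out of the imputation models of its own components).

```r
library(mice)

# Add an empty placeholder column for the subscale (hypothetical name)
dat <- cbind(nhanes, subscale = NA)

# Dry run to obtain the default methods and predictor matrix
ini <- mice(dat, maxit = 0)
meth <- ini$method
pred <- ini$predictorMatrix

# Passive imputation: subscale is the mean of the two "items"
meth["subscale"] <- "~ I((bmi + chl) / 2)"

# Do not use the derived score as a predictor for other variables;
# the subscale row itself is left untouched
pred[, "subscale"] <- 0

imp <- mice(dat, method = meth, predictorMatrix = pred,
            m = 5, printFlag = FALSE, seed = 123)

# The subscale should equal the mean of the completed items
comp <- complete(imp, 1)
head(comp)
```

Because subscale is the last column, it is visited last in each iteration and so stays in step with the imputed bmi and chl values.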
All the best,
Gerko
Hi Gerko,
Thanks very much for your help, much appreciated! I have tried to implement your code and have had some success, however I have encountered 25 'warnings' :
In Ops.factor(Rsf1, Rsf2) : ‘+’ not meaningful for factors
Do you know why this is?
Many thanks, Alistair
Yes, some of the variables you are trying to add together are factors, i.e. categorical variables where categories (labels) are represented by values. R is simply warning you that adding categorical variables may not be what you intend, to keep you from making an accidental error.
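If the factor labels really are numeric item scores, one possible fix is to convert them to numbers before adding. The sketch below uses made-up data; only the column names Rsf1 and Rsf2 come from the warning above, and the helper to_num is hypothetical.

```r
# Hypothetical item responses stored as factors
items <- data.frame(
  Rsf1 = factor(c("1", "3", "2", "4")),
  Rsf2 = factor(c("2", "2", "5", "1"))
)

# Go through character first: as.numeric() applied directly to a factor
# returns the internal level codes, not the labels
to_num <- function(f) as.numeric(as.character(f))

items_num <- as.data.frame(lapply(items, to_num))
items_num$total <- items_num$Rsf1 + items_num$Rsf2
items_num
```

This only makes sense when the categories are genuinely ordered scores; for nominal categories a sum has no meaning.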
All the best,
Gerko
Hello Stef and Gerko, I have some questions regarding passive imputation. I would like to create a sum score (variable = New) after the imputation. I used the code Gerko suggested above and modified it.
I added an ID variable to nhanes and created a sum score based on chl and hyp.
nhanes$ID <- seq.int(nrow(nhanes))
New <- NA
Data_new<- cbind(nhanes, New)
ini <- mice(Data_new, max = 0, method = c('','pmm','pmm','pmm','',''))
meth <- ini$meth
meth["New"] <- "~I(chl+hyp)"
Then, I modified the predictor matrix because the ID and New variables (New being the sum score of chl+hyp) should not be predicted by other variables and should not be predictors of other variables.
pred <- ini$predictorMatrix
pred[, "ID"] <- 0
pred["ID",] <- 0
pred["New",] <- 0
pred
    age bmi hyp chl ID New
age   0   1   1   1  0   0
bmi   1   0   1   1  0   0
hyp   1   1   0   1  0   0
chl   1   1   1   0  0   0
ID    0   0   0   0  0   0
New   0   0   0   0  0   0
Then, I run mice.
Test <- mice(Data_new,
             meth = meth,
             pred = pred,
             m = 5,
             maxit = 5,
             diagnostics = TRUE,
             seed = 123456)
head(complete(Test))
I got a strange result for the "New" variable (New=chl+hyp).
  age  bmi hyp chl ID         New
1   1 29.6   1 187  1 -0.18195807
2   2 22.7   1 187  2  1.22596857
3   1 27.2   1 187  3 -1.55123662
4   3 27.5   1 186  4  0.41716072
5   1 20.4   1 113  5  0.85837692
6   3 20.4   2 184  6 -0.07179878
After I removed this line of code: pred["New",] <- 0, the result seems to be reasonable.
pred <- ini$predictorMatrix
pred[, "ID"] <- 0
pred["ID",] <- 0
#pred["New",] <- 0
PredictorMatrix:
    age bmi hyp chl ID New
age   0   1   1   1  0   0
bmi   1   0   1   1  0   0
hyp   1   1   0   1  0   0
chl   1   1   1   0  0   0
ID    0   0   0   0  0   0
New   1   1   1   1  0   0
> head(complete(Test2))
  age  bmi hyp chl ID New
1   1 29.6   1 187  1 188
2   2 22.7   1 187  2 188
3   1 27.2   1 187  3 188
4   3 27.5   1 186  4 187
5   1 20.4   1 113  5 114
6   3 20.4   2 184  6 186
Here are my questions:
Age contains no missing data, so I thought mice would set all values in the age row of the predictor matrix to 0, but it did not. I am not sure whether that only happened on my computer.
After I removed the line pred["New",] <- 0, the imputation seems to work well. However, the predictor matrix row for "New" then does not reflect its actual imputation model. Would that be a problem?
Hi Gerko, thanks for the useful explanation of passive imputation with mice. Could I add a question to this: would passive imputation also be applicable to change scores (i.e. outcome - baseline)? I can imagine a problem with this as we are assuming a correlation between the dependent and independent variable. However, perhaps I'm interpreting this issue incorrectly in the context of imputation. Any opinion from yourself or Stef on this would be very welcome. Thanks Sebastian
Can’t find it on github…
From: sophar notifications@github.com Reply-To: stefvanbuuren/mice reply@reply.github.com state_change@noreply.github.com Subject: Re: [stefvanbuuren/mice] Create new variable after imputation (#34)
When I'm trying to generate the long dataset to create new variables after imputation, I get the following error message:
# Convert to Long
long <- mice::complete(df2, "long",include = TRUE)
Error: Column "pCare_doc" can't be converted from logical to numeric
I'm sorry, I did not manage to create a reproducible example for this, it just happens with my (large) dataset. But maybe you still have an idea what this could be? So pCare_doc is a logical variable, but why should it be converted?
Unable to replicate. mice() converts logicals into 0/1 variables, but the following runs fine.
library(mice)
data <- data.frame(nhanes2,
flags = rep(c(TRUE, FALSE, FALSE, NA, TRUE), 5))
imp <- mice(data, m = 1, print = FALSE)
long <- mice::complete(imp, "long", include = TRUE)
str(long)
imp2 <- as.mids(long)
imp2
# force logical
long2 <- long
long2$flags <- as.logical(long2$flags)
str(long2)
imp3 <- as.mids(long2)
Hello Stef, thanks a lot for your help, really appreciated. I deleted my question when I realized that the error is related to the automatic conversion of logicals (which I did not know about before). So I've made a workaround that converts all logicals to factors before mice and complete, and converts them back from factor to logical afterwards.
library(mice)
data <- data.frame(nhanes2,
flags = rep(c(TRUE, FALSE, FALSE, NA, TRUE), 5))
data$flags <- factor(data$flags)
imp <- mice(data, m = 1, print = FALSE)
long <- mice::complete(imp, "long", include = TRUE)
long$flags <- as.logical(long$flags)
imp2 <- as.mids(long)
imp2
However, I'm not sure I understood your solution, as the error occurs when using complete.
Hello,
I am trying to do passive imputation as in @gerkovink 's example, but in my case I need to use the ifelse() function. That is, instead of calculating the sum of two previous variables for the new variable, I need the new variable to be "0" if another previous variable is "0" and "1" otherwise. Is it possible to do this? If so, how?
Thanks a lot!
Yes sure. The I() is just a function, so you can replace it by something else, e.g.
library(mice)
meth <- make.method(nhanes)
meth["bmi"] <- "~ ifelse(age == 1, 0, 1)"
imp <- mice(nhanes, method = meth, m = 1, maxit = 1, seed = 1)
head(complete(imp))
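To write the ifelse() result into a new column rather than overwrite an existing one, the placeholder-column pattern from earlier in the thread can be combined with the method string, roughly as sketched here. The column name hyp_flag is hypothetical, and the dry run with maxit = 0 and the zeroed predictor column are assumptions carried over from the earlier examples rather than requirements stated in this thread.

```r
library(mice)

# Empty placeholder column for the derived indicator (hypothetical name)
dat <- cbind(nhanes, hyp_flag = NA)

# Dry run to obtain the default methods and predictor matrix
ini <- mice(dat, maxit = 0)
meth <- ini$method
pred <- ini$predictorMatrix

# Passive imputation: 0 when hyp equals 1, otherwise 1
meth["hyp_flag"] <- "~ ifelse(hyp == 1, 0, 1)"

# Keep the indicator out of the imputation models of other variables;
# its own row is left as is
pred[, "hyp_flag"] <- 0

imp <- mice(dat, method = meth, predictorMatrix = pred,
            m = 1, maxit = 5, printFlag = FALSE, seed = 1)
comp <- complete(imp)
head(comp)
```

The indicator is recomputed from the completed hyp values in each iteration, so it should never disagree with the imputed data.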
Hello, again and thank you!
I tried this but it keeps the new variable as NA.
library(mice)
x1 <- c(0, 2, 1, NA, 0, 3, 1, 0, 0)
x2 <- c(1, 3, 2, 3, 4, 4, 1, 2, NA)
data <- data.frame(x1, x2)
data$new <- NA
meth <- make.method(data)
meth["new"] <- " ~ ifelse(x1 == 0, 0, 1)"
imp <- mice(data, method = meth, m = 1, maxit = 1, seed = 1)
head(complete(imp))
This is my output:
head(complete(imp))
  x1 x2 new
1  0  1  NA
2  2  3  NA
3  1  2  NA
4  3  3  NA
5  0  4  NA
6  3  4  NA
Maybe I am typing something wrong? I am sorry! THANKS ONCE MORE!
This is a mail I got from Tobias Rolfes:
Date: 20 May 2017 15:48:53 GMT+5:30 Subject: Mice: Create new variable after imputation
Hello Stef,
Thank you very much for creating such a useful package for multiple imputation.
Currently, I am facing the problem that I want to create a new variable after calculating imputations (e.g., sum scores of items) and run regressions with the new variable. However, when I do so (cf. program code below), the originally missing cases are dropped from the regression because they are still missing. Do you have an idea how I can solve the problem?
Many thanks in advance for your answer.
Best, Tobias