Closed dmaltschul closed 6 years ago
Hi dmaltschul,
It seems that you have entered variables that indicate the missingness on other variables into the imputation model. As a result, these variables take on a value (say 0) while the corresponding cell for another variable is always NA. For example, column 4 is always 0 when column 19 is NA. In mice such systems are removed by default as they are computationally unsolvable - there is zero covariance for columns 4 and 19 when column 4 takes on the value 0.
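To make the dependency concrete, here is a minimal base R sketch on made-up data (the vectors `x` and `ind` are hypothetical and not taken from the attached file). It mimics an indicator that is 0 exactly where the other variable is NA, so that on the rows where `x` is observed the indicator never varies and carries no information about `x`.

```r
# Hypothetical toy data: x has missing values, ind is 0 exactly where x is NA.
x   <- c(2.1, NA, 3.4, NA, 5.0, 1.7)
ind <- ifelse(is.na(x), 0, 1)

obs <- !is.na(x)                      # rows where x is observed

# On the observed rows ind is constant, so its variance and its
# covariance with x are both zero.
var_ind_obs <- var(ind[obs])
cov_obs     <- cov(x[obs], ind[obs])

print(c(var_ind_obs = var_ind_obs, cov_obs = cov_obs))
```

A predictor with no variation on the observed rows cannot contribute to a model for `x`, which is the kind of degenerate system described above.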
I can think of two reasons why this dependency occurs:
In scenario 1, the indicator should be excluded (the same information is captured in the incomplete variable). In scenario 2, the bona fide missingness should not be imputed. We are currently working on new ways of taking bona fide missingness into account when such variables serve as predictors for other incomplete variables.
So, your conclusion that the choice of methods makes no difference is correct. The highly dependent systems for variables 4, 29 and 30 are avoided by removing these variables and keeping their dependent counterparts. This is not an error, but a means of still being able to computationally solve the system as a whole.
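As a rough illustration of how such removals show up in loggedEvents, the sketch below uses the built-in nhanes data with an added constant column (the column `const` is hypothetical). A constant column is a different reason for removal than the dependency discussed here, but it is recorded in the same log, and running with maxit = 0 is enough to see it.

```r
library(mice)

# Hypothetical example: add a constant column to the built-in nhanes data.
dat <- cbind(nhanes, const = 1)

# maxit = 0 runs only the setup checks, without any imputation iterations.
ini <- mice(dat, maxit = 0, printFlag = FALSE)

# Each row describes one logged event: `meth` gives the reason and
# `out` names the variable that was taken out of the model.
print(ini$loggedEvents)
```

Checking this table after a dry run is a quick way to see which columns mice has dropped before committing to a long imputation.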
All the best,
Gerko
Hi Gerko,
Thanks for weighing in. I went ahead and eliminated those variables from the dataset entirely, but I'm still having the same problem, and there's nothing in loggedEvents now either. The central problem is that mice isn't filling in the NAs; it is returning imputed datasets that are all the same as the original.
Could it be my syntax? I wouldn't think so since I'm not changing much:
mice.t = mice(df,
              method = c('','','pmm',... ), # rest are all 'pmm' except the last six
              diagnostics=TRUE, m = 10, maxit = 10)
(I've been keeping the arguments smaller than I would normally since I've been running this over and over.)
I am not able to replicate this. See the below example.
require(mice)
data <- read.csv(file = "mice-ex.csv", header = FALSE)
imp <- mice(data[, -c(4, 29, 30)], meth = "pmm", m = 2, maxit = 1)
imp$loggedEvents
any(is.na(complete(imp)))
[1] FALSE
FALSE here indicates that there are no missings anymore in the first imputed data set that is by default returned by mice::complete(). Two reasons I can think of why your data still has missings:
1. With method = c('', '', 'pmm', etcetera), you exclude variables from imputation, meaning that the first two and the last six variables are not imputed.
2. complete(mice.t, "long", include = TRUE): the include = TRUE statement includes the original [i.e. incomplete] data on top of the imputed data sets. So, the first cases are the original data, the second set of cases represents the first imputed data set, and so on.
All the best,
Gerko
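The effect of include = TRUE can be checked on the built-in nhanes data. This is only a sketch under assumed settings (m = 2, maxit = 1 and the seed are arbitrary choices, not the settings from this thread); it splits the long format on the .imp column, where 0 marks the original data.

```r
library(mice)

# Small run on the built-in nhanes data; settings are illustrative only.
imp <- mice(nhanes, m = 2, maxit = 1, printFlag = FALSE, seed = 123)

# Long format with the original (incomplete) data stacked on top.
long <- complete(imp, "long", include = TRUE)

orig_rows    <- long[long$.imp == 0, ]   # the original data, NAs intact
imputed_rows <- long[long$.imp != 0, ]   # the imputed data sets

print(anyNA(orig_rows))
print(anyNA(imputed_rows))
```

If a check for remaining NAs is run on the whole long data set, the rows with .imp equal to 0 will always report missing values, even when the imputation itself worked.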
Okay, that works for me too, which is great. I was able to isolate what seems to be causing the problem. When I tell method to include the information from certain columns but not impute them (e.g. the '' entries, see above), then mice is skipping through all of the columns, despite most of them having a 'pmm' or 'cart' or 'rf' assigned for their method.
So when I run your exact code but replace the meth argument 'pmm' with, say,
meth = c('','','pmm','pmm','pmm','pmm','pmm','pmm','pmm','pmm',
         'pmm','pmm','pmm','pmm','pmm','pmm','pmm','pmm','pmm','pmm',
         'pmm','pmm','pmm','pmm','pmm','pmm','pmm','pmm',
         'pmm','','','','','','')
Then I get the problem again. Is there something obvious here I am missing?
Your variables are still set to serve as predictors as specified by your predictor matrix. If you exclude the variables that have no imputation method from the predictor matrix, the problem disappears (see code below).
require(mice)
data <- read.csv(file = "mice-ex.csv", header = FALSE)
ini <- mice(data[, -c(4, 29, 30)], maxit = 0)
exclude <- c(1:2, 30:35)
meth <- ini$method
meth[exclude] <- ""
pred <- ini$predictorMatrix
pred[, exclude] <- 0
imp <- mice(data[, -c(4, 29, 30)], meth = meth, pred = pred, m=2, maxit = 1)
imp$loggedEvents
any(is.na(complete(imp)[, -exclude]))
[1] FALSE
Best,
Gerko
See also Issue #75
Hmm okay, yes, that does seem like the same issue.
So I suppose I can just impute all the variables and not use the ones I don't want in actual analyses, because I definitely do want the information in those vars to be used via the predictor matrix. That would get me to the same place as in earlier versions, I think. I was originally leaving out imputations for some predictor vars to save computational time, but I don't think it ended up making that much of a difference when I just left the imputations to run overnight.
Thanks for clearing that up.
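The workaround described above (impute everything, then leave the unwanted variables out of the analysis data) might look roughly like the following sketch. It uses the built-in nhanes data, and the choice of `hyp` as the variable to drop is purely hypothetical.

```r
library(mice)

# Impute all variables so that every column can serve as a predictor.
imp <- mice(nhanes, m = 2, maxit = 1, printFlag = FALSE, seed = 123)

# Hypothetical: a variable that was imputed only to act as a predictor.
unwanted <- "hyp"

# Build one analysis data set per imputation, dropping the unwanted column.
completed <- lapply(seq_len(imp$m), function(i) {
  d <- complete(imp, i)
  d[, setdiff(names(d), unwanted), drop = FALSE]
})

print(names(completed[[1]]))
```

This keeps the information from the dropped variables in the imputation model, at the cost of imputing values that are never analysed.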
Hello,
I recently tried to reproduce results from code I wrote several months ago, and I've run into some issues, primarily that mice isn't imputing any of the missing values I want.
I wanted to raise this as an issue because I had no trouble with these imputations using an earlier version (in the summer of 2017 - I am not sure of the version), and I used them in a prediction analysis to get very reasonable validation scores, so there wasn't anything wrong with the imputations.
The code runs, doesn't give any errors, but none of the imputed datasets now have any of the NAs filled in. This is somehow an issue involving the dataset, as I've tried running examples with mice using other datasets like nhanes, and they work fine.
The only clue I have is from loggedEvents, the results of which I pasted below.
(The numbers in dep and out are the column names, which I've anonymized as numbers - the dataset itself is attached. Not all columns were imputed; the first 3 and last 6 in particular were left out.) I've read the documentation, and I don't entirely understand what this output means, but I thought it might be elucidating for the authors and others. What does seem to be the case is that there are a few problem columns, though this looks like it is only a few, not all of them, and again, this wasn't an issue previously - all columns were successfully imputed.
I used the randomForest method originally, though changing to pmm or other methods makes no difference: the values remain NA. I'm no expert in multiple imputation, but I'm quite baffled.
Thanks for any help you can offer!
mice-ex.zip