amices / mice

Multivariate Imputation by Chained Equations
https://amices.org/mice/
GNU General Public License v2.0

NAs persist despite imputing auxiliary variables #422

Closed nikarjunagi closed 3 years ago

nikarjunagi commented 3 years ago

Hello, I am using mice to impute two-level longitudinal data with the 2lonly.pmm method, but some NAs persist after imputation. I cannot explain this behavior, even after reading these two issues:

  1. https://github.com/amices/mice/issues/267
  2. https://github.com/amices/mice/issues/263

The data for the example is shared here. The reprex below is adapted from a publicly available tutorial:

micelong0 <- mice(EP16dat4, maxit = 0)
meth_micelong <- micelong0$method
pred_micelong <- micelong0$predictorMatrix

# don't impute hepato
meth_micelong[c("hepato")] <- ""

# exclude hepato as a predictor in the other imputation models (it is incomplete and not imputed)
pred_micelong[, c("hepato")] <- 0

meth_micelong[c("copper", "ascites")] <- "2lonly.pmm"

pred_micelong[, "id"] <- -2
pred_micelong[, "day"] <- 2

# check the imputation methods and the predictor matrix
meth_micelong
pred_micelong

micelong <- mice(EP16dat4, method = meth_micelong,
                 predictorMatrix = pred_micelong,
                 maxit = 20, seed = 2019, printFlag = FALSE)

imputed_df <- complete(micelong)

# Return the NA count for every column that has at least one NA
get_na_count <- function(df) {
  na_cols <- colSums(is.na(df))
  na_cols[na_cols > 0]
}

# Check NAs on non-imputed df
# Returns:
# ascites  hepato  copper 
# 346      61     469 
get_na_count(EP16dat4)

# Check NAs on imputed df
# Returns:
# ascites  hepato  copper 
# 22      61      16 
get_na_count(imputed_df)

Could you please help me understand why a few NA values persist?

The sample code above produces warnings but no loggedEvents; with my own dataset, I see neither warnings nor loggedEvents.
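
A possible first check, sketched here under assumptions and not taken from the thread: count the remaining NAs per cluster to see whether they concentrate in particular `id` groups. The data frame `toy_imputed` and the helper `na_by_cluster` are hypothetical stand-ins for `imputed_df`, and the values are made up for illustration. The sketch only locates the leftover NAs; it does not explain why they occur.

```r
# Hypothetical stand-in for `imputed_df` (made-up values, not the issue's data)
toy_imputed <- data.frame(
  id     = c(1, 1, 2, 2, 3, 3),
  day    = c(0, 10, 0, 12, 0, 9),
  copper = c(50, 50, NA, NA, 70, 70)
)

# Count the remaining NAs of one variable within each cluster,
# keeping only the clusters that still have at least one NA
na_by_cluster <- function(df, var, cluster = "id") {
  counts <- vapply(split(is.na(df[[var]]), df[[cluster]]), sum, integer(1))
  counts[counts > 0]
}

# In this toy data, only cluster "2" still has missing copper values
na_by_cluster(toy_imputed, "copper")
```

Running the same helper on the real `imputed_df` for `copper` and `ascites` would show whether the leftover NAs belong to a small set of clusters.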

stefvanbuuren commented 3 years ago

I need to log into a Google Drive to get the data, which I don't want to do. Please provide a complete reprex using a small dataset.
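
One common way to share a small dataset without an external link is to paste the output of base R's `dput()` into the issue. The sketch below is only an illustration: `small_dat` is a hypothetical data frame with made-up values, not a subset of the data from this issue.

```r
# Hypothetical small two-level dataset (made-up values)
small_dat <- data.frame(
  id     = rep(1:3, each = 2),
  day    = c(0L, 10L, 0L, 12L, 0L, 9L),
  copper = c(50, 50, NA, NA, 70, 70)
)

# Print a plain-text representation that can be pasted into an issue
dput(small_dat)

# Anyone can rebuild the data frame from that text alone
rebuilt <- eval(parse(text = deparse(small_dat)))
all.equal(rebuilt, small_dat)
```

With real data, taking a handful of clusters that still show NAs after imputation and passing that subset to `dput()` keeps the reprex small and self-contained.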

stefvanbuuren commented 3 years ago

Sorry, cannot reproduce without a publicly available data set so I am closing.