amices / mice

Multivariate Imputation by Chained Equations
https://amices.org/mice/
GNU General Public License v2.0

Error when using `2l.pan` and the `where` argument: missing values in pred not allowed #520

Closed isaactpetersen closed 1 year ago

isaactpetersen commented 1 year ago

I'm trying to perform multilevel imputation using `2l.pan` and to use the `where` argument to specify which cells to impute. I receive the following error:

Error in pan::pan(y1, subj, pred, xcol, zcol, prior, seed = s1, iter = paniter): missing values in pred not allowed

However, the error is odd because there are no missing values in the `pred` object. I did some troubleshooting, and I do not receive the error when I remove the `wave` variable (which is not used in the prediction model) from the data frame. In addition, the error does not occur when I don't use the `where` argument.

Below is a minimal reproducible example. Attached is an example datafile.

library("mice")
#> 
#> Attaching package: 'mice'
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> The following objects are masked from 'package:base':
#> 
#>     cbind, rbind

load(file.path("C:/Users/itpetersen/Downloads/df.rdata"))

# Variables to Impute

Y <- c("v1","v2")

# Multilevel Imputation Methods

meth <- make.method(df)
meth[1:length(meth)] <- ""
meth[Y] <- "2l.pan"

# Predictor Matrix

pred <- make.predictorMatrix(df)
pred[1:nrow(pred), 1:ncol(pred)] <- 0
pred[Y, "id"] <- (-2) #cluster variable
pred[Y, "age"] <- 2 #random effect predictor
pred[Y, Y] <- 1 #fixed effect predictor

diag(pred) <- 0 #don't let variable predict itself

pred
#>      id wave age v1 v2
#> id    0    0   0  0  0
#> wave  0    0   0  0  0
#> age   0    0   0  0  0
#> v1   -2    0   2  0  1
#> v2   -2    0   2  1  0
table(is.na(pred))
#> 
#> FALSE 
#>    25

# Specify cells to impute

whereMatrix <- is.na(df)
whereMatrix[which(df$wave >= 6),] <- FALSE

# Perform Multiple Imputation

midata <- mice(
  data.frame(df),
  method = meth,
  predictorMatrix = pred,
  m = 1,
  maxit = 1,
  seed = 52242,
  where = whereMatrix)
#> 
#>  iter imp variable
#>   1   1  v1
#> Error in pan::pan(y1, subj, pred, xcol, zcol, prior, seed = s1, iter = paniter): missing values in pred not allowed

# Session Info

sessionInfo()
#> R version 4.2.0 (2022-04-22 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.utf8 
#> [2] LC_CTYPE=English_United States.utf8   
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.utf8    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] mice_3.14.9
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.9        compiler_4.2.0    pillar_1.8.1      highr_0.9        
#>  [5] R.methodsS3_1.8.2 R.utils_2.12.1    tools_4.2.0       digest_0.6.30    
#>  [9] lattice_0.20-45   evaluate_0.18     lifecycle_1.0.3   tibble_3.1.8     
#> [13] R.cache_0.16.0    pkgconfig_2.0.3   rlang_1.0.6       reprex_2.0.2     
#> [17] DBI_1.1.3         cli_3.4.1         rstudioapi_0.14   yaml_2.3.6       
#> [21] xfun_0.34         fastmap_1.1.0     withr_2.5.0       styler_1.8.1     
#> [25] stringr_1.4.1     dplyr_1.0.10      knitr_1.40        generics_0.1.3   
#> [29] fs_1.5.2          vctrs_0.5.0       grid_4.2.0        tidyselect_1.2.0 
#> [33] glue_1.6.2        R6_2.5.1          fansi_1.0.3       rmarkdown_2.18   
#> [37] tidyr_1.2.1       purrr_0.3.5       magrittr_2.0.3    backports_1.4.1  
#> [41] htmltools_0.5.3   assertthat_0.2.1  utf8_1.2.2        stringi_1.7.8    
#> [45] broom_1.0.1       pan_1.6           R.oo_1.25.0

Created on 2022-11-10 with reprex v2.0.2

gerkovink commented 1 year ago

The error about pred does not come from mice but from pan::pan(). Its third argument (pred) expects a fully observed predictor space. With the specified where matrix, you have deliberately chosen to leave this predictor space incomplete. See the addition to the example below:

library("mice")

temp <- tempfile()
download.file("https://github.com/amices/mice/files/9982439/df.zip", temp)
load(unz(temp, "df.rdata"))
unlink(temp)

# Variables to Impute

Y <- c("v1","v2")

# Multilevel Imputation Methods

meth <- make.method(df)
meth[1:length(meth)] <- ""
meth[Y] <- "2l.pan"

# Predictor Matrix

banana <- make.predictorMatrix(df)
banana[1:nrow(banana), 1:ncol(banana)] <- 0
banana[Y, "id"] <- (-2) #cluster variable
banana[Y, "age"] <- 2 #random effect predictor
banana[Y, Y] <- 1 #fixed effect predictor
diag(banana) <- 0 #don't let variable predict itself
table(is.na(banana))
#> 
#> FALSE 
#>    25

# Specify cells to impute
whereMatrix <- is.na(df)
whereMatrix[which(df$wave >= 6), ] <- FALSE

# Perform Multiple Imputation
midata <- mice(
  data.frame(df),
  method = meth,
  predictorMatrix = banana,
  m = 1,
  maxit = 1,
  seed = 52242,
  where = whereMatrix)
#> 
#>  iter imp variable
#>   1   1  v1
#> Error in pan::pan(y1, subj, pred, xcol, zcol, prior, seed = s1, iter = paniter): missing values in pred not allowed

# Perform Multiple Imputation without where
midata <- mice(
  data.frame(df),
  method = meth,
  predictorMatrix = banana,
  m = 1,
  maxit = 1,
  seed = 52242)
#> 
#>  iter imp variable
#>   1   1  v1  v2

Created on 2022-11-10 with reprex v2.0.2
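
One way to check this without running pan at all is sketched below. It uses a small made-up data frame (`toy`, with hypothetical values, not the attached df) and base R only. The assumption, based on the explanation above, is that cells which are missing but set to FALSE in the where matrix are left unimputed, so a variable such as v2 still contains NA when it serves as a predictor for v1.

```r
# Toy stand-in for the attached data (hypothetical values).
toy <- data.frame(
  id   = rep(1:2, each = 4),
  wave = rep(c(4, 5, 6, 7), times = 2),
  age  = c(10, 11, 12, 13, 10, 11, 12, 13),
  v1   = c(1.2, NA, 0.8, NA, 0.5, 0.7, NA, 1.1),
  v2   = c(NA, 2.1, NA, 1.9, 2.4, NA, NA, 2.0)
)

# Same construction as in the report: impute nothing from wave 6 onwards.
where_toy <- is.na(toy)
where_toy[which(toy$wave >= 6), ] <- FALSE

# Cells that are missing but not flagged for imputation.
left_missing <- is.na(toy) & !where_toy

# Number of such cells per variable; v1 and v2 each have 2 here.
colSums(left_missing)
```

Any nonzero count for a variable that has a nonzero entry in the predictor matrix means the data handed to pan::pan() as pred would contain missing values.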

gerkovink commented 1 year ago

FWIW: mice uses mitml::panImpute() internally. Perhaps @simongrund1 can shed some light on whether using a where strategy to omit some cases from imputation would be desired behaviour with mitml::panImpute().

isaactpetersen commented 1 year ago

Thanks for the prompt response. Does that mean it is not possible to use the where argument with 2l.pan? I would like to use the where argument to specify a subset of rows to impute. I do not want to filter out rows before fitting the model because I want all data to inform the imputation for the subset of rows. Does that make sense?

gerkovink commented 1 year ago

Yes, but not if the method expects the column to be completely observed.

If you'd like the incomplete data to inform the imputations, then why would you not draw values for those informers? They're coming from the same posterior predictive distribution, after all. Information that is not there can also not inform if left unobserved.

Internally, mice allows the set used for parameter estimation to differ from the set to be imputed. The set used for parameter estimation is then, naturally, made complete behind the scenes. The caveat is that the method generating the imputations is passed the incomplete predictor space, and it does not accept that. So, unfortunately, you cannot use the where argument in this case.

That said, if your goal is to exclude the flow of information in your analysis, you may choose to still impute all and analyze some. The models will still be congenial, then.

simongrund1 commented 1 year ago

I don't think it matters whether "2l.pan" uses pan::pan or mitml::panImpute internally, because panImpute just arranges the outcome and predictor matrices and then calls pan::pan. PAN requires complete predictors. Any workaround would require splitting the estimation and imputation parts of the method, as is done for "norm" and other methods that support where. This would need to be done in mice, though.

In the case described above, where v1 and v2 are missing simultaneously and all other variables are complete, I believe the where strategy is equivalent to just running mice on the wave <= 5 subset. More generally, if the aim is to leave values in a variable unimputed while still using that variable to inform imputations for the same units on other variables, then one could impute the variable anyway and delete the values afterwards. But then again, I'm no expert on the where feature, so maybe there's a better way.
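
A minimal sketch of the "impute anyway and delete afterwards" idea, in base R only. Everything here is a stand-in: `original` and `completed` are made-up data frames (`completed` plays the role of one completed data set, such as mice::complete() would return after imputing everything without where), and `drop_unwanted_imputations()` is a hypothetical helper, not a mice function.

```r
# Toy stand-ins (hypothetical values).
original <- data.frame(
  wave = c(4, 5, 6, 7),
  v1   = c(1.2, NA, 0.8, NA),
  v2   = c(NA, 2.1, NA, 1.9)
)

# Pretend every missing cell was imputed.
completed <- original
completed$v1[is.na(original$v1)] <- c(1.0, 0.9)
completed$v2[is.na(original$v2)] <- c(2.0, 2.2)

# Cells the user actually wanted imputed, built as in the report.
where_toy <- is.na(original)
where_toy[which(original$wave >= 6), ] <- FALSE

# Hypothetical helper: set imputed values back to NA in every cell that
# was missing originally but is not flagged in `where`.
drop_unwanted_imputations <- function(completed, original, where) {
  completed[is.na(original) & !where] <- NA
  completed
}

result <- drop_unwanted_imputations(completed, original, where_toy)
```

With several imputations, the same helper would be applied to each completed data set in turn. If the analysis only needs the earlier waves, subsetting the rows of each completed data set achieves the same thing.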

gerkovink commented 1 year ago

Thanks, @simongrund1.

@isaactpetersen thanks for raising this issue. I have proposed a note to the where argument documentation. See #521. Closing now as it is not related to mice.

isaactpetersen commented 1 year ago

Agreed that a note to the documentation would be helpful. Thanks for clarifying!