amices / mice

Multivariate Imputation by Chained Equations
https://amices.org/mice/
GNU General Public License v2.0

How should mice behave when variables are not specified in the model #583

Open stefvanbuuren opened 10 months ago

stefvanbuuren commented 10 months ago

test-blocks.R contains a specification of the mice setup with two non-standard features.

The current policy is not very satisfying. Currently, where[, "hyp"] is set to FALSE, so hyp is not imputed. However, hyp is still a predictor for blocks B1, bmi and age, so its missing values propagate into the imputations (see the NA values that remain in complete(imp) below).

Using c2da03c:

library(mice)   # branch support_blocks 
#> 
#> Attaching package: 'mice'
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> The following objects are masked from 'package:base':
#> 
#>     cbind, rbind
imp <- mice(nhanes, blocks = make.blocks(list(c("bmi", "chl"), "bmi", "age")), m = 1, print = FALSE)

head(complete(imp))
#>   age  bmi hyp chl
#> 1   1   NA  NA  NA
#> 2   2 22.7   1 187
#> 3   1 27.2   1 187
#> 4   3   NA  NA  NA
#> 5   1 20.4   1 113
#> 6   3   NA  NA 184
imp$blocks
#> $B1
#> [1] "bmi" "chl"
#> 
#> $bmi
#> [1] "bmi"
#> 
#> $age
#> [1] "age"
#> 
#> attr(,"calltype")
#>        B1       bmi       age 
#> "formula" "formula" "formula"
imp$formulas
#> $B1
#> bmi + chl ~ age + hyp
#> <environment: 0x11e6e1750>
#> 
#> $bmi
#> bmi ~ age + hyp + chl
#> <environment: 0x11e6e1750>
#> 
#> $age
#> age ~ bmi + hyp + chl
#> <environment: 0x11e6e1750>
head(imp$where)
#>     age   bmi   hyp   chl
#> 1 FALSE  TRUE FALSE  TRUE
#> 2 FALSE FALSE FALSE FALSE
#> 3 FALSE  TRUE FALSE FALSE
#> 4 FALSE  TRUE FALSE  TRUE
#> 5 FALSE FALSE FALSE FALSE
#> 6 FALSE  TRUE FALSE FALSE
imp$method
#>    B1   bmi   age 
#> "pmm" "pmm"    ""
imp$predictorMatrix
#>     age bmi hyp chl
#> age   0   0   0   0
#> bmi   1   0   1   1
#> hyp   1   1   0   1
#> chl   1   1   1   0

Created on 2023-09-13 with reprex v2.0.2

A better policy might be to inactivate any unmentioned variable j as follows:

1) set method[j] to "" (we can always do that since j is not mentioned in the model)
2) set predictorMatrix[, j] to 0 (take j out as predictor)
3) leave predictorMatrix[j, ] untouched (so we can still see which variables would be required to impute it)
4) leave where[, j] untouched

As a result, j is not imputed and is not a predictor anywhere. The policy might encourage starting small (with a few variables) and gradually building up the model. Does this seem a good approach? Any downsides to it?
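
To make the proposal concrete, here is a minimal base-R sketch of the four steps. It is only an illustration: inactivate_unmentioned() is a hypothetical helper, not a function in mice, and it assumes the standard setup with one method entry per variable (not per block as in the reprex above). The toy method vector and predictor matrix are built by hand and use the nhanes variable names.

```r
# Hypothetical helper (NOT part of mice): inactivate variables that are
# not mentioned in the imputation model.
# Assumption: `method` is a named character vector with one entry per variable,
# and `predictorMatrix` is a square matrix with variable names as dimnames.
inactivate_unmentioned <- function(method, predictorMatrix, unmentioned) {
  stopifnot(
    all(unmentioned %in% names(method)),
    all(unmentioned %in% colnames(predictorMatrix))
  )
  method[unmentioned] <- ""            # step 1: j is not imputed
  predictorMatrix[, unmentioned] <- 0  # step 2: j is not a predictor anywhere
  # steps 3 and 4: predictorMatrix[j, ] and where[, j] are deliberately left alone
  list(method = method, predictorMatrix = predictorMatrix)
}

# Toy inputs (assumed for illustration only)
vars <- c("age", "bmi", "hyp", "chl")
method <- c(age = "", bmi = "pmm", hyp = "pmm", chl = "pmm")
pred <- matrix(1, nrow = 4, ncol = 4, dimnames = list(vars, vars))
diag(pred) <- 0

out <- inactivate_unmentioned(method, pred, unmentioned = "hyp")
out$method
out$predictorMatrix
```

In this sketch the hyp column of the predictor matrix becomes all zeros while the hyp row keeps its original entries, so it remains visible which variables would be needed if hyp were later added to the model.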

stefvanbuuren commented 9 months ago

After some discussions, I suggest the following NA-PROPAGATION policy:

Note that these options are not yet implemented.