Closed by donaldRwilliams 2 years ago
Dear Donald,
By default, mice first replaces each NA in the data with a random draw from the observed values of the same variable (the so-called starting values). This is done so that the models used to generate imputations do not fail (which they would if there were NAs in the predictors). If a row consists exclusively of NAs, each of them is replaced with a randomly sampled value from the corresponding variable. Subsequently, these values are updated with every iteration, exactly as happens when only a subset of an observation's values is missing.
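To make the starting-value step concrete, here is a minimal base R sketch of the idea. It is not mice's internal code: the function name init_random and the toy data are hypothetical, and sampling with replacement is an assumption on my part.

```r
# Illustrative only: fill each NA with a random draw from the observed
# values of the same column (sampling with replacement is assumed here).
init_random <- function(data) {
  for (j in seq_along(data)) {
    miss <- is.na(data[[j]])
    obs <- data[[j]][!miss]
    if (any(miss) && length(obs) > 0) {
      # sample by index so a single observed value is not treated as 1:n
      idx <- sample.int(length(obs), sum(miss), replace = TRUE)
      data[[j]][miss] <- obs[idx]
    }
  }
  data
}

set.seed(1)
toy <- data.frame(age = c(1, NA, 3, 2), bmi = c(22.5, NA, NA, 30.1))
filled <- init_random(toy) # row 2 was entirely NA and is now filled as well
```

Note that the fully missing second row is treated like any other row: each of its cells simply receives a draw from its own column.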
If you want to prevent this from happening, you can specify the where argument in mice, such as in the following example.
library(mice) # load mice
#>
#> Attaching package: 'mice'
#> The following object is masked from 'package:stats':
#>
#> filter
#> The following objects are masked from 'package:base':
#>
#> cbind, rbind
df <- nhanes # make df
df[2, ] <- NA # set second row to NA
where <- make.where(df) # specify where matrix
head(where) # by default, the second row is imputed
#> age bmi hyp chl
#> 1 FALSE TRUE TRUE TRUE
#> 2 TRUE TRUE TRUE TRUE
#> 3 FALSE TRUE FALSE FALSE
#> 4 FALSE TRUE TRUE TRUE
#> 5 FALSE FALSE FALSE FALSE
#> 6 FALSE TRUE TRUE FALSE
where[rowSums(where) == ncol(where), ] <- FALSE # change which cells are imputed
head(where) # now, the second row won't be imputed
#> age bmi hyp chl
#> 1 FALSE TRUE TRUE TRUE
#> 2 FALSE FALSE FALSE FALSE
#> 3 FALSE TRUE FALSE FALSE
#> 4 FALSE TRUE TRUE TRUE
#> 5 FALSE FALSE FALSE FALSE
#> 6 FALSE TRUE TRUE FALSE
imp <- mice(df, m = 1, maxit = 1, where = where)
#>
#> iter imp variable
#> 1 1 bmi hyp chl
head(complete(imp)) # second row is now not imputed
#> age bmi hyp chl
#> 1 1 35.3 1 218
#> 2 NA NA NA NA
#> 3 1 30.1 1 187
#> 4 3 27.4 2 204
#> 5 1 20.4 1 113
#> 6 3 22.5 1 184
Created on 2022-09-03 by the reprex package (v2.0.1)
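If you need this more than once, the masking step could be wrapped in a small helper. The following is only a sketch in base R: the name where_skip_empty_rows is hypothetical, and it assumes that the default make.where() result is the same as is.na() applied to the data.

```r
# Build a where matrix that marks missing cells for imputation,
# except in rows that are missing entirely.
where_skip_empty_rows <- function(data) {
  where <- is.na(data)                    # TRUE = cell would be imputed
  empty <- rowSums(where) == ncol(where)  # rows with no observed value
  where[empty, ] <- FALSE                 # leave those rows untouched
  where
}

toy <- data.frame(age = c(1, NA, 3), bmi = c(NA, NA, 30.1))
w <- where_skip_empty_rows(toy)
```

The result can then be passed on as in the reprex, e.g. mice(df, where = where_skip_empty_rows(df)).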
I hope this helps, but let us know if you have any further questions or concerns.
Best, Thom
Thanks, Thom, for answering.
Hi, I am trying to wrap my head around what mice is doing when an entire row has NAs, but the values are imputed anyhow. Here is an example
which returns
1 2 1 35.3 1 199
I am curious how the imputation is being done, and how I can stop it from imputing those rows.
Thanks!