amices / mice

Multivariate Imputation by Chained Equations
https://amices.org/mice/
GNU General Public License v2.0

Does imputing interaction terms involving categorical variables still work? #530

Closed CWiederkehr closed 1 year ago

CWiederkehr commented 1 year ago

Passive imputation of interactions between two continuous variables works fine, but are dummy variables still created internally for interaction terms involving categorical variables? Also, the object ini$pad does not seem to exist. I copied and pasted the relevant code from "mice: Multivariate Imputation by Chained Equations in R" (2011), https://www.jstatsoft.org/article/view/v045i03:

library(mice)
#> Warning: package 'mice' was built under R version 4.1.3
#> 
#> Attaching package: 'mice'
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> The following objects are masked from 'package:base':
#> 
#>     cbind, rbind

## Interactions between two continuous variables
nhanes2.ext <- cbind(nhanes2, bmi.chl = NA)
ini <- mice(nhanes2.ext, max = 0, print = FALSE)
#> Warning: Number of logged events: 1
meth <- ini$meth
meth["bmi.chl"] <- "~I((bmi-25)*(chl-200))"
pred <- ini$pred
pred[c("bmi", "chl"), "bmi.chl"] <- 0
imp <- mice(nhanes2.ext, meth = meth, pred = pred, seed = 51600, print = FALSE)

# dummy variables can be accessed from 'imp$pad$data'
head(ini$pad$data, 3)
#> NULL

## Interactions involving categorical variables
nhanes2.ext <- cbind(nhanes2, age.1.bmi = NA, age.2.bmi = NA)
ini <- mice(nhanes2.ext, max = 0, print = FALSE)
#> Warning: Number of logged events: 2
meth <- ini$meth
meth["age.1.bmi"] <- "~I(age.1*(bmi-25))"
meth["age.2.bmi"] <- "~I(age.2*(bmi-25))"
pred <- ini$pred
pred[c("age", "bmi"), c("age.1.bmi", "age.2.bmi")] <- 0
imp <- mice(nhanes2.ext, meth = meth, pred = pred, maxit = 10)
#> 
#>  iter imp variable
#>   1   1  bmi  hyp  chl  age.1.bmi
#> Error in unique.default(c("AsIs", oldClass(x))): object 'age.1' not found

Created on 2023-01-17 with reprex v2.0.2

FannyArtaud commented 1 year ago

Hello, I have exactly the same question, do you have an answer please? Thank you

CWiederkehr commented 1 year ago

Hello FannyArtaud,

Unfortunately, I didn't receive an answer to my question, so I could not proceed as planned with the internally created dummy coding. As a consequence, I can't tell whether it actually works.

However, there is an alternative you might consider: doing the dummy coding manually. It takes more work, but it has worked well in my experience. I tested it extensively and compared my results with those in a published paper that used a similar passive approach, which strengthens my confidence in the method.

Below, I modified the example by incorporating the dummy coding manually.

I hope this information helps!

Best regards, Christoph

library(mice)
#> Warning: package 'mice' was built under R version 4.1.3
#> 
#> Attaching package: 'mice'
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> The following objects are masked from 'package:base':
#> 
#>     cbind, rbind

# create missingness for 'age'
nhanes2$age[sample(length(nhanes2$age), 3)] <- NA
str(nhanes2)
#> 'data.frame':    25 obs. of  4 variables:
#>  $ age: Factor w/ 3 levels "20-39","40-59",..: 1 2 1 3 NA 3 1 1 2 2 ...
#>  $ bmi: num  NA 22.7 NA NA 20.4 NA 22.5 30.1 22 NA ...
#>  $ hyp: Factor w/ 2 levels "no","yes": NA 1 1 NA 1 NA 1 1 1 NA ...
#>  $ chl: num  NA 187 187 NA 113 184 118 187 238 NA ...
# create helper variable that codes missing 'age' as its own level
nhanes2$age_no_NA <- ifelse(is.na(nhanes2$age), "Missing", as.character(nhanes2$age))
nhanes2$age_no_NA <- as.factor(nhanes2$age_no_NA)
# Create the dummy matrix using the variable 'age_no_NA'
age_dummies <- model.matrix(~ age_no_NA - 1, data = nhanes2)
missing_vector <- as.numeric(nhanes2$age_no_NA == "Missing")
# Combine the dummy matrix and the dummy vector for missing values
age_dummies <- cbind(age_dummies, Missing = missing_vector)
# Replace the entries in the dummy matrix with NA based on the dummy vector for missing values
age_dummies <- apply(age_dummies, 2, function(x) ifelse(x == 0 & missing_vector == 1, NA, x))
age_dummies <- age_dummies[, c(1:2)] # leave out 3rd category as reference!
colnames(age_dummies) <- c("age_1", "age_2")
nhanes2 <- cbind(nhanes2[,-5], age_dummies) # drop helper column `age_no_NA` (column 5)

# Create Data with interactions
nhanes2 <- cbind(nhanes2, age1.bmi = nhanes2$age_1*nhanes2$bmi, age2.bmi = nhanes2$age_2*nhanes2$bmi)

ini <- mice(nhanes2, max = 0, print = FALSE)

meth <- ini$meth
# specify the passive approach
meth[c("age_1", "age_2", "age1.bmi", "age2.bmi")] <- c("~I(as.integer(model.matrix(~ age - 1)[,1]))",
                                                   "~I(as.integer(model.matrix(~ age - 1)[,2]))",
                                                   "~I(age_1*bmi)", 
                                                   "~I(age_2*bmi)")

pred <- ini$pred
# delete unnecessary predictors
pred[5:8,] <- 0
pred[c("age", "bmi"), c("age_1", "age_2", "age1.bmi", "age2.bmi")] <- 0
pred[c("hyp", "chl"), c("age_1", "age_2")] <- 0
imp <- mice(nhanes2, meth = meth, pred = pred, seed = 51600, print = FALSE)
# check result
full_nhanes2 <- mice::complete(imp, 1) ; str(full_nhanes2)
#> 'data.frame':    25 obs. of  8 variables:
#>  $ age     : Factor w/ 3 levels "20-39","40-59",..: 1 2 1 3 1 3 1 1 2 2 ...
#>  $ bmi     : num  22.5 22.7 30.1 22.5 20.4 22.7 22.5 30.1 22 22.7 ...
#>  $ hyp     : Factor w/ 2 levels "no","yes": 1 1 1 2 1 2 1 1 1 1 ...
#>  $ chl     : num  131 187 187 284 113 184 118 187 238 229 ...
#>  $ age_1   : num  1 0 1 0 1 0 1 1 0 0 ...
#>  $ age_2   : num  0 1 0 0 0 0 0 0 1 1 ...
#>  $ age1.bmi: num  22.5 0 30.1 0 20.4 0 22.5 30.1 0 0 ...
#>  $ age2.bmi: num  0 22.7 0 0 0 0 0 0 22 22.7 ...

Created on 2023-06-17 with reprex v2.0.2
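
For readers who only need the dummy-coding step, here is a shorter base-R sketch of the same idea. It uses a small toy data frame with made-up values instead of nhanes2, and the column names simply mirror the ones used above. It relies on the fact that comparing a factor with a level name returns NA where the factor is missing, so the missingness carries over to the dummies and to the interaction terms without a separate "Missing" level. This is an illustration, not a replacement for the tested code above.

```r
# Toy data with made-up values, standing in for nhanes2
df <- data.frame(
  age = factor(c("20-39", "40-59", NA, "60-99", "20-39"),
               levels = c("20-39", "40-59", "60-99")),
  bmi = c(22.5, NA, 20.4, 27.2, 30.1)
)

# Dummy variables for the first two levels; "60-99" is the reference.
# A comparison involving NA yields NA, so missing ages stay missing.
df$age_1 <- as.integer(df$age == "20-39")
df$age_2 <- as.integer(df$age == "40-59")

# Interaction terms are NA whenever age or bmi is missing
df$age1.bmi <- df$age_1 * df$bmi
df$age2.bmi <- df$age_2 * df$bmi

str(df)
```

The resulting columns can then be given passive imputation methods and a trimmed predictor matrix in the same way as in the reprex above.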

FannyArtaud commented 1 year ago

Thank you very much. I finally succeeded using the formulas option.
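
The code behind this last comment was not posted, so the following is only a minimal sketch of what using the formulas argument of mice() could look like. It assumes the aim is to let an age-by-bmi interaction inform the imputation of chl. The formula chosen for chl is an illustration, not the poster's actual model. With this approach the interaction is expanded from the formula, so no extra dummy or product columns are added to the data.

```r
library(mice)

# Start from the default formulas, one per variable
form <- make.formulas(nhanes2)

# Assumption for illustration: impute chl from age, bmi, their interaction, and hyp
form$chl <- chl ~ age * bmi + hyp

imp <- mice(nhanes2, formulas = form, seed = 51600, print = FALSE)

# Inspect the first completed data set
str(complete(imp, 1))
```

Note that this only uses the interaction as a predictor in an imputation model. If the analysis needs the product terms as columns, they can be computed from the completed data afterwards.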