amices / mice

Multivariate Imputation by Chained Equations
https://amices.org/mice/
GNU General Public License v2.0

[Question] Is there a way to obtain marginal effects from glm after mice? #537

Closed · Generalized closed this issue 1 year ago

Generalized commented 1 year ago

I would like to obtain the percentage-point change (average marginal effect) from a logistic regression, along with its Wald confidence interval. For an ordinary glm I can do this trivially with the "margins" package: margins::margins_summary(fit)

I'm wondering how to do this in the context of multiple imputation. I'm not sure whether there are pooling rules for it; I would be happy to run it over the pooled glm summary, but I don't know how to combine this with mice:

library(mice)

nhanes$hyp <- as.factor(nhanes$hyp)
imp <- mice::mice(nhanes, m = 20, method = "pmm", maxit = 10, seed = 1000, print = FALSE)
fit <- with(imp, glm(hyp ~ bmi, family = binomial(link = 'logit')))
summary(pool(fit), conf.int = TRUE, exponentiate = TRUE)

# This naive approach doesn't work. Is there any way to make it work?
margins::margins_summary(pool(fit))

The output of margins over an ordinary glm looks like this:

> margins::margins_summary(m)
 factor    AME     SE      z      p   lower  upper
 trtsoc 0.3065 0.1570 1.9522 0.0509 -0.0012 0.6143

where the AME is the percentage-point difference between the two predicted probabilities of success.

thomvolker commented 1 year ago

The marginaleffects R package has functionality to compute marginal effects after performing multiple imputation with mice (see https://vincentarelbundock.github.io/marginaleffects/articles/multiple_imputation.html).

If you don't want to use the marginaleffects package, it should be fairly straightforward to implement a similar strategy with the margins package. You would have to compute the marginal effects for every regression model in your fit object, and then use the proper pooling rules to combine the per-imputation marginal effects into a single pooled estimate for each parameter of interest.
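
For example, something along the following lines should get you close (an untested sketch that reuses the imp object from your code above; the AME and SE columns come from margins_summary(), and the pooling is simply Rubin's rules applied by hand):

library(mice)
library(margins)

# Refit the model on every completed dataset (explicit data= so margins can find it)
fits <- lapply(complete(imp, "all"), function(dat) {
  glm(hyp ~ bmi, family = binomial(link = 'logit'), data = dat)
})

# AME of bmi and its sampling variance in each imputed dataset
ames <- lapply(fits, margins::margins_summary)
Q <- vapply(ames, function(s) s$AME[s$factor == "bmi"], numeric(1))
U <- vapply(ames, function(s) s$SE[s$factor == "bmi"]^2, numeric(1))

# Rubin's rules: pooled estimate, within-, between-, and total variance
m    <- length(Q)
qbar <- mean(Q)               # pooled AME
ubar <- mean(U)               # average within-imputation variance
b    <- var(Q)                # between-imputation variance
tvar <- ubar + (1 + 1/m) * b  # total variance
c(estimate = qbar, se = sqrt(tvar))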

I hope this answers your question.

Generalized commented 1 year ago

Thank you very much for such a quick response! That's excellent news; it made my day!

EDIT: OK, got it. For the GLM the statistic is a z statistic (the complete-data sample size is taken as infinite), and that's what the warning below is about.

library(mice)
library(marginaleffects)

nhanes$hyp <- as.factor(nhanes$hyp)
imp <- mice::mice(nhanes, m = 20, method = "pmm", maxit = 10, seed = 1000, print = FALSE)
fit <- with(imp, glm(hyp ~ bmi, family = binomial(link = 'logit')))

summary(pool(fit), conf.int = TRUE, exponentiate = TRUE)

         term  estimate std.error  statistic       df   p.value        2.5 %     97.5 %
1 (Intercept) 0.1649319 3.6453248 -0.4943929 17.52815 0.6271678 7.670295e-05 354.647832
2         bmi 1.0176852 0.1347845  0.1300643 17.44654 0.8980035 7.662197e-01   1.351679

# Fit the logistic model on one completed dataset and return the average slopes
fit_logistic <- function(dat) {
  mod <- glm(hyp ~ bmi, family = binomial(link = 'logit'), data = dat)
  out <- avg_slopes(mod, newdata = dat)
  return(out)
}

dat_mice <- complete(imp, "all")
mod_imputation <- lapply(dat_mice, fit_logistic)
summary(pool(mod_imputation))

  term    estimate  std.error statistic       df   p.value
1  bmi 0.002957066 0.02207825 0.1339357 575.2135 0.8935003

with a warning:

Warning message:
In get.dfcom(object, dfcom) : Infinite sample size assumed.
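
To also get the Wald confidence interval I asked about originally, and to replace the infinite-sample assumption with an explicit complete-data degrees of freedom, something like this should work (untested; dfcom = nrow(nhanes) - 2 assumes an intercept plus one slope):

# Wald CI for the pooled AME; supplying dfcom lets pool() use the
# Barnard-Rubin adjustment instead of assuming an infinite sample size
summary(pool(mod_imputation, dfcom = nrow(nhanes) - 2), conf.int = TRUE)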

Thank you!