amices / mice

Multivariate Imputation by Chained Equations
https://amices.org/mice/
GNU General Public License v2.0
447 stars 108 forks source link

Change reference level in logistic regression following MI using MICE #618

Closed franzepsy closed 9 months ago

franzepsy commented 9 months ago

I have run multiple imputation for missing values using the ‘mice’ package, fitted binary logistic regressions in each of the imputed datasets, and pooled the results.

Analyses in the imputed datasets

rutteremoIMP <- with(data=IMPRutterdata3, exp=glm(RRPSPCemot_42m_low1high2 ~ Group_num, family = binomial))

Pooling results using Rubin's Rules, GETTING OR (ESTIMATE), 95CI AND P

summary(pool(rutteremoIMP), conf.int = TRUE, exponentiate = TRUE)

The problem is that I would like to also change the reference level of my factor (now set to group 1) and fit the logistic regression model comparing, for example, groups 2 vs 3, but I don’t understand how to do that. In ’normal’ logistic regressions (without imputations), I use the relevel function, but it doesn’t seem to work, maybe because the code to fit the logistic regression on the imputed datasets uses a mids object. Is there a way to change the reference level before fitting a logistic regression model on imputed datasets?

thomvolker commented 9 months ago

I don't fully understand the problem, in a small test (see below) everything works fine. If this doesn't work out for you, could you then please provide a reprex.

library(mice)
#> 
#> Attaching package: 'mice'
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> The following objects are masked from 'package:base':
#> 
#>     cbind, rbind
set.seed(123)

y <- rbinom(100, 1, 0.5) |> as.factor()
x1 <- rnorm(100)
x2 <- sample(c("A", "B", "C"), 100, replace = TRUE) |> factor()

x1[1] <- NA

dat <- data.frame(y = y, x1 = x1, x2 = x2)

imp <- mice(dat)
#> 
#>  iter imp variable
#>   1   1  x1
#>   1   2  x1
#>   1   3  x1
#>   1   4  x1
#>   1   5  x1
#>   2   1  x1
#>   2   2  x1
#>   2   3  x1
#>   2   4  x1
#>   2   5  x1
#>   3   1  x1
#>   3   2  x1
#>   3   3  x1
#>   3   4  x1
#>   3   5  x1
#>   4   1  x1
#>   4   2  x1
#>   4   3  x1
#>   4   4  x1
#>   4   5  x1
#>   5   1  x1
#>   5   2  x1
#>   5   3  x1
#>   5   4  x1
#>   5   5  x1

with(imp, glm(y ~ x1 + x2, family = binomial))
#> call :
#> with.mids(data = imp, expr = glm(y ~ x1 + x2, family = binomial))
#> 
#> call1 :
#> mice(data = dat)
#> 
#> nmis :
#>  y x1 x2 
#>  0  1  0 
#> 
#> analyses :
#> [[1]]
#> 
#> Call:  glm(formula = y ~ x1 + x2, family = binomial)
#> 
#> Coefficients:
#> (Intercept)           x1          x2B          x2C  
#>    -0.01232     -0.33146      0.17677     -0.71100  
#> 
#> Degrees of Freedom: 99 Total (i.e. Null);  96 Residual
#> Null Deviance:       138.3 
#> Residual Deviance: 132.7     AIC: 140.7
#> 
#> [[2]]
#> 
#> Call:  glm(formula = y ~ x1 + x2, family = binomial)
#> 
#> Coefficients:
#> (Intercept)           x1          x2B          x2C  
#>    -0.01633     -0.29055      0.18136     -0.70122  
#> 
#> Degrees of Freedom: 99 Total (i.e. Null);  96 Residual
#> Null Deviance:       138.3 
#> Residual Deviance: 133.2     AIC: 141.2
#> 
#> [[3]]
#> 
#> Call:  glm(formula = y ~ x1 + x2, family = binomial)
#> 
#> Coefficients:
#> (Intercept)           x1          x2B          x2C  
#>   -0.009601    -0.359768     0.180021    -0.718030  
#> 
#> Degrees of Freedom: 99 Total (i.e. Null);  96 Residual
#> Null Deviance:       138.3 
#> Residual Deviance: 132.3     AIC: 140.3
#> 
#> [[4]]
#> 
#> Call:  glm(formula = y ~ x1 + x2, family = binomial)
#> 
#> Coefficients:
#> (Intercept)           x1          x2B          x2C  
#>   -0.009146    -0.364538     0.181703    -0.719235  
#> 
#> Degrees of Freedom: 99 Total (i.e. Null);  96 Residual
#> Null Deviance:       138.3 
#> Residual Deviance: 132.2     AIC: 140.2
#> 
#> [[5]]
#> 
#> Call:  glm(formula = y ~ x1 + x2, family = binomial)
#> 
#> Coefficients:
#> (Intercept)           x1          x2B          x2C  
#>   -0.009601    -0.359768     0.180021    -0.718030  
#> 
#> Degrees of Freedom: 99 Total (i.e. Null);  96 Residual
#> Null Deviance:       138.3 
#> Residual Deviance: 132.3     AIC: 140.3
with(imp, glm(y ~ x1 + relevel(x2, ref = "C"), family = binomial))
#> call :
#> with.mids(data = imp, expr = glm(y ~ x1 + relevel(x2, ref = "C"), 
#>     family = binomial))
#> 
#> call1 :
#> mice(data = dat)
#> 
#> nmis :
#>  y x1 x2 
#>  0  1  0 
#> 
#> analyses :
#> [[1]]
#> 
#> Call:  glm(formula = y ~ x1 + relevel(x2, ref = "C"), family = binomial)
#> 
#> Coefficients:
#>             (Intercept)                       x1  relevel(x2, ref = "C")A  
#>                 -0.7233                  -0.3315                   0.7110  
#> relevel(x2, ref = "C")B  
#>                  0.8878  
#> 
#> Degrees of Freedom: 99 Total (i.e. Null);  96 Residual
#> Null Deviance:       138.3 
#> Residual Deviance: 132.7     AIC: 140.7
#> 
#> [[2]]
#> 
#> Call:  glm(formula = y ~ x1 + relevel(x2, ref = "C"), family = binomial)
#> 
#> Coefficients:
#>             (Intercept)                       x1  relevel(x2, ref = "C")A  
#>                 -0.7175                  -0.2905                   0.7012  
#> relevel(x2, ref = "C")B  
#>                  0.8826  
#> 
#> Degrees of Freedom: 99 Total (i.e. Null);  96 Residual
#> Null Deviance:       138.3 
#> Residual Deviance: 133.2     AIC: 141.2
#> 
#> [[3]]
#> 
#> Call:  glm(formula = y ~ x1 + relevel(x2, ref = "C"), family = binomial)
#> 
#> Coefficients:
#>             (Intercept)                       x1  relevel(x2, ref = "C")A  
#>                 -0.7276                  -0.3598                   0.7180  
#> relevel(x2, ref = "C")B  
#>                  0.8981  
#> 
#> Degrees of Freedom: 99 Total (i.e. Null);  96 Residual
#> Null Deviance:       138.3 
#> Residual Deviance: 132.3     AIC: 140.3
#> 
#> [[4]]
#> 
#> Call:  glm(formula = y ~ x1 + relevel(x2, ref = "C"), family = binomial)
#> 
#> Coefficients:
#>             (Intercept)                       x1  relevel(x2, ref = "C")A  
#>                 -0.7284                  -0.3645                   0.7192  
#> relevel(x2, ref = "C")B  
#>                  0.9009  
#> 
#> Degrees of Freedom: 99 Total (i.e. Null);  96 Residual
#> Null Deviance:       138.3 
#> Residual Deviance: 132.2     AIC: 140.2
#> 
#> [[5]]
#> 
#> Call:  glm(formula = y ~ x1 + relevel(x2, ref = "C"), family = binomial)
#> 
#> Coefficients:
#>             (Intercept)                       x1  relevel(x2, ref = "C")A  
#>                 -0.7276                  -0.3598                   0.7180  
#> relevel(x2, ref = "C")B  
#>                  0.8981  
#> 
#> Degrees of Freedom: 99 Total (i.e. Null);  96 Residual
#> Null Deviance:       138.3 
#> Residual Deviance: 132.3     AIC: 140.3

Created on 2024-01-30 with reprex v2.0.2

franzepsy commented 9 months ago

Thank you for your help! It does work actually, there was a small mistake in my code.