amices / mice

Multivariate Imputation by Chained Equations
https://amices.org/mice/
GNU General Public License v2.0
437 stars 107 forks source link

pool for gamlss objects #405

Closed dnzmarcio closed 3 years ago

dnzmarcio commented 3 years ago

The function pool ignores that there are different models for each parameter in gamlss (Generalized Additive Models for Location, Scale and Shape) objects such that it will average over the regression coefficients from different parameters that are modelled with the same covariates.

Libraries and data generation

# Libraries
library(gamlss)
library(mice)

# Data
group <- sample(c("A", "B"), size = 100, replace = TRUE)
biomarker <- rnorm(n = 100, mean = 0, sd = 1)
design.matrix.mu <- model.matrix(~ group + biomarker)
design.matrix.tau <- model.matrix(~ biomarker)
beta <- c(-1, 0.5, 1)
gamma <- c(1, 0.6)
sigma <- plogis(1)
mu <- plogis(design.matrix.mu%*%beta)
nu <- exp(design.matrix.tau%*%gamma)

outcome <- rBEINF(n = 100, mu = mu, sigma = sigma, nu = nu, tau = 0.1)
dataset <- data.frame(outcome, group, biomarker)
dataset[sample(1:100, size = 10), "biomarker"] <- NA

Fitting a gamlss model with formula for the mean (mu) and the probability of inflated zeros (nu)

fit <- gamlss(formula = outcome ~ group + biomarker,
              nu.formula = ~ biomarker,
              data = na.omit(dataset),
              family = "BEINF")
## GAMLSS-RS iteration 1: Global Deviance = -53.5503 
## GAMLSS-RS iteration 2: Global Deviance = -56.593 
## GAMLSS-RS iteration 3: Global Deviance = -56.6955 
## GAMLSS-RS iteration 4: Global Deviance = -56.7008 
## GAMLSS-RS iteration 5: Global Deviance = -56.7012
summary(fit)
## ******************************************************************
## Family:  c("BEINF", "Beta Inflated") 
## 
## Call:  gamlss(formula = outcome ~ group + biomarker, nu.formula = ~biomarker,  
##     family = "BEINF", data = na.omit(dataset)) 
## 
## Fitting method: RS() 
## 
## ------------------------------------------------------------------
## Mu link function:  logit
## Mu Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -1.4053     0.4058  -3.463 0.000846 ***
## groupB        1.1891     0.4763   2.497 0.014517 *  
## biomarker     1.0107     0.2567   3.937 0.000171 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## ------------------------------------------------------------------
## Sigma link function:  logit
## Sigma Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   0.6505     0.2025   3.213  0.00187 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## ------------------------------------------------------------------
## Nu link function:  log 
## Nu Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.8971     0.2430   3.692 0.000397 ***
## biomarker     0.4460     0.2642   1.688 0.095138 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## ------------------------------------------------------------------
## Tau link function:  log 
## Tau Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -2.6027     0.7328  -3.552 0.000634 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## ------------------------------------------------------------------
## No. of observations in the fit:  90 
## Degrees of Freedom for the fit:  7
##       Residual Deg. of Freedom:  83 
##                       at cycle:  5 
##  
## Global Deviance:     -56.70123 
##             AIC:     -42.70123 
##             SBC:     -25.20256 
## ******************************************************************

Multiple imputation

imp <- mice(m = 2, dataset)
## 
##  iter imp variable
##   1   1  biomarker
##   1   2  biomarker
##   2   1  biomarker
##   2   2  biomarker
##   3   1  biomarker
##   3   2  biomarker
##   4   1  biomarker
##   4   2  biomarker
##   5   1  biomarker
##   5   2  biomarker
fit <- with(data = imp, gamlss(formula = outcome ~ group + biomarker,
                               nu.formula = ~ biomarker,
                               family = "BEINF"))
## GAMLSS-RS iteration 1: Global Deviance = -37.6422 
## GAMLSS-RS iteration 2: Global Deviance = -41.2915 
## GAMLSS-RS iteration 3: Global Deviance = -41.4323 
## GAMLSS-RS iteration 4: Global Deviance = -41.4397 
## GAMLSS-RS iteration 5: Global Deviance = -41.4405 
## GAMLSS-RS iteration 1: Global Deviance = -33.0859 
## GAMLSS-RS iteration 2: Global Deviance = -37.5867 
## GAMLSS-RS iteration 3: Global Deviance = -37.7456 
## GAMLSS-RS iteration 4: Global Deviance = -37.7563 
## GAMLSS-RS iteration 5: Global Deviance = -37.7567

Issue: pool ignores that there are two regression models for mu and tau resulting in only one set of coefficients

summary(pool(fit))
##          term   estimate std.error  statistic        df    p.value
## 1 (Intercept) -0.5395432 1.5824961 -0.3409444  3.004534 0.75559497
## 2      groupB  0.9770695 0.4330088  2.2564658 88.762810 0.02649718
## 3   biomarker  0.6821122 0.3807480  1.7915055  7.769334 0.11209532

Session Info

devtools::session_info()
## - Session info ---------------------------------------------------------------
##  setting  value                       
##  version  R version 4.0.3 (2020-10-10)
##  os       Windows 10 x64              
##  system   x86_64, mingw32             
##  ui       RTerm                       
##  language (EN)                        
##  collate  English_United States.1252  
##  ctype    English_United States.1252  
##  tz       America/Los_Angeles         
##  date     2021-06-09                  
## 
## - Packages -------------------------------------------------------------------
##  package     * version date       lib source        
##  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.3)
##  backports     1.2.1   2020-12-09 [1] CRAN (R 4.0.3)
##  broom         0.7.5   2021-02-19 [1] CRAN (R 4.0.4)
##  callr         3.5.1   2020-10-13 [1] CRAN (R 4.0.3)
##  cli           2.4.0   2021-04-05 [1] CRAN (R 4.0.5)
##  crayon        1.4.1   2021-02-08 [1] CRAN (R 4.0.4)
##  DBI           1.1.0   2019-12-15 [1] CRAN (R 4.0.3)
##  desc          1.2.0   2018-05-01 [1] CRAN (R 4.0.3)
##  devtools      2.3.2   2020-09-18 [1] CRAN (R 4.0.3)
##  digest        0.6.27  2020-10-24 [1] CRAN (R 4.0.3)
##  dplyr         1.0.5   2021-03-05 [1] CRAN (R 4.0.4)
##  ellipsis      0.3.1   2020-05-15 [1] CRAN (R 4.0.4)
##  evaluate      0.14    2019-05-28 [1] CRAN (R 4.0.3)
##  fansi         0.4.2   2021-01-15 [1] CRAN (R 4.0.3)
##  fs            1.5.0   2020-07-31 [1] CRAN (R 4.0.3)
##  gamlss      * 5.2-0   2020-09-12 [1] CRAN (R 4.0.3)
##  gamlss.data * 5.1-4   2019-05-15 [1] CRAN (R 4.0.3)
##  gamlss.dist * 5.1-7   2020-07-13 [1] CRAN (R 4.0.3)
##  generics      0.1.0   2020-10-31 [1] CRAN (R 4.0.3)
##  glue          1.4.2   2020-08-27 [1] CRAN (R 4.0.3)
##  htmltools     0.5.0   2020-06-16 [1] CRAN (R 4.0.3)
##  knitr         1.31    2021-01-27 [1] CRAN (R 4.0.4)
##  lattice       0.20-41 2020-04-02 [2] CRAN (R 4.0.3)
##  lifecycle     1.0.0   2021-02-15 [1] CRAN (R 4.0.4)
##  magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.0.4)
##  MASS        * 7.3-53  2020-09-09 [2] CRAN (R 4.0.3)
##  Matrix        1.2-18  2019-11-27 [2] CRAN (R 4.0.3)
##  memoise       1.1.0   2017-04-21 [1] CRAN (R 4.0.3)
##  mice        * 3.13.0  2021-01-27 [1] CRAN (R 4.0.5)
##  nlme        * 3.1-149 2020-08-23 [2] CRAN (R 4.0.3)
##  pillar        1.6.0   2021-04-13 [1] CRAN (R 4.0.5)
##  pkgbuild      1.1.0   2020-07-13 [1] CRAN (R 4.0.3)
##  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.0.3)
##  pkgload       1.1.0   2020-05-29 [1] CRAN (R 4.0.3)
##  prettyunits   1.1.1   2020-01-24 [1] CRAN (R 4.0.3)
##  processx      3.4.4   2020-09-03 [1] CRAN (R 4.0.3)
##  ps            1.4.0   2020-10-07 [1] CRAN (R 4.0.3)
##  purrr         0.3.4   2020-04-17 [1] CRAN (R 4.0.3)
##  R6            2.5.0   2020-10-28 [1] CRAN (R 4.0.3)
##  Rcpp          1.0.6   2021-01-15 [1] CRAN (R 4.0.4)
##  remotes       2.2.0   2020-07-21 [1] CRAN (R 4.0.3)
##  rlang         0.4.10  2020-12-30 [1] CRAN (R 4.0.3)
##  rmarkdown     2.5     2020-10-21 [1] CRAN (R 4.0.3)
##  rprojroot     2.0.2   2020-11-15 [1] CRAN (R 4.0.3)
##  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.3)
##  stringi       1.5.3   2020-09-09 [1] CRAN (R 4.0.3)
##  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.0.3)
##  survival      3.2-7   2020-09-28 [2] CRAN (R 4.0.3)
##  testthat      3.0.0   2020-10-31 [1] CRAN (R 4.0.3)
##  tibble        3.1.1   2021-04-18 [1] CRAN (R 4.0.3)
##  tidyr         1.1.3   2021-03-03 [1] CRAN (R 4.0.4)
##  tidyselect    1.1.0   2020-05-11 [1] CRAN (R 4.0.3)
##  usethis       1.6.3   2020-09-17 [1] CRAN (R 4.0.3)
##  utf8          1.2.1   2021-03-12 [1] CRAN (R 4.0.3)
##  vctrs         0.3.7   2021-03-29 [1] CRAN (R 4.0.5)
##  withr         2.4.2   2021-04-18 [1] CRAN (R 4.0.3)
##  xfun          0.22    2021-03-11 [1] CRAN (R 4.0.4)
##  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.0.3)
stefvanbuuren commented 3 years ago

Thanks for your note.

I am afraid I cannot solve this. gamlss fits a model for each moment of the distribution. From your mail I conclude that the tidy() function for gamlss objects only stores the model parameters for the model of the first moment.

I see two ways to go forward: