bbolker / broom.mixed

tidy methods for mixed models in R
227 stars 23 forks source link

pooling issue for multiple imputed Zero-inflated model from glmmTMB #85

Closed tingski5 closed 4 years ago

tingski5 commented 4 years ago

Hello,

I'm trying to pool the results of a multiple imputed zero-inflated poisson model run using glmmTMB. I get the following error message:

Error in data.frame(pooled[, -1L], row.names = pooled$term) : duplicate row.names: (Intercept)

There are two intercepts because it's a zero inflated model with one intercept for the poisson model and the other for the logistic model. I guess it doesn't recognize this aspect and wants all different names (even for the two intercepts). Is there a work around to get it to pool?

Thanks!

bbolker commented 4 years ago

Could we have a reproducible example, please? (I'm guessing that you're using mice::pool() ... )

Also, on second reading: is this really a broom.mixed question, or is it a glmmTMB question? (The latter seems more sensible to me.)

tingski5 commented 4 years ago

I don't know.

So I can't even get a summary of my multiple imputed zero-inflated model by using the zeroinfl() from the pscl package nor can I pool it. It has to be glmmTMB to at least get me a summary. I just can't pool.

tingski5 commented 4 years ago
library(glmmTMB)
library(mice)
library(broom.mixed)

zinb <- read.csv("https://stats.idre.ucla.edu/stat/data/fish.csv")
zinb <- within(zinb, {
  nofish <- factor(nofish)
  livebait <- factor(livebait)
  camper <- factor(camper)
})

zinb <- ampute(data=zinb,prop = 0.5,mech="MCAR")$amp

imp <- mice(data=zinb,m=5,method = "cart")

mod <- with(imp,glmmTMB(count ~ child + camper, zi=~persons,
        family=poisson))
summary(mod)
pool(mod)
tingski5 commented 4 years ago

Maybe there's a work around. Where in the glmmTMB model object are coefficient names stored?

Like in the pscl package, with zeroinfl():

mod <- zeroinfl(count~camper+child|persons,dist = "poisson", EM=TRUE,data=zinb) mod$coefficients $count (Intercept) camper1 child 1.597885 0.834031 -1.042864 $zero (Intercept) persons 1.2975060 -0.5643885

I've searched for a while on how to extract this from the glmmTMB object, but cannot find it. I don't want a function to grab them out like fixef or ranef, but just wondering if they're stored somewhere inside the object.

Thanks!

bbolker commented 4 years ago

What do you mean by "coefficient names"?

Here's how the fixef.glmmTMB method extracts the coefficient vectors and names them:

glmmTMB:::fixef.glmmTMB
function (object, ...) 
{
    pl <- object$obj$env$parList(object$fit$par, object$fit$parfull)
    structure(list(cond = setNames(pl$beta, colnames(getME(object, 
        "X"))), zi = setNames(pl$betazi, colnames(getME(object, 
        "Xzi"))), disp = setNames(pl$betad, colnames(getME(object, 
        "Xd")))), class = "fixef.glmmTMB")
}

So if you want the names of the conditional, zero-inflated, or dispersion parameter vectors, you would apply colnames() to the model matrix extracted by using getME(<fitted_model>,...) with "X", "Xzi", or "Xd" respectively

tingski5 commented 4 years ago

Ok. But why I was wanting to find them within the model object, is so that I could rename them, again within the object. Thus causing the original issue from my first post to go away because of two duplicate row names.

Example:

> mod <- zeroinfl(count~camper+child|persons,dist = "poisson", EM=TRUE,data=zinb)
> names(mod$coefficients$zero) <- c("(Intercept2)","persons")
> summary(mod)

Call:
zeroinfl(formula = count ~ camper + child | persons, data = zinb, dist = "poisson", 
    EM = TRUE)

Pearson residuals:
    Min      1Q  Median      3Q     Max 
-1.2370 -0.7540 -0.6080 -0.1921 24.0849 

Count model coefficients (poisson with log link):
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.59788    0.08554  18.680   <2e-16 ***
camper1      0.83403    0.09363   8.908   <2e-16 ***
child       -1.04286    0.09999 -10.430   <2e-16 ***

Zero-inflation model coefficients (binomial with logit link):
             Estimate Std. Error z value Pr(>|z|)    
(Intercept2)   1.2975     0.3739   3.471 0.000519 ***
persons       -0.5644     0.1630  -3.463 0.000534 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Can I rename the coefficients within glmmTMB?

Thanks!

bbolker commented 4 years ago

I think this works. This is really a question for the mice package: https://github.com/stefvanbuuren/mice/issues/219

hackfits <- function(x) {
    colnames(x$obj$env$data$Xzi) <-
        paste0("zi_",colnames(x$obj$env$data$Xzi))
    return(x)
}
mod$analyses <- lapply(mod$analyses,hackfits)

I'm going to close this if that hack works for you. Arguably there could be an option and/or behaviour change that would lead to disambiguated parameter names (this is an issue with the dotwhisker package as well as for mice), but it would definitely need more thought.

tingski5 commented 4 years ago

Yes! This worked great! Thanks so much. Yeah, maybe an option to specify that this is a zero-inflated model with two intercepts so that the pooling doesn't freak out. Otherwise, we have to do this hard-coding technique.

Thanks again!