Wrong help documentation and inconsistent parameter naming related to `family = Gamma` and `ziGamma`

DrJerryTAO commented 7 months ago

Hi @bbolker, I compared glm and glmmTMB with family = Gamma and found that

glmmTMB reports same dispersion as MASS::gamma.dispersion(glm()) which appears smaller than glm(). This makes standard errors smaller in glmmTMB and larger in glm(). This is not documented in glmmTMB and may make first users wonder why the results can be so different from glm.
glmmTMB allows variable dispersion associated with predictors, unlike glm() that enforces constant dispersion. This is not emphasized in glmmTMB help, just like Gaussian family. Mentioning the extended flexibility in glmmTMB over base family in family_glmmTMB should be helpful.
According to ?sigma.glmmTMB, sigma = (1/sqrt(shape)) for Gamma, which is tested true. However, it lacks important explanation that sigma = sqrt(dispersion). Also, there is no entry for ziGamma, where sigma = (1/sqrt(shape)) = sqrt(dispersion) also stands for the conditional+dispersion components according to my test example.
predict(type = 'disp') generates sigma = sqrt(dispersion), not dispersion for both Gamma and ziGamma. This will be very confusing and cause many innocent mistakes. A simple fix is to square these predicted values for the predicted dispersion parameter.
glmmTMB dispersion model actually fits log(shape) = log(1/dispersion) = b x for both Gamma and ziGamma. This is not documented anywhere and may cause users to misinterpret the model coefficients. A simple fix is to flip the signs of all estimates, so the dispersion model component will fit log(dispersion) = b x. In this way, a larger linear predictor of the dispersion model is associated with a larger dispersion parameter, a larger scale, and a larger variance.
In ?ziGamma, details for Gamma wrongly say that phi = dispersion = shape, and var = mu phi. The correct formula under current parameterization should be phi = exp(eta) = shape = 1/dispersion, and var = mu^2 / phi. If coefficient signs in the dispersion model are flipped as suggested above, it should be phi = exp(eta) = 1/shape = dispersion, and var = mu^2 phi. The same holds for ziGamma conditional+dispersion components.
Some useful relationships in gamma distribution worth adding in help: Mean = shape scale. Variance = mean scale = shape scale^2 = dispersion mean^2 = mean^2/shape. Shape = 1/dispersion = mean^2 / var. Scale = 1/rate = var/mean = mean/shape = mean * dispersion. Dispersion = 1/shape = sigma^2.

Test scripts I used

Data <- read.csv(
  file = 'http://static.lib.virginia.edu/statlab/materials/data/alb_homes.csv')
library(glmmTMB)
library(MASS)
mean(Data$totalvalue) # 426973.7
var(Data$totalvalue) # 134443725982
sd(Data$totalvalue) # 366665.7

# Using glm()
summary(Model <- glm(
  totalvalue ~ 1, data = Data, family = gaussian(link = "identity")))
sigma(Model) # 366665.7
"            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   426974       6667   64.05   <2e-16 ***
(Dispersion parameter for gaussian family taken to be 134443725982)
    Null deviance: 4.0656e+14  on 3024  degrees of freedom
Residual deviance: 4.0656e+14  on 3024  degrees of freedom
AIC: 86101
in glm(gaussian), dispersion = var(error)
sigma() reports sqrt(dispersion)"
summary(Model <- glm(
  totalvalue ~ 1, data = Data, family = Gamma(link = "identity")))
sigma(Model) # 0.6315124
sqrt(deviance(Model)/df.residual(Model)) # 0.6315124
"            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   426974       6667   64.05   <2e-16 ***
(Dispersion parameter for Gamma family taken to be 0.7374599)
    Null deviance: 1206  on 3024  degrees of freedom
Residual deviance: 1206  on 3024  degrees of freedom
AIC: 83236
In glm(Gamma) default link is inverse. 
mean = shape * scale, variance = mean * scale, 
dispersion = 1/shape

Based on mean and variance estimates, 
shape = 426973.7^2/134443725982 = 1.356006
scale = 134443725982/426973.7 = 314875.9
dispersion = 1/1.356006 = 0.7374599 matches the reported

sigma = sqrt(deviance/df) = sqrt(1206/3024) = 0.6315137"
summary(Model <- glm(
  totalvalue ~ 1, data = Data, family = Gamma(link = "log")))
sigma(Model) # 0.6315124
"            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 12.96448    0.01561   830.3   <2e-16 ***
(Dispersion parameter for Gamma family taken to be 0.7374599)
    Null deviance: 1206  on 3024  degrees of freedom
Residual deviance: 1206  on 3024  degrees of freedom
AIC: 83236

changing link does not change dispersion parameter
mean = exp(12.96448) = 426974.7
shape = 1/0.7374599 = 1.356006
scale = mean/shape = exp(12.96448) * 0.7374599 = 314876.7
var = exp(12.96448)^2 * 0.7374599 = 134444382152"
gamma.shape(Model) # Alpha: 2.66319291. SE: 0.06465039
gamma.dispersion(Model) # 0.3754891 this is 1/gamma.shape
MASS:::gamma.shape.glm()
(6 + 2 * sigma(Model)^2)/(sigma(Model)^2 * (6 + sigma(Model)^2)) # 2.663752
summary(Model, dispersion = gamma.dispersion(Model))
"A glm fit for a Gamma family correctly calculates the maximum likelihood 
estimate of the mean parameters but provides only a crude estimate of the 
dispersion parameter. 
This function takes the results of the glm fit and solves the maximum 
likelihood equation for the reciprocal of the dispersion parameter, 
which is usually called the shape (or exponent) parameter.

If dispersion reported by glm() is not trustworthy and use MASS instead
shape = 1/disperson = 1/0.3754891 = 2.66319291.
scale = mean/shape = exp(12.96448) * 0.3754891 = 160324.3
var = exp(12.96448)^2 * 0.3754891 = 68454434003, only half of var(y)
Thus, glm() gives dispersion based on var(y) and mean(y)
MASS::gamma.shape is based on the implied gamma distribution where
dispersion = 1/shape should be close to sigma^2"

# Using glmmTMB
summary(Model <- glmmTMB(
  totalvalue ~ 1, data = Data, family = gaussian(link = "identity")))
sigma(Model) # 366605
"     AIC      BIC   logLik deviance df.resid 
 86101.4  86113.5 -43048.7  86097.4     3023 
Dispersion estimate for gaussian family (sigma^2): 1.34e+11 
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   426973       6666   64.06   <2e-16 ***
dispersion = var(error)
sigma = sqrt(var(error) = sqrt(dispersion))"
summary(Model <- glmmTMB(
  totalvalue ~ 1, dispformula = ~ fp, 
  data = Data, family = gaussian(link = "identity")))
var(Data$totalvalue[Data$fp == 0]) # 36000707051
var(Data$totalvalue[Data$fp == 1]) # 1.56013e+11
"     AIC      BIC   logLik deviance df.resid 
 85797.2  85815.2 -42895.6  85791.2     3022 
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   351638       6997   50.26   <2e-16 ***
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) 24.53664    0.05769   425.3   <2e-16 ***
fp           1.35356    0.07088    19.1   <2e-16 ***

exp(24.53664) = 45303041117; 45303041117/36000707051 = 1.258393
exp(24.53664 + 1.35356) = 175376337175; 175376337175/1.56013e+11 = 1.124114"
summary(Model <- glmmTMB(
  totalvalue ~ 1, data = Data, family = Gamma(link = "identity")))
sigma(Model) # 0.6127717
sigma(Model)^2 # 0.3754891
unique(predict(Model, type = "disp")) # 0.6127716
"     AIC      BIC   logLik deviance df.resid 
 83229.9  83241.9 -41612.9  83225.9     3023 
Dispersion estimate for Gamma family (sigma^2): 0.375 
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   426974       4757   89.76   <2e-16 ***
Dispersion = sigma^2 = 0.3754891 matches MASS::gamma.dispersion(glm())
?sigma.glmmTMB sigma = (1/sqrt(shape)) = sqrt(dispersion). This is correct

predict(type = 'disp') generates sigma = sqrt(dispersion), not dispersion"
summary(Model <- glmmTMB(
  totalvalue ~ 1, data = Data, family = Gamma(link = "log")))
"     AIC      BIC   logLik deviance df.resid 
 83229.9  83241.9 -41612.9  83225.9     3023 
Dispersion estimate for Gamma family (sigma^2): 0.375 
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) 12.96448    0.01114    1164   <2e-16 ***

In glmmTMB, link(mu) does not change dispersion estimate. 
shape = 1/dispersion = 1/0.3754891 = 2.663193"
summary(Model <- glmmTMB(
  totalvalue ~ 1, dispformula = ~ 0 + one, 
  data = Data %>% mutate(one = 1), family = Gamma(link = "identity")))
"     AIC      BIC   logLik deviance df.resid 
 83229.9  83241.9 -41612.9  83225.9     3023 
Conditional model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   426974       4757   89.76   <2e-16 ***
Dispersion model:
    Estimate Std. Error z value Pr(>|z|)    
one  0.97953    0.02428   40.35   <2e-16 ***

exp(0.97953) = 2.663204 matches MASS::gamma.shape(glm())
glmmTMB dispersion model actually fits log(shape) = log(1/dispersion) = b * x

?ziGamma, details wrongly says dispersion = shape = phi, and var = mu * phi
It should be phi = exp(eta) = shape, and var = mu^2 / phi"
summary(Model <- glmmTMB(
  totalvalue ~ 1, dispformula = ~ bedroom, 
  data = Data, family = Gamma(link = "identity")))
"     AIC      BIC   logLik deviance df.resid 
 83184.5  83202.5 -41589.2  83178.5     3022 
Conditional model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   407112       5216   78.06   <2e-16 ***
Dispersion model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.54687    0.08360  18.502  < 2e-16 ***
bedroom     -0.16390    0.02365  -6.931 4.17e-12 ***

dispformula allows predictors in Gamma"

# ziGamma
summary(Model <- glmmTMB(
  bedroom ~ 1, ziformula = ~ 1, 
  data = Data, family = ziGamma(link = "log")))
"     AIC      BIC   logLik deviance df.resid 
  7795.8   7813.8  -3894.9   7789.8     3022 
Dispersion estimate for Gamma family (sigma^2): 0.0685 
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) 1.217438   0.004764   255.5   <2e-16 ***
Zero-inflation model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   -5.933      0.354  -16.76   <2e-16 ***"
summary(Model <- glmmTMB(
  bedroom ~ 1, ziformula = ~ 1, dispformula = ~ 0 + one, 
  data = Data %>% mutate(one = 1), family = ziGamma(link = "log")))
"     AIC      BIC   logLik deviance df.resid 
  7795.8   7813.8  -3894.9   7789.8     3022 
Conditional model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) 1.217438   0.004764   255.5   <2e-16 ***
Zero-inflation model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   -5.933      0.354  -16.76   <2e-16 ***
Dispersion model:
    Estimate Std. Error z value Pr(>|z|)    
one  2.68112    0.02546   105.3   <2e-16 ***
exp(2.68112) = 14.60144 = 1/0.0685"
sigma(Model) # 0.2616992
sigma(Model)^2
unique(predict(Model, type = "disp")) # 0.2616992
unique(predict(Model, type = "response")) # 3.369587
unique(predict(Model, type = "conditional")) # 3.378522
unique(predict(Model, type = "zprob")) # 0.002644616
unique(predict(Model, type = "conditional")) * (
  1 - unique(predict(Model, type = "zprob"))) # 3.369587
"0.2616992^2 = 0.06848647
mean = exp(1.217438) = 3.378521
var = dispersion * mean^2 = 0.06848647 * exp(1.217438)^2 = 0.7817322
dispersion = sigma^2 = 0.06848647
shape = 1/dispersion = 14.60144
scale = mean * dispersion = exp(1.217438) * 0.06848647 = 0.231383"

bbolker commented 7 months ago

Thanks for the detailed comment. I'm completely on board with adding information to the documentation (would you like to submit a pull request ... ???) I'd like to hold off/think very carefully about making changes that will break back-compatibility. (I'm not saying we shouldn't make those changes, just that we have to consider the pros and cons ...)

DrJerryTAO commented 7 months ago

I just updated my original post to include issues with ziGamma which are essentially the same things with Gamma.

I am not sure what a pull request does and how I can make one. I have not used GitHub extensively.

For compatibility, I think we can print a message when predict(type = 'disp') and glmmTMB(family = Gamma/ziGamma) are called, stating changes in predicted dispersion and reported estimates in dispersion model. In the help() pages, change the elements as I listed above.

bbolker commented 7 months ago

I (or someone) will work on this when I (or they) have time.

it might not hurt to get more familiar with GitHub/use this as a way to practice collaboration tools such as pull requests ...
the point of back-compatibility is not documentation, but trying to make sure that previous results don't change unexpectedly. Printing a message can be helpful, but doesn't solve the problem of deciding whether a compatibility-breaking change is worth making.

glmmTMB / glmmTMB

Wrong help documentation and inconsistent parameter naming related to `family = Gamma` and `ziGamma` #990