easystats / modelbased

:chart_with_upwards_trend: Estimate effects, contrasts and means based on statistical models
https://easystats.github.io/modelbased/
GNU General Public License v3.0
233 stars 19 forks source link

Marginal Means and Contrasts #144

Closed awaldman123 closed 3 years ago

awaldman123 commented 3 years ago

I am trying to calculate marginal means taking into account all sources of variation (fixed effects, zero-inflation, and random effects) and was planning to use the ggpredict command and type="sim" argument in the ggeffects package. However, I couldn't find a way to compute the pairwise comparisons afterwards like what is done in emmeans by the contrasts function (I put out an inquiry to see if I was missing something). I then stumbled upon this package and wanted to see if I could leverage some of the functions here to use on the ggpredict output for contrasts since I 1) cannot use emmeans as it requires an emgrid object and 2) couldn't find a contrast function within the ggeffects package?

Thanks for all your help!

bwiernik commented 3 years ago

The modelbased::estimate_contrasts() function is a wrapper around emmeans and can be used to estimate contrasts (as far as I am reading, you could also use emmeans directly).

For example:

m1 <- glmmTMB::glmmTMB(carb ~ cyl + (1 | gear), family = "poisson", ziformula = ~ cyl, data = data)
m2 <- glmmTMB::glmmTMB(carb ~ cyl + (1 | gear), family = "poisson", data = data)
m3 <- glmmTMB::glmmTMB(carb ~ cyl, family = "poisson", data = data)
modelbased::estimate_contrasts(m1, contrast = "cyl")
modelbased::estimate_contrasts(m2, contrast = "cyl")
modelbased::estimate_contrasts(m3, contrast = "cyl")
awaldman123 commented 3 years ago

The issue I was running into is that I wanted to calculate the "true" estimated marginal mean of the model (ie the count component conditioned on the zero-inflation part) and I can't do that in emmeans and it seems only ggeffects supports this? I was hoping to compute contrasts on the output of the "true" estimated marginal means but cannot use the emmeans contrasts function on the output of ggpredict. Any thoughts on what I could use instead? I'm not exactly sure what occurs "under the hood" for the emmeans contrast function but would performing pairwise contrasts outside any package by just using t-tests and a p-value adjustment of my choosing mirror what the contrast function is doing and provide the same result as if the contrasts were performed within the emmeans or modelbased packages?

bwiernik commented 3 years ago

It's not clear to me what you mean by "true" here.

awaldman123 commented 3 years ago

I mean the marginal mean being conditioned on the fixed effects and the zero-inflation component while taking the random effect variances into account. In other words, what the ggpredict function tabulates when using the argument type="sim."

strengejacke commented 3 years ago

Actually, ggpredict(type = "zi_random") computes the marginal mean, taking the random effect variances into account. But ggpredict(type = "simulate") does something similar, with a different approach (the former uses predict(type = "response"), while the latter uses simulate()). I think the details are described in Brooks ME, Kristensen K, Benthem KJ van, Magnusson A, Berg CW, Nielsen A, et al. glmmTMB Balances Speed and Flexibility Among Packages for Zero-inflated Generalized Linear Mixed Modeling. The R Journal. 2017;9: 378-400, in particular pp.392-393.

mattansb commented 3 years ago

I think what @awaldman123 is asking is the marginal means / contrasts marginalized over the two components?

The only method I am aware that can do this, is using Bayesian modeling. Here is an example using brms:

library(brms)
library(emmeans)

data("bioChemists", package = "pscl")
head(bioChemists)
#>   art   fem     mar kid5  phd ment
#> 1   0   Men Married    0 2.52    7
#> 2   0 Women  Single    0 2.05    6
#> 3   0 Women  Single    0 3.75    6
#> 4   0   Men Married    1 1.18    3
#> 5   0 Women  Single    0 3.75   26
#> 6   0 Women Married    2 3.59    2

m <- brm(formula = bf(art ~ mar,
                      zi ~ mar), 
         data = bioChemists, 
         family = zero_inflated_poisson())

# Conditional:
emmeans(m, ~ mar, dpar = "mu", type = "response")
#>  mar     rate lower.HPD upper.HPD
#>  Single  2.03      1.81      2.24
#>  Married 2.18      2.02      2.34
#> 
#> Point estimate displayed: median 
#> Results are back-transformed from the log scale 
#> HPD interval probability: 0.95

# Zero:
emmeans(m, ~ mar, dpar = "zi", transform = "response")
#>  mar     response lower.HPD upper.HPD
#>  Single     0.219     0.155     0.288
#>  Married    0.200     0.154     0.245
#> 
#> Point estimate displayed: median 
#> HPD interval probability: 0.95

# Expected:
emmeans(m, ~ mar, epred = TRUE)
#> mar     emmean lower.HPD upper.HPD
#> Single    1.58      1.43      1.78
#> Married   1.75      1.61      1.86
#> 
#> Point estimate displayed: median 
#> HPD interval probability: 0.95

(Here too you can marginalize across random effects with the re_formula = NULL argument)

strengejacke commented 3 years ago

Hm, these results with glmmTMB look similar:

library(glmmTMB)
#> Warning: package 'glmmTMB' was built under R version 4.1.1
library(ggeffects)

data("bioChemists", package = "pscl")
head(bioChemists)
#>   art   fem     mar kid5  phd ment
#> 1   0   Men Married    0 2.52    7
#> 2   0 Women  Single    0 2.05    6
#> 3   0 Women  Single    0 3.75    6
#> 4   0   Men Married    1 1.18    3
#> 5   0 Women  Single    0 3.75   26
#> 6   0 Women Married    2 3.59    2

m <- glmmTMB(art ~ mar,
             ziformula = ~ mar, 
             data = bioChemists, 
             family = poisson())

# Conditional
ggpredict(m, "mar")
#> # Predicted counts of art
#> 
#> mar     | Predicted |       95% CI
#> ----------------------------------
#> Single  |      2.04 | [1.83, 2.27]
#> Married |      2.18 | [2.03, 2.34]

# ZI
ggpredict(m, "mar", type = "zi_prob")
#> # Predicted zero-inflation probabilities of art
#> 
#> mar     | Predicted |       95% CI
#> ----------------------------------
#> Single  |      0.22 | [0.16, 0.29]
#> Married |      0.20 | [0.16, 0.25]

# Expected
ggpredict(m, "mar", type = "zero_inflated")
#> # Predicted counts of art
#> 
#> mar     | Predicted |       95% CI
#> ----------------------------------
#> Single  |      1.59 | [1.38, 1.80]
#> Married |      1.74 | [1.58, 1.90]

# Expected, with random - makes less sense, no RE specified
# but also takes distribution-specific variance into account
ggpredict(m, "mar", type = "zero_inflated_random")
#> # Predicted counts of art
#> 
#> mar     | Predicted |        95% CI
#> -----------------------------------
#> Single  |      1.59 | [0.19, 12.82]
#> Married |      1.74 | [0.22, 13.45]
#> 
#> Intervals are prediction intervals.

# Expected, with or without random
ggpredict(m, "mar", type = "simulate")
#> # Predicted counts of art
#> 
#> mar     | Predicted |       95% CI
#> ----------------------------------
#> Single  |      1.59 | [0.00, 5.00]
#> Married |      1.74 | [0.00, 5.08]

Created on 2021-09-08 by the reprex package (v2.0.1)

mattansb commented 3 years ago

Ah, I see predict.glmmTMB allows for the expected value - se we can get the expected means + CIs

But emm_basis.glmmTMB only allows for "cond" or "zi" separately - so we can't properly compute contrasts between the expected means :/

awaldman123 commented 3 years ago

Thanks! So it looks like I can't get the expected mean in a glmmtmb model directly through emmeans but must use ggpredict. However, how do I compute contrasts using the ggpredict output since it cannot be used by the emmeans contrast function? Can I compute the contrasts by hand by performing paired t-tests with p-value adjustment? I'm a bit naive as to what is happening behind the scenes of the emmeans contrast function but would like to be able to compute the contrasts using the expected means.

mattansb commented 3 years ago

@awaldman123 There doesn't currently seem to be a way to get contrasts and their CIs / p-values for the expected outcome in glmmTMB (I've oppend an issue over there https://github.com/glmmTMB/glmmTMB/issues/754) (emmeans or glht require additional information not currently available.)

bwiernik commented 3 years ago

If you are going to be using parametric simulation anyway with glmmmTMB, you may as well use MCMC via brms.

awaldman123 commented 3 years ago

Thank you all! This is very helpful. I'm quite naive to Bayesian methods. Therefore, I didn't know that it was kosher to build a Bayesian multilevel model then compute estimated marginal means and look at pairwise comparisons in a frequentist manner with emmeans?

In addition, if I use emmeans and set epred=TRUE as shown, will that mirror what the ggpredict function does with the argument type="simulate"? In other words, will random effects be taken into account when tabulating the point estimates or just the fixed effects and zero inflation component?

mattansb commented 3 years ago

If you set both epred=TRUE and re_formula = NULL - yes.

awaldman123 commented 3 years ago

Thank you! As more of a general question: How do interpret the pairwise contrasts using the estimated means from the brms model? I'm new to Bayesian interpretation and not sure what the contrast output is testing/showing me so I can make conclusions since it's not the frequentist tests I'm used to.

mattansb commented 3 years ago

The output is a point estimate + CI from the posterior distribution.

awaldman123 commented 3 years ago

Thanks! So using the "bioChemists" data, I mirrored what you did above:

m <- brm(formula = bf(art ~ mar, zi ~ mar), data = bioChemists, family = zero_inflated_poisson())

expected<-emmeans(m, ~ mar, epred = TRUE)

Then I ran contrasts:

contrast(expected,"pairwise")

contrast estimate lower.HPD upper.HPD Single - Married -0.153 -0.369 0.0516

Point estimate displayed: median HPD interval probability: 0.95

Since the HPD overlaps 0, then would I conclude that the resulting comparison is not "significant"?

I was reading more about other methods to summarize the contrasts in bayesian models and wanted to see how I could extract the pd, ROPE percentage, and BayesFactors for each contrast that emmeans would output? I assume something besides emmeans would be required? Can I use the estimates from emmeans and feed it into another package to do the contrasts?

My apologies for all the questions. The Bayesian approach is quite new to me and I want to make sure I don't make any missteps.

mattansb commented 3 years ago

I was reading more about other methods to summarize the contrasts in bayesian models and wanted to see how I could extract the pd, ROPE percentage, and BayesFactors for each contrast that emmeans would output? I assume something besides emmeans would be required? Can I use the estimates from emmeans and feed it into another package to do the contrasts?

You can use bayestestR for many if not all of those (support the output from emmans).

Since the HPD overlaps 0, then would I conclude that the resulting comparison is not "significant"? ... My apologies for all the questions. The Bayesian approach is quite new to me and I want to make sure I don't make any missteps.

This topic is very broad and we will not be able to cover all you need to know here. I suggest reading one of these books before you start your analysis, as there is much to be considered both before and after fitting a model:

awaldman123 commented 3 years ago

Thank you so much! :)