ASKurz / Doing-Bayesian-Data-Analysis-in-brms-and-the-tidyverse

The bookdown version lives here: https://bookdown.org/content/3686
GNU General Public License v3.0

Inconsistent and super wide HDIs #40

Closed · OmidGhasemi21 closed this issue 3 years ago

OmidGhasemi21 commented 3 years ago

Dear @ASKurz,

Reading chapter 20, section 20.2.3, I noticed that the HDIs and estimates you calculated do not match the ones in the book. In that section, Kruschke fit a model with Salary as the DV and two IVs (Pos and Org), and then presented the main effect comparisons as in the figure below. For example, the left panel shows a mode of 13400 with a 95% HDI from 9720 to 16700.

[Screenshot: Kruschke's main effect comparison figure; left panel shows mode 13400, 95% HDI 9720 to 16700]
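
For context, fit20.1 is the hierarchical two-factor model from that section. As far as I can tell from the bookdown, its structure is roughly the following (a sketch only; the priors, sampler settings, and the data object name my_data are assumptions, not the bookdown's exact code):

# a rough sketch of fit20.1's structure; priors, sampler settings, and the
# data object name (`my_data`) are assumptions here
fit20.1 <-
  brm(data = my_data,
      family = gaussian,
      Salary ~ 1 + (1 | Pos) + (1 | Org) + (1 | Pos:Org))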

In the bookdown, this figure is made with the `fitted()` function:

# define the new data
nd <- 
  tibble(Pos = c("Assis", "Assoc"),
         Org = "mu")

# feed the new data into `fitted()`
f <-
  fitted(fit20.1,
         newdata = nd,
         summary = F,
         allow_new_levels = T) %>% 
  as_tibble() %>% 
  set_names("Assis", "Assoc") %>% 
  mutate(`Assoc vs Assis` = Assoc - Assis)

# plot
f %>% 
  ggplot(aes(x = `Assoc vs Assis`, y = 0)) +
  stat_beedrill() +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(title = "Assoc vs Assis",
       x = "Difference")

This creates the following figure:

[Screenshot: the bookdown version of the plot, with a much wider HDI]

As you can see, the HDIs do not match the ones in the book. Just to make sure, let's check them more carefully:

f %>% 
  mode_hdi(`Assoc vs Assis`) %>% 
  select(`Assoc vs Assis`:.upper) %>% 
  mutate_if(is.double, round, digits = 0)
## # A tibble: 1 x 3
##   `Assoc vs Assis` .lower .upper
##              <dbl>  <dbl>  <dbl>
## 1            11260 -13545  40783

I am not familiar with `fitted()`, but it seems the HDIs for the main effect comparisons are not correct. Interestingly, the same approach produces correct HDIs when plotting the simple effect and interaction comparisons. I suspect the fix is easy, but since I don't know `fitted()` well, I fell back on a method you used in previous chapters, which gave the correct HDIs for the main effect comparisons:

post_pos <- coef(fit20.1, summary = F)$Pos %>% 
  as_tibble() %>%
  select(Assis = Assis.Intercept, Assoc = Assoc.Intercept) %>%
  mutate(`Assoc vs Assis` = Assoc - Assis)

post_pos %>% 
  ggplot(aes(x = `Assoc vs Assis`, y = 0)) +
  stat_beedrill() +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(title = "Assoc vs Assis",
       x = "Difference")

[Screenshot: the resulting plot of the Assoc vs Assis contrast]

post_pos %>% 
  mode_hdi(`Assoc vs Assis`)
#  `Assoc vs Assis` .lower .upper .width .point .interval
#             <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
#1           13681.  8475. 18027.   0.95 mode   hdi  

However, the problem with this method is that while it gives the correct HDIs and estimates for the main effect and interaction comparisons, it does not work well for the simple effects:

post_pos_org <- coef(fit20.1, summary = F)$`Pos:Org` %>% 
  as_tibble() %>% 
  select(Assis_PSY.Intercept, Full_PSY.Intercept, Assis_CHEM.Intercept, Full_CHEM.Intercept) %>%
  transmute(`Full - Assis @ PSY`  = Full_PSY.Intercept - Assis_PSY.Intercept,
            `Full - Assis @ CHEM` = Full_CHEM.Intercept - Assis_CHEM.Intercept) %>% 
  mutate(`Full.v.Assis (x) CHEM.v.PSY` = `Full - Assis @ CHEM` - `Full - Assis @ PSY`) %>%
  gather()

post_pos_org %>% 
  group_by(key) %>%
  mode_hdi(value)
# key                           value  .lower .upper .width .point .interval
# <chr>                         <dbl>   <dbl>  <dbl>  <dbl> <chr>  <chr>    
# 1 Full - Assis @ CHEM          25846.  10907. 40284.   0.95 mode   hdi      
# 2 Full - Assis @ PSY          -11968. -26198.  2658.   0.95 mode   hdi      
# 3 Full.v.Assis (x) CHEM.v.PSY  38480.  17523. 57330.   0.95 mode   hdi    

If you compare this output to the following figure, you can see that while the interaction estimate matches the right panel, the two simple effect estimates are extremely inconsistent with the left and middle panels.

[Screenshot: Kruschke's simple effect and interaction comparison figure (left, middle, and right panels)]

While it would be great to know what went wrong with the first method (i.e., fitted()), I would also be grateful if you could tell me what is wrong with the second method (i.e., coef(fit20.1, summary = F)$`Pos:Org`).

Thanks a lot.

ASKurz commented 3 years ago

Hey @OmidGhasemi21, thanks for the catch. I'm in the middle of course prep and am not sure when I'll be in a position to look at this closely. As a quick comment, fitted() and coef() can be equivalent in some cases, but they generally do very different things.
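
To give a rough sense of the distinction: for a given grouping factor, coef() returns the population-level intercept plus that factor's group-level deviations and nothing else, whereas fitted() assembles the full linear predictor from the model. A quick sketch, with the level and term names assumed from your output above:

# coef() for one grouping factor is just fixef() + ranef() for that factor;
# these two vectors of posterior draws should match
coef(fit20.1, summary = F)$Pos[, "Assis", "Intercept"]

fixef(fit20.1, summary = F)[, "Intercept"] +
  ranef(fit20.1, summary = F)$Pos[, "Assis", "Intercept"]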

ASKurz commented 3 years ago

Okay, I think I have it figured out. The mistake was that where I used the `allow_new_levels` argument within `fitted()`, I really should have been using the `re_formula` argument instead.
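
With the dummy "mu" level and allow_new_levels = T, fitted() samples fresh group-level deviations for Org and Pos:Org, which is presumably where all the extra width in the difference came from. I'll post a revised workflow shortly, but a minimal sketch of the corrected call looks something like this:

# a minimal sketch of the corrected approach: drop the dummy Org level and use
# `re_formula` so the predictions include only the Pos-level effects
nd <- tibble(Pos = c("Assis", "Assoc"))

f <-
  fitted(fit20.1,
         newdata = nd,
         re_formula = ~ (1 | Pos),
         summary = F) %>% 
  as_tibble() %>% 
  set_names("Assis", "Assoc") %>% 
  mutate(`Assoc vs Assis` = Assoc - Assis)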

OmidGhasemi21 commented 3 years ago

Oh, that is great. I will try the new argument then.

I have just one question about coef(), and I would appreciate your opinion whenever you are free (there is no rush). I used this function for all the models in Chapter 19 (single-factor ANOVA) and it worked very well. For this chapter, however, I ran into the problem I mentioned. Is there a way to make coef() work here, or do you recommend using fitted() for all models with categorical predictors? Thanks.

ASKurz commented 3 years ago

The draft of the revised workflow is now at https://github.com/ASKurz/Doing-Bayesian-Data-Analysis-in-brms-and-the-tidyverse/blob/master/in_the_works/20.2.3.md.

ASKurz commented 3 years ago

@OmidGhasemi21, as to your question about coef(), would you mind restating it? I'm getting lost in all the output from your first comment.

OmidGhasemi21 commented 3 years ago

That is amazing, thanks.

And regarding my question: I used coef() to calculate main effect and simple effect contrasts, but the estimates for the simple effect contrasts do not look right. I used this method:

post_pos_org <- coef(fit20.1, summary = F)$`Pos:Org` %>% 
  as_tibble() %>% 
  select(Assis_PSY.Intercept, Full_PSY.Intercept, Assis_CHEM.Intercept, Full_CHEM.Intercept) %>%
  transmute(`Full - Assis @ PSY`  = Full_PSY.Intercept - Assis_PSY.Intercept,
            `Full - Assis @ CHEM` = Full_CHEM.Intercept - Assis_CHEM.Intercept) %>% 
  mutate(`Full.v.Assis (x) CHEM.v.PSY` = `Full - Assis @ CHEM` - `Full - Assis @ PSY`) %>%
  gather()

post_pos_org %>% 
  group_by(key) %>%
  mode_hdi(value)

So, my question is simply whether there is a way to get the correct estimates using coef() for multifactor models.

This method works nicely with single-factor models in Chapter 19.

Thanks.

ASKurz commented 3 years ago

Ah, I see. Yeah, I would not use coef() that way. Leave aside, for a moment, that we're talking about effects that have been partially pooled. Imagine a simple model with two lower-order terms and their interaction: y ~ 1 + x1 + x2 + x1:x2. When you use the updated fitted() method, you are extracting the model-based predictions for the full combination of x1, x2, and their interaction. When, however, you use the coef() method, as in your last post, it is as if you were extracting the model-based predictions for the interaction term, alone, while ignoring the two lower-order terms.
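
If you did want to stick with the group-level draws, you would have to add the relevant lower-order deviations back in yourself. Here's a rough sketch for the Full - Assis @ PSY contrast, with the level names assumed from your output above (the intercept and the Org deviation cancel in the subtraction):

# a rough sketch: rebuild the simple effect from the group-level deviations;
# b_Intercept and the PSY deviation drop out of the difference
r_pos     <- ranef(fit20.1, summary = F)$Pos
r_pos_org <- ranef(fit20.1, summary = F)$`Pos:Org`

full_v_assis_at_psy <-
  (r_pos[, "Full", "Intercept"] - r_pos[, "Assis", "Intercept"]) +
  (r_pos_org[, "Full_PSY", "Intercept"] - r_pos_org[, "Assis_PSY", "Intercept"])

mode_hdi(full_v_assis_at_psy)

But that bookkeeping is exactly what the fitted() approach in the revised section handles for you, so I'd lean on that instead.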

OmidGhasemi21 commented 3 years ago

Oh, that is why coef() worked with single-factor models but not multifactor ones. Thanks a lot for being so helpful.

ASKurz commented 3 years ago

No problem. Thanks, again, for pointing out the coding issue. We both learned something from this exchange.

ASKurz commented 2 years ago

These corrections have now been integrated into the primary .Rmd file with this commit.