Model 13.3 - Githubissues

ASKurz commented 6 years ago

It’s unclear to me why the varying intercepts varies more greatly for the brms solution than McElreath’s:

m13.3 <- map2stan(
    alist(
        admit ~ dbinom( applications , p ),
        logit(p) <- a_dept[dept_id] +
                    bm_dept[dept_id]*male,
        c(a_dept,bm_dept)[dept_id] ~ dmvnorm2( c(a,bm) , sigma_dept , Rho ),
        a ~ dnorm(0,10),
        bm ~ dnorm(0,1),
        sigma_dept ~ dcauchy(0,2),
        Rho ~ dlkjcorr(2)
), data=d , warmup=1000 , iter=5000 , chains=4 , cores=3 )

It’s really apparent if you compare the output of my coefficients plot to his:

plot( precis(m13.3,pars=c("a_dept","bm_dept"),depth=2) )

paul-buerkner commented 6 years ago

Can you post your brms version here?

ASKurz commented 6 years ago

Sure:

library(rethinking)
data(UCBadmit)
d <- UCBadmit

detach(package:rethinking, unload = T)
library(brms)
rm(UCBadmit)

d <- 
  d %>%
  mutate(male    = ifelse(applicant.gender == "male", 1, 0),
         dept_id = rep(1:6, each = 2))

b13.3 <- 
  brm(data = d, family = binomial,
      admit | trials(applications) ~ 1 + male + (1 + male | dept_id),
      prior = c(set_prior("normal(0, 10)", class = "Intercept"),
                set_prior("normal(0, 1)", class = "b"),
                set_prior("cauchy(0, 2)", class = "sd"),
                set_prior("lkj(2)", class = "cor")),
      iter = 5000, warmup = 1000, chains = 4, cores = 4,
      control = list(adapt_delta = .99,
                     max_treedepth = 12))

paul-buerkner commented 6 years ago

brms performs some internal centering, which changes the intercept temporarily. This is not immediately visible and usally not relevant for the user, but it affects the handling of the prior of the population-level intercept. What happens if you write

admit | trials(applications) ~ 0 + intercept + male + (1 + male | dept_id)

ASKurz commented 6 years ago

Nope, the solutions are still quite different. The differences appear in the group-specific parameters. The population parameters are good.

Let m13.3 be the original rethinking model from the text, b13.3.0 be the model you just suggested (using the 0 + intercept syntax), and b13.3.1 be my original attempt (using the 1 syntax for the intercept). Here’s the code for the models, in that order:

m13.3 <- map2stan(
    alist(
        admit ~ dbinom( applications , p ),
        logit(p) <- a_dept[dept_id] +
                    bm_dept[dept_id]*male,
        c(a_dept,bm_dept)[dept_id] ~ dmvnorm2( c(a,bm) , sigma_dept , Rho ),
        a ~ dnorm(0,10),
        bm ~ dnorm(0,1),
        sigma_dept ~ dcauchy(0,2),
        Rho ~ dlkjcorr(2)
), data=d , warmup=1000 , iter=5000 , chains=4 , cores=3 )

b13.3.0 <- 
  brm(data = d, family = binomial,
      admit | trials(applications) ~ 0 + intercept + male + (1 + male | dept_id),
      prior = c(set_prior("normal(0, 10)", class = "b", coef = "intercept"),
                set_prior("normal(0, 1)", class = "b"),
                set_prior("cauchy(0, 2)", class = "sd"),
                set_prior("lkj(2)", class = "cor")),
      iter = 5000, warmup = 1000, chains = 4, cores = 4,
      control = list(adapt_delta = .99,
                     max_treedepth = 12))

b13.3.1 <- 
  brm(data = d, family = binomial,
      admit | trials(applications) ~ 1 + male + (1 + male | dept_id),
      prior = c(set_prior("normal(0, 10)", class = "Intercept"),
                set_prior("normal(0, 1)", class = "b"),
                set_prior("cauchy(0, 2)", class = "sd"),
                set_prior("lkj(2)", class = "cor")),
      iter = 5000, warmup = 1000, chains = 4, cores = 4,
      control = list(adapt_delta = .99,
                     max_treedepth = 12))

In the same order, here are the results for the group-specific parameters:

# m13.3, as summarized by `precis(m13.3, 2, prob = .95)`

               Mean StdDev lower 0.95 upper 0.95 n_eff Rhat
a_dept[1]      1.31   0.25       0.80       1.80  7392    1
a_dept[2]      0.74   0.33       0.10       1.39  9150    1
a_dept[3]     -0.65   0.09      -0.81      -0.48 14523    1
a_dept[4]     -0.62   0.10      -0.82      -0.42 13805    1
a_dept[5]     -1.13   0.12      -1.36      -0.90 16000    1
a_dept[6]     -2.60   0.20      -3.01      -2.21 13188    1
bm_dept[1]    -0.79   0.27      -1.32      -0.27  7155    1
bm_dept[2]    -0.21   0.33      -0.88       0.43  9214    1
bm_dept[3]     0.08   0.14      -0.19       0.35 13655    1
bm_dept[4]    -0.09   0.14      -0.37       0.18 13627    1
bm_dept[5]     0.12   0.19      -0.25       0.49 13932    1
bm_dept[6]    -0.12   0.27      -0.65       0.42 11047    1

# b13.3.0, as summarized by broom::tidy(b13.3.0) %>% mutate_if(is.numeric, round, digits = 2)

                           term estimate std.error  lower  upper
6        r_dept_id[1,Intercept]     1.82      0.76   0.60   3.06
7        r_dept_id[2,Intercept]     1.25      0.78   0.01   2.52
8        r_dept_id[3,Intercept]    -0.13      0.74  -1.32   1.04
9        r_dept_id[4,Intercept]    -0.10      0.74  -1.28   1.08
10       r_dept_id[5,Intercept]    -0.62      0.74  -1.81   0.56
11       r_dept_id[6,Intercept]    -2.09      0.75  -3.30  -0.89
12            r_dept_id[1,male]    -0.63      0.33  -1.21  -0.14
13            r_dept_id[2,male]    -0.05      0.36  -0.63   0.53
14            r_dept_id[3,male]     0.24      0.26  -0.16   0.68
15            r_dept_id[4,male]     0.07      0.26  -0.35   0.50
16            r_dept_id[5,male]     0.28      0.28  -0.13   0.76
17            r_dept_id[6,male]     0.03      0.32  -0.50   0.55

# b13.3.1, as summarized by broom::tidy(b13.3.1) %>% mutate_if(is.numeric, round, digits = 2)

               Mean StdDev lower 0.95 upper 0.95 n_eff Rhat
6        r_dept_id[1,Intercept]     1.79      0.76   0.58   3.01
7        r_dept_id[2,Intercept]     1.23      0.78   0.00   2.46
8        r_dept_id[3,Intercept]    -0.16      0.74  -1.37   1.00
9        r_dept_id[4,Intercept]    -0.13      0.74  -1.33   1.03
10       r_dept_id[5,Intercept]    -0.64      0.74  -1.85   0.52
11       r_dept_id[6,Intercept]    -2.11      0.75  -3.35  -0.94
12            r_dept_id[1,male]    -0.63      0.33  -1.21  -0.15
13            r_dept_id[2,male]    -0.05      0.35  -0.63   0.52
14            r_dept_id[3,male]     0.25      0.26  -0.16   0.69
15            r_dept_id[4,male]     0.07      0.26  -0.35   0.49
16            r_dept_id[5,male]     0.28      0.28  -0.14   0.75
17            r_dept_id[6,male]     0.04      0.33  -0.50   0.57

I’ve omitted the parameters you’d typically get with print(brm.fit) because they’re basically all the same within a little simulation error.

As a refresher, the rethinking a_dept[i] is equivalent to brms r_dept_id[i,Intercept]. I also reordered the rethinking output to match that of the brms output.

The two brms versions are quite similar. Both have substantially larger posterior SDs than the rethinking solution.

paul-buerkner commented 6 years ago

In this example, rethinking uses the centered parameterization, while brms (always) uses the non-centered parameterization for random effects. I suggest you extract the brms coefficients from coef(), which should yield comparable results to rethinking.

ASKurz commented 6 years ago

Ah. That did the trick. Now I follow. Thanks Paul!

ASKurz / Statistical_Rethinking_with_brms_ggplot2_and_the_tidyverse

Model 13.3 #7