Closed ASKurz closed 6 years ago
Can you post your brms version here?
Sure:
library(rethinking)
data(UCBadmit)
d <- UCBadmit
detach(package:rethinking, unload = T)
library(brms)
rm(UCBadmit)
d <-
d %>%
mutate(male = ifelse(applicant.gender == "male", 1, 0),
dept_id = rep(1:6, each = 2))
b13.3 <-
brm(data = d, family = binomial,
admit | trials(applications) ~ 1 + male + (1 + male | dept_id),
prior = c(set_prior("normal(0, 10)", class = "Intercept"),
set_prior("normal(0, 1)", class = "b"),
set_prior("cauchy(0, 2)", class = "sd"),
set_prior("lkj(2)", class = "cor")),
iter = 5000, warmup = 1000, chains = 4, cores = 4,
control = list(adapt_delta = .99,
max_treedepth = 12))
brms performs some internal centering, which changes the intercept temporarily. This is not immediately visible and usally not relevant for the user, but it affects the handling of the prior of the population-level intercept. What happens if you write
admit | trials(applications) ~ 0 + intercept + male + (1 + male | dept_id)
Nope, the solutions are still quite different. The differences appear in the group-specific parameters. The population parameters are good.
Let m13.3
be the original rethinking model from the text, b13.3.0
be the model you just suggested (using the 0 + intercept
syntax), and b13.3.1
be my original attempt (using the 1
syntax for the intercept). Here’s the code for the models, in that order:
m13.3 <- map2stan(
alist(
admit ~ dbinom( applications , p ),
logit(p) <- a_dept[dept_id] +
bm_dept[dept_id]*male,
c(a_dept,bm_dept)[dept_id] ~ dmvnorm2( c(a,bm) , sigma_dept , Rho ),
a ~ dnorm(0,10),
bm ~ dnorm(0,1),
sigma_dept ~ dcauchy(0,2),
Rho ~ dlkjcorr(2)
), data=d , warmup=1000 , iter=5000 , chains=4 , cores=3 )
b13.3.0 <-
brm(data = d, family = binomial,
admit | trials(applications) ~ 0 + intercept + male + (1 + male | dept_id),
prior = c(set_prior("normal(0, 10)", class = "b", coef = "intercept"),
set_prior("normal(0, 1)", class = "b"),
set_prior("cauchy(0, 2)", class = "sd"),
set_prior("lkj(2)", class = "cor")),
iter = 5000, warmup = 1000, chains = 4, cores = 4,
control = list(adapt_delta = .99,
max_treedepth = 12))
b13.3.1 <-
brm(data = d, family = binomial,
admit | trials(applications) ~ 1 + male + (1 + male | dept_id),
prior = c(set_prior("normal(0, 10)", class = "Intercept"),
set_prior("normal(0, 1)", class = "b"),
set_prior("cauchy(0, 2)", class = "sd"),
set_prior("lkj(2)", class = "cor")),
iter = 5000, warmup = 1000, chains = 4, cores = 4,
control = list(adapt_delta = .99,
max_treedepth = 12))
In the same order, here are the results for the group-specific parameters:
# m13.3, as summarized by `precis(m13.3, 2, prob = .95)`
Mean StdDev lower 0.95 upper 0.95 n_eff Rhat
a_dept[1] 1.31 0.25 0.80 1.80 7392 1
a_dept[2] 0.74 0.33 0.10 1.39 9150 1
a_dept[3] -0.65 0.09 -0.81 -0.48 14523 1
a_dept[4] -0.62 0.10 -0.82 -0.42 13805 1
a_dept[5] -1.13 0.12 -1.36 -0.90 16000 1
a_dept[6] -2.60 0.20 -3.01 -2.21 13188 1
bm_dept[1] -0.79 0.27 -1.32 -0.27 7155 1
bm_dept[2] -0.21 0.33 -0.88 0.43 9214 1
bm_dept[3] 0.08 0.14 -0.19 0.35 13655 1
bm_dept[4] -0.09 0.14 -0.37 0.18 13627 1
bm_dept[5] 0.12 0.19 -0.25 0.49 13932 1
bm_dept[6] -0.12 0.27 -0.65 0.42 11047 1
# b13.3.0, as summarized by broom::tidy(b13.3.0) %>% mutate_if(is.numeric, round, digits = 2)
term estimate std.error lower upper
6 r_dept_id[1,Intercept] 1.82 0.76 0.60 3.06
7 r_dept_id[2,Intercept] 1.25 0.78 0.01 2.52
8 r_dept_id[3,Intercept] -0.13 0.74 -1.32 1.04
9 r_dept_id[4,Intercept] -0.10 0.74 -1.28 1.08
10 r_dept_id[5,Intercept] -0.62 0.74 -1.81 0.56
11 r_dept_id[6,Intercept] -2.09 0.75 -3.30 -0.89
12 r_dept_id[1,male] -0.63 0.33 -1.21 -0.14
13 r_dept_id[2,male] -0.05 0.36 -0.63 0.53
14 r_dept_id[3,male] 0.24 0.26 -0.16 0.68
15 r_dept_id[4,male] 0.07 0.26 -0.35 0.50
16 r_dept_id[5,male] 0.28 0.28 -0.13 0.76
17 r_dept_id[6,male] 0.03 0.32 -0.50 0.55
# b13.3.1, as summarized by broom::tidy(b13.3.1) %>% mutate_if(is.numeric, round, digits = 2)
Mean StdDev lower 0.95 upper 0.95 n_eff Rhat
6 r_dept_id[1,Intercept] 1.79 0.76 0.58 3.01
7 r_dept_id[2,Intercept] 1.23 0.78 0.00 2.46
8 r_dept_id[3,Intercept] -0.16 0.74 -1.37 1.00
9 r_dept_id[4,Intercept] -0.13 0.74 -1.33 1.03
10 r_dept_id[5,Intercept] -0.64 0.74 -1.85 0.52
11 r_dept_id[6,Intercept] -2.11 0.75 -3.35 -0.94
12 r_dept_id[1,male] -0.63 0.33 -1.21 -0.15
13 r_dept_id[2,male] -0.05 0.35 -0.63 0.52
14 r_dept_id[3,male] 0.25 0.26 -0.16 0.69
15 r_dept_id[4,male] 0.07 0.26 -0.35 0.49
16 r_dept_id[5,male] 0.28 0.28 -0.14 0.75
17 r_dept_id[6,male] 0.04 0.33 -0.50 0.57
I’ve omitted the parameters you’d typically get with print(brm.fit)
because they’re basically all the same within a little simulation error.
As a refresher, the rethinking a_dept[i]
is equivalent to brms r_dept_id[i,Intercept]
. I also reordered the rethinking output to match that of the brms output.
The two brms versions are quite similar. Both have substantially larger posterior SDs than the rethinking solution.
In this example, rethinking uses the centered parameterization, while brms (always) uses the non-centered parameterization for random effects. I suggest you extract the brms coefficients from coef()
, which should yield comparable results to rethinking.
Ah. That did the trick. Now I follow. Thanks Paul!
It’s unclear to me why the varying intercepts varies more greatly for the brms solution than McElreath’s:
It’s really apparent if you compare the output of my coefficients plot to his: