Open DominiqueMakowski opened 3 years ago
hm... Not sure if this is due to the documentation, or a misunderstanding of the documentation, or a bug in bayes_r2()
?
from posterior_predict()
:
re.form
If object contains group-level parameters, a formula indicating which group-level parameters to condition on when making predictions. re.form is specified in the same form as for predict.merMod. The default, NULL, indicates that all estimated group-level parameters are conditioned on. To refrain from conditioning on any group-level parameters, specify NA or ~0. The newdata argument may include new levels of the grouping factors that were specified when the model was estimated, in which case the resulting posterior predictions marginalize over the relevant variables.
library(rstanarm)
#> Loading required package: Rcpp
#> This is rstanarm version 2.21.1
#> - See https://mc-stan.org/rstanarm/articles/priors for changes to default priors!
#> - Default priors may change, so it's safest to specify priors, even if equivalent to the defaults.
#> - For execution on a local, multicore CPU with excess RAM we recommend calling
#> options(mc.cores = parallel::detectCores())
library(performance)
#>
#> Attaching package: 'performance'
#> The following object is masked from 'package:rstanarm':
#>
#> pp_check
m <- stan_lmer(Sepal.Length ~ Petal.Length + (1 | Species), data = iris, refresh=0, seed=333)
# conditional, including random effects
median(as.vector(rstantools::bayes_R2(m, re.form = NULL, re_formula = NULL, summary = FALSE)))
#> [1] 0.8292369
# marginal, no random effects
median(as.vector(rstantools::bayes_R2(m, re.form = NA, re_formula = NA, summary = FALSE)))
#> [1] 0.9527061
r2(m)
#> # Bayesian R2 with Standard Error
#>
#> Conditional R2: 0.829 (89% CI [0.797, 0.859])
#> Marginal R2: 0.953 (89% CI [0.938, 0.966])
Created on 2020-10-26 by the reprex package (v0.3.0)
The re_formula
doesn't seem to affect the results in any way:
library(rstanarm)
library(performance)
# Reference
r2(lme4::lmer(Sepal.Length ~ Petal.Length + (1 | Species), data = iris))
#> # R2 for Mixed Models
#>
#> Conditional R2: 0.969
#> Marginal R2: 0.658
m <- stan_lmer(Sepal.Length ~ Petal.Length + (1 | Species), data = iris, refresh=0, seed=333)
r2(m)
#> # Bayesian R2 with Standard Error
#>
#> Conditional R2: 0.829 (89% CI [0.798, 0.861])
#> Marginal R2: 0.953 (89% CI [0.938, 0.966])
r2_loo(m)
#> LOO-adjusted R2
#> 0.8275185
median(as.vector(rstantools::bayes_R2(m, re.form = NULL, re_formula = NULL, summary = FALSE)))
#> [1] 0.829268
median(as.vector(rstantools::bayes_R2(m, re.form = NULL, re_formula = NA, summary = FALSE)))
#> [1] 0.829268
median(as.vector(rstantools::bayes_R2(m, re.form = NA, re_formula = NULL, summary = FALSE)))
#> [1] 0.9529409
median(as.vector(rstantools::bayes_R2(m, re.form = NA, re_formula = NA, summary = FALSE)))
#> [1] 0.9529409
Created on 2020-10-26 by the reprex package (v0.3.0)
Maybe it's related to your issue opened at https://github.com/paul-buerkner/brms/issues/610 ?
re_formula
is for compatibility with brms::bayes_r2()
.
could it be related to that change?
Maybe... We should test on an older rstanarm-version, maybe that's a bug in posterior_epred()
?
(wrong issue...) 🤣
The mystery intensifies, the "bug" is still there if I replace epred by linpred:
library(rstanarm)
library(performance)
# Reference
r2(lme4::lmer(Sepal.Length ~ Petal.Length + (1 | Species), data = iris))
#> # R2 for Mixed Models
#>
#> Conditional R2: 0.969
#> Marginal R2: 0.658
m <- stan_lmer(Sepal.Length ~ Petal.Length + (1 | Species), data = iris, refresh=0, seed=333)
bayes_R2_old <- function(object, ..., re.form = NULL) {
fam <- family(object)$family
if (!fam %in% c("gaussian", "binomial")) {
stop("bayes_R2 is only available for Gaussian and binomial models.")
}
mu_pred <- posterior_linpred(object, transform = TRUE, re.form = re.form)
if (rstanarm:::is.binomial(fam)) {
y <- get_y(object)
if (NCOL(y) == 2) {
trials <- rowSums(y)
trials_mat <- matrix(trials, nrow = nrow(mu_pred), ncol = ncol(mu_pred),
byrow = TRUE)
tmp <- mu_pred * trials_mat
sigma2 <- rowMeans(tmp * (1 - mu_pred))
mu_pred <- tmp
} else {
sigma2 <- rowMeans(mu_pred * (1 - mu_pred))
}
} else {
sigma2 <- drop(as.matrix(object, pars = "sigma"))^2
}
var_mu_pred <- apply(mu_pred, 1, var)
r_squared <- var_mu_pred / (var_mu_pred + sigma2)
return(r_squared)
}
median(as.vector(bayes_R2_old(m, re.form = NULL, re_formula = NA, summary = FALSE)))
#> Instead of posterior_linpred(..., transform=TRUE) please call posterior_epred(), which provides equivalent functionality.
#> [1] 0.829268
median(as.vector(bayes_R2_old(m, re.form = NA, re_formula = NA, summary = FALSE)))
#> Instead of posterior_linpred(..., transform=TRUE) please call posterior_epred(), which provides equivalent functionality.
#> [1] 0.9529409
Created on 2020-10-26 by the reprex package (v0.3.0)
I think this is worth posting a question on the stan forum.
@mattansb I forgot to follow up but did you (or @strengejacke) by any chance posted it on these forums?
No, not me yet.
Nor me..
So this isn't a "bug" per-se, but the result of a different method used by rstanarm:::bayes_R2.stanreg
compared to like any other method :/
https://discourse.mc-stan.org/t/bayes-r2-and-conditioning-on-random-effects/20461/15
awesome deductive process @mattansb I never cease to be amazed :)
We have two possible solutions on our end:
Using the default method in rstantools
:
m <- rstanarm::stan_glmer(Sepal.Length ~ Petal.Length + (1 | Species), data = iris,
chains = 1, refresh = 0)
"Bug"
c(
"with random" = mean(rstanarm::bayes_R2(m, re.form = NULL)),
"no random" = mean(rstanarm::bayes_R2(m, re.form = NA))
)
#> with random no random
#> 0.8271798 0.9512330
"Fix"
rstanarm_bayes_r2_fix <- function(object, re.form = NULL) {
mu_pred <- rstanarm::posterior_epred(object, re.form = re.form)
y <- insight::get_response(object)
rstantools::bayes_R2(mu_pred, y = y)
}
c(
"with random" = mean(rstanarm_bayes_r2_fix(m, re.form = NULL)),
"no random" = mean(rstanarm_bayes_r2_fix(m, re.form = NA))
)
#> with random no random
#> 0.8309479 0.7416505
r2_nakagawa_Bayes()
(For brms
too).
Note than the results from (1) are different than the ones obtained in r2_nakagawa()
from lme4
. This is because r2_nakagawa()
is parametric - relying on the parametric decomposition of the residual + random effects variances.
Although currently r2_nakagawa()
works with Bayesian models:
performance::r2_nakagawa(m)
#> # R2 for Mixed Models
#>
#> Conditional R2: 0.965
#> Marginal R2: 0.704
We can consider implementing the parametric formula from r2_nakagawa()
with the posteriors of the relevant variance parameters.
(IMO it is more important to have consistency within each of our methods across models, than to be consistent with methods from other packages, e.g. rstantools:bayes_r2
)
it is more important to have consistency within each of our methods across models, than to be consistent with methods from other packages
absolutely
Do you suggest going for fix 1 or implementing 2?
If you agree about the consistency, then 2. (We would have to document exactly what were doing, and explain why results might vary from other packages..)
Just came across this old issue - @strengejacke, do you think that has been resolved with some of the new updates, or did they not apply to Bayesian models?
I haven't checked, but the recent changes should not have affected Bayesian models, as we internally call rstantools::bayes_R2()
. Yet, this issue might be resolved when there have been changes to rstantools::bayes_R2()
in the meantime.
Just tested, but with brms
as rstantools
installation failed for me.
What would we expect? I tested with r2_nagakawa
and r2_bayes
, they give different answers (which I guess is expected), but neither seem to be weird, so Conditional R2 is higher than Marginal R2 for both tests.
m <- brms::brm(Sepal.Length ~ Petal.Length + (1 | Species),
data = iris,
refresh = 0,
seed = 333)
performance::r2_nakagawa(m)
# # R2 for Mixed Models
#
# Conditional R2: 0.977
# Marginal R2: 0.486
performance::r2_bayes(m)
# # Bayesian R2 with Compatibility Interval
#
# Conditional R2: 0.834 (95% CI [0.811, 0.852])
# Marginal R2: 0.736 (95% CI [0.707, 0.765])
Created on 2020-10-26 by the reprex package (v0.3.0)
In the frequentist version, R2 conditional > R2 marginal (makes sense, total model > subpart), but in the the Bayesian version, the R2 marginal is super high (on par with the R2 conditional of the frequentist version). Am I missing something obvious?