Closed DominiqueMakowski closed 4 years ago
I like this idea very much!
Looks like most of what is needed for this is already part of the bf_params
code. So with a little cannibalization:
si <- function(posterior, prior, BF = 1) {
extend_scale <- 0.05
precision <- 2^8
x <- c(prior, posterior)
x_range <- range(x)
x_rangex <- stats::median(x) + 7 * stats::mad(x) * c(-1, 1)
x_range <- c(
max(c(x_range[1], x_rangex[1])),
min(c(x_range[2], x_rangex[2]))
)
extension_scale <- diff(x_range) * extend_scale
x_range[1] <- x_range[1] - extension_scale
x_range[2] <- x_range[2] + extension_scale
x_axis <- seq(x_range[1], x_range[2], length.out = precision)
f_prior <- logspline::logspline(prior)
f_posterior <- logspline::logspline(posterior)
d_prior <- logspline::dlogspline(x_axis, f_prior)
d_posterior <- logspline::dlogspline(x_axis, f_posterior)
relative_d <- d_posterior / d_prior
range(x_axis[relative_d > BF])
}
library(bayestestR)
prior <- distribution_normal(1000, mean = 0, sd = 1)
posterior <- distribution_normal(1000, mean = .5, sd = .3)
# looks like is makes sense:
(si_vals <- si(posterior,prior))
#> [1] 0.03999124 1.05310270
plot(bf_parameters(prior,posterior = posterior, null = si_vals))
# more conservative
(si_vals <- si(posterior,prior, BF = 3))
#> [1] 0.3332603 0.7598336
plot(bf_parameters(prior,posterior = posterior, null = si_vals))
Created on 2019-12-29 by the reprex package (v0.3.0)
This is a basic function - it would need methods for extracting priors from stanreg
/brmsfit
models etc, just like in bf_params
, and also a method for one sided BFs I think (but not for interval-null)
What do you think?
Nice! I agree with all that. I wonder what's the best API for that though, as one could imagine:
hdi()
, eti()
that could be triggered via ci(..., method = "SI")
si(bf_parameters(...))
or ci(bf_parametesr(...))
I think we need:
si()
function, that can have the same plot function as bf_params
(seems to work well), that can be triggered with ci(..., method = "SI")
.si(bf_parameters(...))
would actually work quite well. But I'll start working on ^ first.True, they are not incompatible π
library(bayestestR)
x0 <- distribution_normal(1000, mean = 0, sd = 1)
x1 <- distribution_normal(1000, mean = .5, sd = .3)
results <- si(x1,x0)
#> Loading required namespace: logspline
print.data.frame(results)
#> CI CI_low CI_high
#> 1 1 0.0399912416092305 1.05310269570974
library(rstanarm)
stan_model <- stan_lmer(extra ~ group + (1 | ID), data = sleep, refresh = 0)
results <- si(stan_model)
#> Computation of Bayes factors: sampling priors, please wait...
print.data.frame(results)
#> Parameter CI CI_low CI_high Effects Component
#> 1 (Intercept) 1 -0.901647242478401 2.4186288271425 fixed conditional
#> 2 group2 1 0.604759242670623 2.60240882765198 fixed conditional
library(emmeans)
group_diff <- pairs(emmeans(stan_model, ~group))
results <- si(group_diff, prior = stan_model, BF = 3)
#> Computation of Bayes factors: sampling priors, please wait...
print.data.frame(results)
#> Parameter CI CI_low CI_high
#> 1 1 - 2 3 -2.23929218153439 -0.927737308109037
I tried using the current printing method for CIs but:
BF = 1
and not 1%
.print.data.frame(results)
#> Parameter CI CI_low CI_high
#> 1 1 - 2 3 -2.23929218153439 -0.927737308109037
results
#> # Support Interval
#>
#> Parameter 1% SI
#> 1 - 2 [1.00, 1.00]
Also, I couldn't think of a way to make si
work with ci(..., method = "SI")
(because si()
has prior
and BF
arguments that ci()
doesn't, and ci()
has a ci
argument that si()
doesn't).
Need to add:
ci(..., method = "SI")
+ describe_posterior()
?NEWS
Advanced:
@DominiqueMakowski @strengejacke care to check why the printing method is messed up?
Also, since this morning, I've come down with the flu... so my to-dos from above are on hold... π€π€§
A Thought: not sure si
has to be a method of ci
, as it does not return a credible interval - which is concerned with the posterior; instead the SI is concerned, like the BF, with the integration of the priors with the data.
care to check why the printing method is messed up?
Will do asap βΊοΈ
A Thought: not sure si has to be a method of ci, as it does not return a credible interval - which is concerned with the posterior; instead the SI is concerned, like the BF, with the integration of the priors with the data.
That's true if ci()
strictly refers to credible intervals. But that's already not really the case since the same method is used for confidence intervals for freq models. So that wouldn't be that big of a stretch to extend it to cover confirmation intervals. But as I said above, I am not sure of my opinion about this multiplication of what the CI acronym refers to π
In any case, do take a rest, you've done more than enough for this year π Wishing you a swift recovery!!
Alright, this is only missing a plotting method (that can be added later) and the ci(..., method = "SI")
.
One problem with adding ci(..., method = "SI")
is that the ci
argument defaults to 0.89, whereas the corresponding BF
argument in si()
defaults to 1. Thoughts?
Was able to solve the printing issue (I've adopted the terminology from the original paper in both the function and the printing):
library(bayestestR)
prior <- distribution_normal(1000, mean = 0, sd = 1)
posterior <- distribution_normal(1000, mean = .5, sd = .3)
si(posterior,prior, BF = 1/3)
#> Loading required namespace: logspline
#> # Support Interval
#>
#> BF = 0.33333 SI
#> [-0.12, 1.21]
Created on 2019-12-31 by the reprex package (v0.3.0)
The plotting method is a bit trickier than just wrapping plot.bf_parameters()
. For instance, following model:
library(bayestestR)
library(rstanarm)
model <- stan_glm(mpg ~ gear + wt + cyl, data = mtcars)
si(model)
#> Computation of Bayes factors: sampling priors, please wait...
#> Loading required namespace: logspline
#> # Support Interval
#>
#> Parameter BF = 1 SI
#> (Intercept) [30.52, 54.75]
#> gear [-2.49, 1.46]
#> wt [-5.32, -1.47]
#> cyl [-2.55, -0.50]
returns multiple si's, and the null
-argument in bf_parameters()
only accepts a single vector. Possible solutions:
a) save all data frames from posterior and prior samples as attribute to the returned si-object (making it very large)
b) export bayestestR::.update_to_priors()
(i.e. make it accessible from see), so we can sample priors again from the model object when we plot the data.
I think we need all prior/posterior data for each parameter, to correctly plot the different si's.
Sorry, when I meant a wrapper, I meant a wrapper to the internal bayestestR:::.make_BF_plot_data
function which makes the data for plotting bf_parameters
... which is what I did (just pushed to see
):
library(bayestestR)
prior <- distribution_normal(1000, mean = 0, sd = 1)
posterior <- distribution_normal(1000, mean = .5, sd = .3)
x <- si(posterior, prior, BF = 3)
#> Loading required namespace: logspline
plot(x)
library(rstanarm)
#> Loading required package: Rcpp
#> Registered S3 method overwritten by 'xts':
#> method from
#> as.zoo.xts zoo
#> rstanarm (Version 2.19.2, packaged: 2019-10-01 20:20:33 UTC)
#> - Do not expect the default priors to remain the same in future rstanarm versions.
#> Thus, R scripts should specify priors explicitly, even if they are just the defaults.
#> - For execution on a local, multicore CPU with excess RAM we recommend calling
#> options(mc.cores = parallel::detectCores())
#> - bayesplot theme set to bayesplot::theme_default()
#> * Does _not_ affect other ggplot2 plots
#> * See ?bayesplot_theme_set for details on theme setting
contrasts(sleep$group) <- contr.bayes # see vingette
stan_model <- stan_lmer(extra ~ group + (1 | ID), data = sleep, refresh = 0)
x <- si(stan_model, BF = 6)
#> Computation of Bayes factors: sampling priors, please wait...
plot(x, show_intercept = TRUE) +
ggplot2::coord_cartesian(xlim = c(-5,5)) # wide priors suckkkkkkk (for plotting)
Created on 2019-12-31 by the reprex package (v0.3.0)
This looks more like the bf_params
plot than the hdi
plot, but I think that given the close relationship between SIs and BFs that works quite well!
Alright, I think I'm done here - added SI as a method to ci()
and to descrive_posterior()
:
library(bayestestR)
# Numeric -----------------------------------------------------------------
prior <- distribution_normal(1000, mean = 0, sd = 1)
posterior <- distribution_normal(1000, mean = .5, sd = .3)
set.seed(5)
si(posterior, prior, BF = 3)
#> Loading required namespace: logspline
#> # Support Interval
#>
#> BF = 3 SI
#> [0.33, 0.76]
set.seed(5)
ci(posterior, method = "SI", prior = prior, BF = 3)
#> # Support Interval
#>
#> BF = 3 SI
#> [0.33, 0.76]
set.seed(5)
describe_posterior(posterior, ci_method = "SI", bf_prior = prior, BF = 3)
#> Parameter Median CI CI_low CI_high pd ROPE_CI ROPE_low ROPE_high
#> 1 Posterior 0.5 3 0.3332603 0.7598336 0.953 89 -0.1 0.1
#> ROPE_Percentage
#> 1 0.04040404
# Model -------------------------------------------------------------------
library(rstanarm)
#> Loading required package: Rcpp
#> Registered S3 method overwritten by 'xts':
#> method from
#> as.zoo.xts zoo
#> rstanarm (Version 2.19.2, packaged: 2019-10-01 20:20:33 UTC)
#> - Do not expect the default priors to remain the same in future rstanarm versions.
#> Thus, R scripts should specify priors explicitly, even if they are just the defaults.
#> - For execution on a local, multicore CPU with excess RAM we recommend calling
#> options(mc.cores = parallel::detectCores())
#> - bayesplot theme set to bayesplot::theme_default()
#> * Does _not_ affect other ggplot2 plots
#> * See ?bayesplot_theme_set for details on theme setting
model <- stan_glm(mpg ~ wt, data = mtcars, chains = 2, iter = 200, refresh = 0)
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> http://mc-stan.org/misc/warnings.html#bulk-ess
#> Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
#> Running the chains for more iterations may help. See
#> http://mc-stan.org/misc/warnings.html#tail-ess
set.seed(5)
si(model, BF = 3)
#> Computation of Bayes factors: sampling priors, please wait...
#> # Support Interval
#>
#> Parameter BF = 3 SI
#> (Intercept) [33.71, 42.02]
#> wt [-6.44, -4.25]
set.seed(5)
ci(model, method = "SI", BF = 3)
#> Computation of Bayes factors: sampling priors, please wait...
#> # Support Interval
#>
#> Parameter BF = 3 SI
#> (Intercept) [33.71, 42.02]
#> wt [-6.44, -4.25]
set.seed(5)
describe_posterior(model, ci_method = "SI", BF = 3)
#> Computation of Bayes factors: sampling priors, please wait...
#> # Description of Posterior Distributions
#>
#> Parameter Median CI CI_low CI_high pd ROPE_CI ROPE_low ROPE_high
#> (Intercept) 37.39 3 33.71 42.02 1 89 -0.603 0.603
#> wt -5.36 3 -6.44 -4.25 1 89 -0.603 0.603
#> ROPE_Percentage Rhat ESS
#> 0 0.990 267
#> 0 0.991 303
# emmeans (to be safe) ----------------------------------------------------
library(emmeans)
#> Welcome to emmeans.
#> NOTE -- Important change from versions <= 1.41:
#> Indicator predictors are now treated as 2-level factors by default.
#> To revert to old behavior, use emm_options(cov.keep = character(0))
et_wt <- emtrends(model, ~1, "wt")
set.seed(5)
si(et_wt, prior = model, BF = 3)
#> Computation of Bayes factors: sampling priors, please wait...
#> # Support Interval
#>
#> Parameter BF = 3 SI
#> overall [-6.54, -4.34]
set.seed(5)
ci(et_wt, method = "SI", prior = model, BF = 3)
#> Computation of Bayes factors: sampling priors, please wait...
#> # Support Interval
#>
#> Parameter BF = 3 SI
#> overall [-6.54, -4.34]
set.seed(5)
describe_posterior(et_wt, ci_method = "SI", bf_prior = model, BF = 3)
#> Computation of Bayes factors: sampling priors, please wait...
#> Parameter Median CI CI_low CI_high pd ROPE_CI ROPE_low ROPE_high
#> 1 overall -5.364072 3 -6.535697 -4.338882 1 89 -0.1 0.1
#> ROPE_Percentage
#> 1 0
Created on 2020-01-02 by the reprex package (v0.3.0)
In the paramaters readme, we say that
The column names of the returned data frame are specific to their content.
In this case, CI
would be BF
and CI_*
would be SI_*
. Should we also make the column names context specific? This is very likely to break code. Or we have a print()
method that changes the column names for the output only.
Currently the printing method of bayestestr_si
takes care of that...
Anyway, this is an edge case where I think we can let it slide, but I can try and see how much code this will break over the weekend.
Sorry for confusion, I meant the describe_posterior()
method. I committed a change that should print context specific column names.
Not sure how time consuming sampling from priors can become, but would it make sense to use simulate_priors()
instead?
Sampling from priors take some time - but less time than sampling from the posterior.
I haven't really played with the simulation function yet. One problem with that method is that it doesn't account for multivariate dependency in the parameter space, so it will always lead to missestimated Bayes factors / SIs for emmGrid
objects, but I'm not sure how it holds up for univariate estimation of single parameters... I would have to look into that + see if I can speed up prior sampling.
One problem with that method is that it doesn't account for multivariate dependency in the parameter space
the simulate priors have so easy implementations that I'd be surprised if Paul didn't think about this solution and that sampling for prior was equivalent but slower than it. Until we check, I'd agree that it's better to stick with the original
ok. So we could actually close this issue... And have to check why travis fails?!? Local checks are passing.
And have to check why travis fails?
I'll investigate π΅
And have to check why travis fails?!?
Why indeed????
if Paul didn't think about this solution [...] I'd agree that it's better to stick with the original
Yes, it seems to be better over all, even if slower. I was able to speed it up a bit for some cases. Let's leave it at that.
Closing this issue now.
@strengejacke made a minor change to the printing method of describe_posterior
so that there wouldn't be two columns labeled BF
, so that this:
library(bayestestR)
library(rstanarm)
model <- stan_glm(mpg ~ wt + gear, data = mtcars, chains = 2, iter = 200, refresh = 0)
describe_posterior(model, ci_method = "SI", test = "bf")
#> Computation of Bayes factors: sampling priors, please wait...
#> Loading required namespace: logspline
#> # Description of Posterior Distributions
#>
#> Parameter Median BF SI_low SI_high BF Rhat ESS
#> (Intercept) 39.064 1 28.91 49.11 9.72e+08 0.997 164
#> wt -5.450 1 -6.92 -3.59 6.05e+02 0.998 170
#> gear -0.405 1 -2.48 1.72 4.76e-02 0.997 166
Created on 2020-01-03 by the reprex package (v0.3.0)
is now this:
library(bayestestR)
library(rstanarm)
model <- stan_glm(mpg ~ wt + gear, data = mtcars, chains = 2, iter = 200, refresh = 0)
describe_posterior(model, ci_method = "SI", test = "bf")
#> Computation of Bayes factors: sampling priors, please wait...
#> Loading required namespace: logspline
#> # Description of Posterior Distributions
#>
#> Parameter Median SI SI_low SI_high BF Rhat ESS
#> (Intercept) 40.102 1 27.57 50.15 2.02e+06 0.994 152
#> wt -5.500 1 -7.26 -3.54 1.22e+03 0.995 131
#> gear -0.444 1 -2.62 1.84 7.82e-02 0.995 148
Created on 2020-01-03 by the reprex package (v0.3.0)
https://www.bayesianspectacles.org/the-support-interval/
SIBF > 1 [SI_low, SI_high]: the interval of which the parameter has a BF > 1 (in which there is support for the alternative). Not sure how computationally heavy this would be. Not sure also if more appropriately named Support Interval (SI) or Confirmation Interval (CI) (the latter might add up to the confusion related to the existence of different CIs, mainly for the sake of our habits of writing CI?)