Closed DominiqueMakowski closed 4 years ago
Marsman's paper seems very interesting (unfortunately I didn't manage to through the wall of equations 😞)
If I understand, the only existing alternative for Bayesian regressions (currently in the work in this package) is a smart standardization of the posterior that would approximate (partial) correlations or some standardized differences?
@mattansb I've implemented some approximation to test stat for Bayesian models. Not sure if it makes a lot of sense from a theoretical perspective but seems to work from an applied one 😅
https://easystats.github.io/effectsize/articles/bayesian_models.html
The idea is to get t from coef and posterior SD, find the frequentist DoF (😱) and get the r posterior... Seems almost to easy
I it's time for a dev branch so we can discuss commits...
👍
dev is up
Okay, so I'll split my comment into de facto and de jure:
You are doing:
se <- sd(posterior)
t <- posterior / se
df <- from_freq_magic(...)
convert_t_to_r(t, df)
This will probably almost always give results very close to the true results....
How does this function deal with multiple dfs (I can't seem to re-build the package to check)? Like in mixed models? If it cannot return the approx-correct df, the calculation will be not be a good approx...
The idea of degrees of freedom in a Bayesian framework is almost heresy! That is - df represent the ~"number of bits of data free to change that can still result in this estimate". But in Bayes the estimates aren't based only on the data - but also on the prior (and thus also on previous data, and so forth). (Also there isn't 1 estimate, but a whole distribution.)
So in the most technical sense, this is wrong 😅
But see de facto above 😉
How does this function deal with multiple dfs
Here lies the problem. Currently, I added this function in parameters, and for mixed models the most straigthforward and quick method is to 1) refit the model using lme4 (see here) and 2) use the kenward-roger approximations (same as for p values for lme4). It's clearly not the best way to tackle this.
The idea of degrees of freedom in a Bayesian framework is almost heresy!
Haha I'd bet this hybrid approach would trigger all the bayestremists 😅 As a pragmaticist I'd tend to say "as long as it serves better science" but it's true that best would be to also be theoretically correct... But as we said, is there currently any better alternatives out there?
Another thing I am wondering is for non-linear models, e.g., logistic regression. The freq models usually report a z value. But as we only have access to the sample SD for the posteriors (same as for linear model), does it make sense to still compute t and then t to r or should we use z values?
Another thing I am wondering is for non-linear models
For these models there's not need for all this "black magic" because they have OR / log odds / rate etc. as the model parameters! Actually, this problem of effect sizes is only a problem in linear models (because you need to estimate the variance explained and the error variance).
is there currently any better alternatives out there?
posterior_predict
used like this can give some form of variance explained, but it's not really partialized (so it's interpretation is not straightforward...)
For these models there's not need for all this "black magic" because they have OR / log odds / rate etc. as the model parameters!
Forgot about that haha
posterior_predict used like this can give some form of variance explained
Interesting... but not quite there yet 😞
library(parameters)
library(effectsize)
#>
#> Attaching package: 'effectsize'
#> The following objects are masked from 'package:parameters':
#>
#> cohens_f, epsilon_squared, eta_squared, omega_squared
model <- glm(vs ~ cyl + disp + drat, data = mtcars, family = "binomial")
parameters <- model_parameters(model)
#> Waiting for profiling to be done...
parameters$r <- convert_z_to_r(parameters$z, n = insight::n_obs(model))
parameters$r_from_odds <- convert_odds_to_r(parameters$Coefficient, log = TRUE)
parameters
#> Parameter | Coefficient | SE | 95% CI | z | df | p | r | r_from_odds
#> ---------------------------------------------------------------------------------------------
#> (Intercept) | 25.09 | 11.88 | [ 6.43, 56.94] | 2.11 | 28 | < .05 | 0.35 | 0.99
#> cyl | -1.99 | 1.05 | [-4.57, -0.13] | -1.89 | 28 | 0.06 | -0.32 | -0.48
#> disp | -0.01 | 0.02 | [-0.06, 0.02] | -0.47 | 28 | > .1 | -0.08 | 0.00
#> drat | -3.18 | 2.16 | [-8.70, 0.49] | -1.47 | 28 | > .1 | -0.25 | -0.66
Created on 2019-10-13 by the reprex package (v0.3.0)
The two "r" do not match tho... any idea why?
I think it's because in a multiple regression setting, z and log odds don't directly correspond.
In any case r doesn't make any sense in a non-Gaussian model. All of these "just because you can doesn't mean it makes sense" uses should be documented somewhere... Maybe drop convert_odds_to_r
altogether? What users would probably want is to standardize coeffs here...
Also convert_F_to_r
usually won't make sense - if df > 1
it would be multiple R, usually reported as R2, and then it's just partial eta squared.
Also also... The file names and contents for all these functions needs to be make concise - it took me way too long to find convert_z_to_r
and convert_odds_to_r
!
Also also also, @DominiqueMakowski this thread is about Bayes!
z and log odds don't directly correspond
That's strange... Which one gives the most "accurate" r/d? For instance in a metaanalysis if you want to convert everything to d, it's kinda problematic if two approaches give quite different values...
In any case r doesn't make any sense in a non-Gaussian model.
If you want to interpret it as variance explained sure, but it makes sense if you just want to have all your effects on the same scales... and possibility + good documentation > impossibility (as we discussed several times ^^)
it would be multiple R, usually reported as R2, and then it's just partial eta squared.
Since eta squared is equivalent to R2, which is a more generic name ("Eta Squared, however, is used specifically in ANOVA models.", here), wouldn't it be clearer to rename all those as t_to_r2 etc. (it looks better too)?
The file names and contents for all these functions needs to be make concise
Agreed
this thread is about Bayes!
U started! 😁
Which one gives the most "accurate" r/d?
For d, the odds to d looks like the correct one.
but it makes sense if you just want to have all your effects on the same scales...
It "makes sense" to want that, but it doesn't make sense to get that in this method - you should instead use parameters_standardize
.
wouldn't it be clearer to rename all those as t_to_r2 etc
I think it's a great idea to have !
F_to_partial_R2 <- F_to_partial_eta_squared
t_to_partial_R2 <- t_to_partial_eta_squared
Then the convert_t_to_r
could be:
convert_t_to_partial_r <- function(t, df_error) sign(t) * sqrt(t_to_partial_R2(t, df_error))
(I think it is important to have the "partial" in the names or at the very least the doc title - users shouldn't think they're getting the simple correlation.)
Another thing I am wondering is for non-linear models, e.g., logistic regression.
You stated!
Going back to:
I've implemented some approximation to test stat for Bayesian models. Not sure if it makes a lot of sense from a theoretical perspective but seems to work from an applied one 😅
I said that de facto...
This will probably almost always give results very close to the true results....
I'd like to amend that: It will work in cases where the prior is relatively lax (i.e., not strong); in these cases SDposterior≈ SEfrequentist (because the posterior ≈ the likelihood function). But when this equality will be broken with strong priors, where SDposterior< SEfrequentist, which will lead to miss-estimating the effect size.
posterior_predict used like this can give some form of variance explained
Which is implemented in performance::icc()
:-)
@strengejacke interesting.... Care to take a stand at that here for gaussian models? (This will give a non-partial R2, if I understand correctly?)
@DominiqueMakowski I suggest moving the related functions and vignette to a separate branch, as they don't yet fully work / ongoing debate here about how to do this, and It is possible at this time to get regular beta (standardized coefficients) for Bayesian models.
Yes I agree, or rename it as something WIP
@DominiqueMakowski What is left to do here?
Standardized regression coeffs are available... Are we talking about an Eta2-like effect size?
I am talking about a consistent and robust method to get a unified ("similar") index for different kinds of models (regressions with factors, interactions, and logistic models) 😬
Okay, I've come up with an idea.
library(rstanarm)
library(effectsize)
eta_squared_posterior <- function(model, partial = FALSE, nsamps = 1000, verbose = TRUE, ...) {
if (!insight::model_info(model)$is_linear) {
stop("Only applicable to linear models")
}
if (partial) {
warning("Only support non partial PVE.")
}
# get ppd
ppd <- rstantools::posterior_predict(model)
nsamps <- min(nsamps, nrow(ppd))
i <- sample(nrow(ppd), size = nsamps)
ppd <- ppd[i,]
# get model data
f <- insight::find_formula(model)$conditional
X <- insight::get_predictors(model)
resp_name <- insight::find_response(model)
if (verbose) {
message("Sampleing effect size... This can take a while...")
}
res <- apply(ppd, 1, function(r) {
# sampled outcome + predictors
temp_dat <- X
temp_dat[[resp_name]] <- r
# fit a simple linear model
temp_fit <- lm(f, temp_dat)
# compute effect size
ANOVA <- car::Anova(temp_fit, type = 3)
es <- eta_squared(ANOVA, ci = NA, partial = FALSE)
data.frame(t(setNames(es$Eta_Sq, es$Parameter)), check.names = F)
})
res <- do.call("rbind", res)
return(res)
}
model <- stan_lmer(mpg ~ wt + qsec * factor(am) + (1|cyl),
data = mtcars, refresh = 0)
pp_eta2 <- eta_squared_posterior(model)
#> Sampleing effect size... This can take a while...
bayestestR::describe_posterior(pp_eta2)
#> # Description of Posterior Distributions
#>
#> Parameter | Median | 89% CI | pd | 89% ROPE | % in ROPE
#> ----------------------------------------------------------------------------
#> wt | 0.408 | [0.178, 0.642] | 1 | [-0.100, 0.100] | 0.000
#> qsec | 0.105 | [0.000, 0.276] | 1 | [-0.100, 0.100] | 53.984
#> factor(am) | 0.024 | [0.000, 0.100] | 1 | [-0.100, 0.100] | 100.000
#> qsec:factor(am) | 0.036 | [0.000, 0.133] | 1 | [-0.100, 0.100] | 92.368
## Compare to:
library(magrittr)
lm(mpg ~ wt + qsec * factor(am), data = mtcars) %>%
car::Anova(type = 3) %>%
eta_squared(partial = FALSE)
#> Parameter | Eta2 | 90% CI
#> -------------------------------------
#> wt | 0.44 | [0.21, 0.60]
#> qsec | 0.08 | [0.00, 0.27]
#> factor(am) | 0.05 | [0.00, 0.21]
#> qsec:factor(am) | 0.07 | [0.00, 0.24]
Created on 2020-05-25 by the reprex package (v0.3.0)
For now, this only works with non-partial eta squared, so it gives the total (unique) variance explained. For now, I only use a sub-sample of the ppd to speed it up.
you can find it here: https://github.com/easystats/effectsize/blob/dev/WIP/eta_squared_posterior.R
finally some good news on this front, that looks super promissive!
So let me rephrase to see if I understood correctly:
Let's say you have a model y ~ x
with 10 obs.
y
, you get the posterior distribution (of for instance 200 samples) of the predictionExactly!
I will have to think if there is any way to make it faster, as it is currently pretty slow (working across each sample...).
A collection of stan threads that seem to support the logic of my method:
(I also asked here if my method is applicable. Waiting for a response...)
Can i use bayestestR to test the regression results of the Bayesian quantile model ?
Hi @emmawang123123 - I suggest opening a new issue (maybe on the bayestestR repo) with some more details (:
thank you,let me try
------------------ 原始邮件 ------------------ 发件人: "Mattan S. Ben-Shachar"<notifications@github.com>; 发送时间: 2020年5月30日(星期六) 凌晨1:18 收件人: "easystats/effectsize"<effectsize@noreply.github.com>; 抄送: "西丽西亚"<1635823199@qq.com>;"Mention"<mention@noreply.github.com>; 主题: Re: [easystats/effectsize] Effect sizes in Bayesian regressions (#8)
Hi @emmawang123123 - I suggest opening a new issue (maybe on the bayestestR repo) with some more details (:
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.
Opening a new thread as we strayed from the topic of #7
Originally posted by @mattansb in https://github.com/easystats/effectsize/issues/7#issuecomment-541341954