Closed mattansb closed 4 years ago
What is this sorcery, normally the pd is calibrated on the median's side, and the median is on the size where there is more than 50% 🤔
Ah, but the median can be exactly 0!
library(bayestestR)
library(rstanarm)
x <- rnorm(30)
y <- rnorm(30)
dat <- data.frame(x,y)
m0 <- stan_glm(y ~ 1, dat, family = gaussian(),
refresh = 0,
diagnostic_file = file.path(tempdir(), "df0.csv"))
m1 <- update(m0, formula. = . ~ . + x,
diagnostic_file = file.path(tempdir(), "df1.csv"))
wp <- weighted_posteriors(m0, m1)
# checked by pd
length(wp$x[wp$x < 0]) / length(wp$x)
#> [1] 0.01875
length(wp$x[wp$x > 0]) / length(wp$x)
#> [1] 0.08075
# not checked by pd
length(wp$x[wp$x == 0]) / length(wp$x)
#> [1] 0.9005
Created on 2020-04-20 by the reprex package (v0.3.0)
bayestestR::pd(c(-1, 0, 1))
#> pd = 33.33%
bayestestR::pd(c(-1, 0, 0, 1))
#> pd = 25.00%
bayestestR::pd(c(-1, 0, 0, 0, 1))
#> pd = 20.00%
Created on 2020-04-20 by the reprex package (v0.3.0)
(To be clear, I don't think the behavior of the function should change!)
lol thanks Mat for these examples ad absurdum, yeah the only way would be to include 0 in which goes against what the pd is
I don't think this is that absurd 😅 I actually quite like it - this is an "edge case" that allows the pd
to accept the (point) null!
See this paper https://doi.org/10.3758/s13423-017-1420-7 (especially the Unification section).
I also like this paper a lot https://doi.org/10.31234/osf.io/h6pr8
Another thing that happens when 0 is the median - pd_to_p()
will not actually correspond to the p-value....
x <- c(rep(0,1000),rnorm(2000))
plot(bayestestR::hdi(x))
(pd <- bayestestR::pd(x))
#> pd = 33.47%
bayestestR::pd_to_p(pd)
#> pd = 133.07%
# should be:
2 * (1 - pmax(pd, 1 - pd))
#> pd = 66.93%
Created on 2020-04-20 by the reprex package (v0.3.0)
Also, I think the input to pd_to_p()
should be unclass()
-ed, no?
> library(bayestestR)
> x <- c(rep(0,1000),rnorm(2000))
> bayestestR::pd(x)
pd = 34.10%
> bayestestR::pd(x, method="logspline")
pd = 53.06%
> bayestestR::pd(x, method="KernSmooth")
Loading required namespace: KernSmooth
pd = 50.41%
Note that this edge case is caused by the discretization of the parameter space underlying the default method, but it seems like methods based on the density do not suffer from this issue
More formally, when the mode is zero, the default-method for pd()
yields a value < .5.
library(bayestestR)
x <- c(rep(0,1000),rnorm(2000))
sjmisc::typical_value(x, "mode")
#> [1] 0
pd(x)
#> pd = 35.07%
Created on 2020-04-20 by the reprex package (v0.3.0)
I think that when the median is 0, the "direct"
is the method to go with, as it doesn't erase the fact the the 0s aren't "small values", but exactly 0 (they have a "discrete" meaning - again, I highly recommend those paper 👆).
More formally, when the mode is zero, the default-method for pd() yields a value < .5.
I think formally its when the median is 0, no?
so you'd argue against for instance switching the method if median=0? Or maybe throwing a message suggesting to change the method in such a potential edgecase is detected?
I think formally its when the median is 0, no?
True.
x <- c(-5, -5, -5, -5, -4, -4, -4, -3, -3, -3, -1, -1, -2)
x <- c(x, abs(x))
x <- c(x, 0)
sjmisc::typical_value(x, "mode")
#> [1] -5
median(x)
#> [1] 0
bayestestR::pd(x)
#> pd = 48.15%
Created on 2020-04-20 by the reprex package (v0.3.0)
so you'd argue against for instance switching the method if median=0?
I'd suggest that, too, as the pd()
perfectly reflects the definition of the median, and thus the result is not incorrect (but highly unlikely for posterior samples (from models) with many draws).
I would not change the behavior of pd()
- as Daniel said, as it currently is, it properly reflects the Median -> pd relationship for these cases, regardless of how unlikely they are in "the wild".
So mainly we have to think about https://github.com/easystats/bayestestR/issues/304#issuecomment-616346510?
Yup
pd
documentation states thatHowever, when exactly 0 has high credibility, it is actually possible to go below 50%. This can happen when using model averaged posteriors (out
weighted_posteriors
orbrms::posterior_average
).I think this case should be addressed in the function docs.
Created on 2020-04-20 by the reprex package (v0.3.0)