This is actually okay. The actual definition of the pd is:
$$ \max\big(p(x < 0),\ p(x > 0)\big) $$
That is, the maximum a posteriori probability that x has a given sign.
Currently, the docs reflect only cases in which x is continuous, which is the cause of the confusion. I will update the docs to reflect the possible values and their interpretation for cases with discrete values (or mixtures with discrete values).
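To make that definition concrete, here is a minimal sketch of computing the pd directly from a vector of posterior draws (the draws are made up for illustration; exact zeros count towards neither sign):
set.seed(42)
# Made-up posterior draws with a point mass at zero
draws <- c(rep(0, 400), rnorm(600, mean = 0.5))
# pd = max(p(x < 0), p(x > 0)); exact zeros fall on neither side,
# which is why the pd can drop below 0.5 here
max(mean(draws < 0), mean(draws > 0))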
That makes sense, but in this case the formula for converting a p-direction into a two-sided frequentist p-value no longer works: with p = 2 * (1 - p_direction), any p_direction below 0.5 yields a p-value larger than 1 (e.g., p_direction = 0.35 gives 2 * (1 - 0.35) = 1.3).
Is there any way to fix this calculation in my scenario to get back the equivalence between the p-direction and frequentist p-values?
I'm not sure if or how the pd to p conversion is valid if there are discrete 0s.
Do you have a reproducible example you can share?
It is hard to provide an example without posting lots of code, but what I am trying to do is compare the inferences made by Bayesian Model Averaging (BMA) of linear regression models with frequentist methods. Since it is a known problem with model selection techniques such as stepwise regression that p-values are systematically underestimated, I would like to examine whether BMA is more robust in this regard. The p-direction seems very fitting for this question, as it translates pretty well into frequentist p-values.
Right now I am just creating data following a linear model, y <- X %*% Beta + rnorm(n), with some positive beta coefficients and some betas equal to zero. I am giving this to the BMS package and looking at the marginal densities it returns.
Now the problem is that the coefficient plot is only accurate if the top models considered by BMA actually include the predictor. Since this is almost never the case for all variables, I have used their posterior inclusion probabilities to inflate the marginal densities with zeros, so that they also include information from the models that did not include the predictor at all. This was done by sampling 1,000,000 times from the marginal distribution and then replacing a portion of the sample with zeros. The proportion of zeros is equal to the posterior probability assigned to the models that do not contain the regressor in question.
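A minimal sketch of this zero-inflation step (the names marginal_draws and pip are placeholders, not from the actual code; pip stands for the posterior inclusion probability of the predictor):
set.seed(1)
n <- 1e6
pip <- 0.7                        # placeholder posterior inclusion probability
marginal_draws <- rnorm(n, 0.3)   # stand-in for draws from the BMA marginal density
# Replace a (1 - pip) share of the sample with exact zeros, representing the
# posterior mass of the models that exclude this regressor
zero_idx <- sample.int(n, size = round((1 - pip) * n))
marginal_draws[zero_idx] <- 0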
Depending on how important the BMA ensemble considers a predictor to be, the coefficient plots look like one of the following. These plots show the zero-inflated marginal densities of each beta, averaged across multiple simulation runs to make them more robust against sampling variation.
Intuitively, the last plot suggests a very high probability that the predictor variable is zero (or very close to it), so I would like a p-value-like metric that reflects this idea.
Two points. First, see this example:
library(MuMIn)
library(brms)
library(bayestestR)
set.seed(1234)
d <- tibble::tibble(
x = rnorm(100),
y = rnorm(100, x, sd = 6)
)
## Freq ---------------------------------------------------
m0 <- lm(y ~ 1, d)
m1 <- lm(y ~ 1 + x, d)
ma <- model.avg(m0, m1)
## Bayes ---------------------------------------------------
m0b <- brm(y ~ 1, d, backend = "cmdstanr", cores = 4)
m1b <- brm(y ~ 1 + x, d, backend = "cmdstanr", cores = 4)
mab <- posterior_average(m1b, m0b, missing = list(b_x = 0))
## Compare ------------------------------------------------
### Estimate
apply(mab[1:2], 2, mean)
#> b_Intercept b_x
#> 0.1567762 0.3292163
coef(ma, full = TRUE)
#> (Intercept) x
#> 0.1523379 0.3932120
### P vs PD
(pd1 <- p_direction(m1b, parameters = "x"))
#> Probability of Direction
#>
#> Parameter | pd
#> ------------------
#> x | 91.03%
pd_to_p(pd1[["pd"]])
#> [1] 0.1795
summary(m1)[["coefficients"]]
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.2229243 0.6298725 0.3539198 0.7241593
#> x 0.8434903 0.6226555 1.3546662 0.1786388
(pda <- p_direction(mab["b_x"]))
#> Probability of Direction
#>
#> Parameter | pd
#> ------------------
#> b_x | 35.68%
summary(ma)[["coefmat.full"]]
#> Estimate Std. Error Adjusted SE z value Pr(>|z|)
#> (Intercept) 0.1523379 0.6306727 0.6384294 0.2386136 0.8114052
#> x 0.3932120 0.5981554 0.6019445 0.6532364 0.5136039
pd_to_p(pda[["pd"]])
#> Probability of Direction: 0.71
Also, a gentle reminder that p-values cannot be used to support the null. However...
To do:
Yes, we should remove the "of the median's direction" part from the definition.
Note that during a PR to add pd to Julia/Turing, one of the devs came up with a much more efficient algorithm than the one-liner equivalent we have in bayestestR (though it doesn't look like it):
function p_direction(x::AbstractArray{<:Union{Missing,Real}})
    ntotal = 0
    npositive = 0
    nnegative = 0
    # Single pass: count draws strictly above and strictly below zero
    for xi in x
        if xi > 0
            npositive += 1
        elseif xi < 0
            nnegative += 1
        end
        ntotal += 1
    end
    # Exact zeros count towards the denominator but towards neither sign
    return max(npositive, nnegative) / ntotal
end
We might run a quick benchmark to see if it's also more efficient.
(We discussed this in the past; it seems like we do need an error here.)
I'm alright with that
We can mention a weird theoretical caveat: let's say a posterior has 999 values of 0 and one 0.1; then the pd using the formula above will be 100%. But that kind of weirdness should be spottable by inspecting the proportion of exactly-0 values.
The Julia solution is less efficient in R (probably because R doesn't have the += operator?), but I found a solution that is marginally more efficient than the one we currently have:
pd_classic <- function(x, null = 0) {
  # Current algo
  max(c(length(x[x > null]) / length(x), length(x[x < null]) / length(x)))
}

pd_Julia <- function(x, null = 0) {
  # Julia algo
  ntotal <- 0
  npositive <- 0
  nnegative <- 0
  for (xi in x) {
    if (xi > null) {
      npositive <- npositive + 1
    } else if (xi < null) {
      nnegative <- nnegative + 1
    }
    ntotal <- ntotal + 1
  }
  return(max(npositive, nnegative) / ntotal)
}

pd_new <- function(x, null = 0) {
  # New algo?
  max(sum(x > null), sum(x < null)) / length(x)
}
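As a quick sanity check (not from the thread), all three implementations compute the same quantity, so they should agree exactly on both continuous and zero-heavy samples:
set.seed(111)
x_cont <- rnorm(10000)
x_disc <- rpois(10000, 0.4)
# Equality is exact here: max(a/n, b/n) == max(a, b)/n for positive n
stopifnot(
  pd_classic(x_cont) == pd_Julia(x_cont),
  pd_Julia(x_cont) == pd_new(x_cont),
  pd_classic(x_disc) == pd_Julia(x_disc),
  pd_Julia(x_disc) == pd_new(x_disc)
)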
# continuous
microbenchmark::microbenchmark(
pd_classic(rnorm(10000)),
pd_Julia(rnorm(10000)),
pd_new(rnorm(10000)),
setup = set.seed(111)
)
#> Unit: microseconds
#> expr min lq mean median uq max neval cld
#> pd_classic(rnorm(10000)) 450.0 464.60 495.095 473.35 537.3 671.1 100 a
#> pd_Julia(rnorm(10000)) 727.8 800.45 817.058 817.70 828.8 884.4 100 b
#> pd_new(rnorm(10000)) 369.1 403.15 420.443 409.85 440.6 520.0 100 c
# discrete
microbenchmark::microbenchmark(
pd_classic(rpois(10000, 0.4)),
pd_Julia(rpois(10000, 0.4)),
pd_new(rpois(10000, 0.4)),
setup = set.seed(111)
)
#> Unit: microseconds
#> expr min lq mean median uq max neval cld
#> pd_classic(rpois(10000, 0.4)) 178.7 187.80 190.381 188.7 190.40 246.4 100 a
#> pd_Julia(rpois(10000, 0.4)) 493.2 520.35 528.264 525.8 532.90 599.0 100 b
#> pd_new(rpois(10000, 0.4)) 160.0 170.80 173.672 172.5 173.95 210.8 100 c
Also, regarding the caveat above (999 values of 0 and one 0.1 giving a pd of 100%):
The current solution and the Julia solution would give $pd = 0.001$:
x <- c(rep(0, 999), 0.1)
bayestestR::p_direction(x)
#> Probability of Direction: 1.00e-03
pd_Julia(x)
#> [1] 0.001
pd_new(x)
#> [1] 0.001
(Why would a loop be faster than a vectorized operation in Julia? How odd....)
Your answers were very helpful and the paper is also very interesting.
Thank you very much!
The current solution and the Julia solution would give .001
Right, yeah, I brainfroze.
Hello,
I would like to calculate the p-direction of a marginal density that is discontinuous, with a large amount of probability mass at exactly zero.
This leads to two problems: (1) the median is exactly zero, so it's unclear whether to use the area to the left or to the right of zero; (2) a significant part of the probability mass is exactly zero, and neither option from (1) would capture it in the calculation.
Currently, the pd() function returns p-direction values below 0.5, violating its theoretical range of 0.5 to 1.
My question is: is there any way to solve these two problems without significantly changing the way the p-direction is interpreted?
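A minimal reproduction of the below-0.5 behaviour with a zero-inflated sample (illustrative values; the exact pd depends on the seed):
library(bayestestR)
set.seed(123)
# 70% exact zeros, 30% continuous draws
draws <- c(rep(0, 7000), rnorm(3000))
# With most of the mass at exactly zero, max(p(x < 0), p(x > 0)) falls below 0.5
p_direction(draws)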