easystats / bayestestR

:ghost: Utilities for analyzing Bayesian models and posterior distributions
https://easystats.github.io/bayestestR/
GNU General Public License v3.0

P-Direction in zero-inflated models #616

Closed tobiasweckop closed 1 year ago

tobiasweckop commented 1 year ago

Hello,

I would like to calculate the p-direction of a marginal density that is discontinuous, with a large amount of probability mass at exactly zero.

This leads to two problems: (1) the median is exactly zero, so it's unclear whether to use the area to the left or the right of zero, and (2) a significant part of the probability mass sits exactly at zero, so neither option from (1) would capture it in calculations.

Currently, the pd() function returns p-direction values below 0.5, violating its theoretical range of 0.5 to 1.

My question is: Is there any way to solve these two problems without significantly changing the way that the p-direction can be interpreted?

mattansb commented 1 year ago

This is actually okay. The actual definition of the pd is:

$$ \max\big(p(x < 0),\ p(x > 0)\big) $$

That is, the maximum a posteriori probability that x has a sign.

Currently, the docs reflect only cases in which x is continuous, which is the cause for confusion. I will update the docs to reflect the possible values and interpretation for cases with discrete (or mixture with discrete) values.
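As a quick illustration (a minimal sketch of the definition above, not bayestestR's actual implementation), the pd can be computed directly from posterior draws; with a point mass at zero, neither tail needs to reach 0.5:

```r
# pd as max(Pr(x < 0), Pr(x > 0)), estimated from draws.
pd <- function(x) max(mean(x < 0), mean(x > 0))

set.seed(1)
# Hypothetical zero-inflated posterior: 70% of the mass exactly at zero.
x <- c(rep(0, 700), rnorm(300, mean = 1))
pd(x)  # below 0.5, since Pr(x > 0) can be at most 0.3
```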

tobiasweckop commented 1 year ago

That makes sense, but in this case the formula for converting a p-direction into a two-sided frequentist p-value no longer works: with 2 * (1 - p_direction), we would reach values larger than 1 whenever p_direction < 0.5.

Is there any way to fix this calculation in my scenario to recover the equivalence of the p-direction with frequentist p-values?
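For reference, a sketch of the 2 * (1 - pd) conversion in question (not bayestestR's pd_to_p() itself), showing how it breaks once pd drops below 0.5:

```r
# Two-sided p-value from a pd, assuming the usual continuous case.
pd_to_p_naive <- function(pd) 2 * (1 - pd)

pd_to_p_naive(0.95)  # 0.1 -- the usual continuous case
pd_to_p_naive(0.35)  # 1.3 -- not a valid p-value once pd < 0.5
```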

mattansb commented 1 year ago

I'm not sure if or how the pd to p conversion is valid if there are discrete 0s.

Do you have a reproducible example you can share?

tobiasweckop commented 1 year ago

It is hard to provide an example without posting lots of code, but what I am trying to do is compare the inferences made by Bayesian model averaging (BMA) of linear regression models with frequentist methods. Since it is a known problem with model-selection techniques such as stepwise regression that p-values are systematically underestimated, I would like to examine whether BMA is more robust in this regard. The p-direction seems well suited for this question, as it translates fairly well into frequentist p-values.

Right now I am just creating data following a linear model y <- X %*% beta + rnorm(n, 0, 1), with some positive beta coefficients and some betas equal to zero. I give this to the BMS package and look at the marginal densities it returns.

Now the problem is that the coefficient plot is only accurate if the top models considered by BMA actually include the predictor. Since this is almost never the case for all variables, I have used their posterior inclusion probabilities to inflate the marginal density with zeroes, so that it also reflects the models that did not include the predictor at all. This was done by sampling 1,000,000 times from the marginal distribution and then replacing a portion of the sample with zeroes. The proportion of zeroes equals the posterior probability assigned to the models that do not contain the regressor in question.
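The zero-inflation step described above might be sketched like this (the pip value and the marginal draws are invented for illustration, not taken from an actual BMS fit):

```r
set.seed(42)
pip   <- 0.6    # hypothetical posterior inclusion probability of the predictor
n     <- 1e6
draws <- rnorm(n, mean = 0.4, sd = 0.2)  # stand-in for the BMA marginal density

# Replace a (1 - pip) share of the sample with exact zeros, representing
# the models that exclude the predictor.
zero_idx <- sample(n, size = round((1 - pip) * n))
draws[zero_idx] <- 0

mean(draws == 0)  # roughly 1 - pip = 0.4
```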

Depending on how important the BMA ensemble considers a predictor to be, the coefficient plots look like one of the following. These plots are the zero-inflated marginal densities of each beta, averaged across multiple simulation runs to make them more robust against sampling variation.

[three attached coefficient plots]

Intuitively, the last plot suggests a very high probability that the predictor variable is zero (or very close), and so I would like a p-value-like metric to reflect this idea.

mattansb commented 1 year ago

Two points:

  1. Once you discretize the parameter space like this, the pd-to-p conversion is not valid.

See example:

library(MuMIn)
library(brms)
library(bayestestR)

set.seed(1234)
d <- tibble::tibble(
  x = rnorm(100),
  y = rnorm(100, x, sd = 6)
)

## Freq ---------------------------------------------------

m0 <- lm(y ~ 1, d)
m1 <- lm(y ~ 1 + x, d)

ma <- model.avg(m0, m1)

## Bayes ---------------------------------------------------

m0b <- brm(y ~ 1, d, backend = "cmdstanr", cores = 4)
m1b <- brm(y ~ 1 + x, d, backend = "cmdstanr", cores = 4)

mab <- posterior_average(m1b, m0b, missing = list(b_x = 0))

## Compare ------------------------------------------------

### Estimate

apply(mab[1:2], 2, mean)
#> b_Intercept         b_x 
#>   0.1567762   0.3292163 
coef(ma, full = TRUE)
#> (Intercept)           x 
#>   0.1523379   0.3932120 

### P vs PD

(pd1 <- p_direction(m1b, parameters = "x"))
#> Probability of Direction 
#> 
#> Parameter |     pd
#> ------------------
#> x         | 91.03%
pd_to_p(pd1[["pd"]])
#> [1] 0.1795
summary(m1)[["coefficients"]]
#>              Estimate Std. Error   t value  Pr(>|t|)
#> (Intercept) 0.2229243  0.6298725 0.3539198 0.7241593
#> x           0.8434903  0.6226555 1.3546662 0.1786388

(pda <- p_direction(mab["b_x"]))
#> Probability of Direction
#> 
#> Parameter |     pd
#> ------------------
#> b_x       | 35.68%
summary(ma)[["coefmat.full"]]
#>              Estimate Std. Error Adjusted SE   z value  Pr(>|z|)
#> (Intercept) 0.1523379  0.6306727   0.6384294 0.2386136 0.8114052
#> x           0.3932120  0.5981554   0.6019445 0.6532364 0.5136039
pd_to_p(pda[["pd"]])
#> Probability of Direction: 0.71

Also, a gentle reminder that p-values cannot be used to support the null. However...

  2. When you discretize the parameter space like this, pd can be used to support the null! If pd << 0.5, there is more evidence that the parameter does not have a direction. You can read more about this in this paper: https://doi.org/10.1177/2515245921992035

To do:

DominiqueMakowski commented 1 year ago

Yes, we should remove the reference to "the median's direction" from the definition.

Note that during a PR adding pd to Julia/Turing, one of the devs came up with a more efficient algorithm than the one-liner equivalent we have in bayestestR (though it doesn't look like it):

function p_direction(x::AbstractArray{<:Union{Missing,Real}})
    ntotal = 0
    npositive = 0
    nnegative = 0
    for xi in x
        if xi > 0
            npositive += 1
        elseif xi < 0
            nnegative += 1
        end
        ntotal += 1
    end
    return max(npositive, nnegative) / ntotal
end

We might run a quick benchmark to see if it's also more efficient in R.

(We discussed this in the past; it seems we do need an error here.)

I'm alright with that

We can mention a weird theoretical caveat: let's say a posterior has 999 values of 0 and one value of 0.1; then the pd using the formula above would be 100%. But that kind of weirdness should be spottable by inspecting the proportion of exactly-zero values.

mattansb commented 1 year ago

The Julia solution is less efficient in R (probably because R doesn't have the += operator?), but I found a solution that is marginally more efficient than the one we currently have:

pd_classic <- function(x, null = 0) {
  # Current algo
  max(c(length(x[x > null])/length(x), length(x[x < null])/length(x)))  
}

pd_Julia <- function(x, null = 0) {
  # Julia algo
  ntotal <- 0
  npositive <- 0
  nnegative <- 0
  for (xi in x) {
    if (xi > null) {
      npositive <- npositive + 1  
    } else if (xi < null) {
      nnegative <- nnegative + 1  
    }
    ntotal <- ntotal + 1
  }
  return(max(npositive, nnegative) / ntotal)
}

pd_new <- function(x, null = 0) {
  # New algo?
  max(sum(x > null), sum(x < null)) / length(x)
}

# continuous
microbenchmark::microbenchmark(
  pd_classic(rnorm(10000)),
  pd_Julia(rnorm(10000)),
  pd_new(rnorm(10000)), 

  setup = set.seed(111)
)
#> Unit: microseconds
#>                      expr   min     lq    mean median    uq   max neval cld
#>  pd_classic(rnorm(10000)) 450.0 464.60 495.095 473.35 537.3 671.1   100 a  
#>    pd_Julia(rnorm(10000)) 727.8 800.45 817.058 817.70 828.8 884.4   100  b 
#>      pd_new(rnorm(10000)) 369.1 403.15 420.443 409.85 440.6 520.0   100   c

# discrete
microbenchmark::microbenchmark(
  pd_classic(rpois(10000, 0.4)),
  pd_Julia(rpois(10000, 0.4)),
  pd_new(rpois(10000, 0.4)), 

  setup = set.seed(111)
)
#> Unit: microseconds
#>                           expr   min     lq    mean median     uq   max neval cld
#>  pd_classic(rpois(10000, 0.4)) 178.7 187.80 190.381  188.7 190.40 246.4   100 a  
#>    pd_Julia(rpois(10000, 0.4)) 493.2 520.35 528.264  525.8 532.90 599.0   100  b 
#>      pd_new(rpois(10000, 0.4)) 160.0 170.80 173.672  172.5 173.95 210.8   100   c

Also,

We can mention a weird theoretical caveat: let's say a posterior has 999 values of 0 and one value of 0.1; then the pd using the formula above would be 100%. But that kind of weirdness should be spottable by inspecting the proportion of exactly-zero values.

The current solution, the Julia solution, and the new solution would all give $pd = 0.001$:

x <- c(rep(0, 999), 0.1)

bayestestR::p_direction(x)
#> Probability of Direction: 1.00e-03
pd_Julia(x)
#> [1] 0.001
pd_new(x)
#> [1] 0.001

mattansb commented 1 year ago

(Why would a loop be faster than a vectorized operation in Julia? How odd....)

tobiasweckop commented 1 year ago

Your answers were very helpful and the paper is also very interesting.

Thank you very much!

DominiqueMakowski commented 1 year ago

The current solution and the Julia solution would give 0.001

Right, yeah, I brainfroze.