easystats / bayestestR

:ghost: Utilities for analyzing Bayesian models and posterior distributions
https://easystats.github.io/bayestestR/
GNU General Public License v3.0

"LogBF = Inf" problem #624

Closed Masharipov closed 11 months ago

Masharipov commented 1 year ago

Thanks for the useful package!

I used bayestestR to compare the sensitivity (TPR) of different methods (M1, M2, M3, M4, M5), averaged over the levels of SNR. This is the code:

contrasts(Data$Method) <- contr.equalprior_pairs
contrasts(Data$SNR) <- contr.equalprior_pairs

fit_bayes <- stan_glm(TPR_prc ~ Method + SNR,
                      data = Data,
                      iter = 5000, warmup = 1000)
pairs_bayes <- pairs(emmeans(fit_bayes, ~ Method))
postprob <- describe_posterior(pairs_bayes, rope_range = c(-1, 1))
bf <- bayesfactor_parameters(pairs_bayes, fit_bayes, null = c(-1, 1))
postprob$LogBF <- bf$log_BF

My data: Data.csv

The problem is that I get LogBF = Inf for the pairwise comparisons:

[screenshot of the describe_posterior output showing LogBF = Inf]

I understand that the posterior distributions are very narrow, so BF → Inf. The maximum double-precision floating-point number is 1.797693e+308, so the largest logBF that can be obtained by taking log() of a finite BF is log(1.797693e+308) ≈ 709.7827. However, JASP can report logBF > 709.7827 (e.g., logBF ≈ 4000-7000 for my data).
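For reference, this ceiling can be checked directly in R:

.Machine$double.xmax        # largest representable double
#> [1] 1.797693e+308
log(.Machine$double.xmax)   # the largest logBF reachable by taking log() of a finite BF
#> [1] 709.7827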

As far as I understand, bayestestR first calculates BF and then takes log(BF). In contrast, JASP first calculates logBF and then calculates BF = exp(logBF). Maybe this capability could be added to the bayestestR package?
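A minimal sketch of why the order of operations matters, with made-up log-densities (not values from this model):

log_dens_prior     <- -2     # hypothetical log-density of the prior at the null
log_dens_posterior <- -800   # hypothetical log-density of the posterior at the null

# exponentiating first underflows exp(-800) to 0, so the ratio and its log blow up:
log(exp(log_dens_prior) / exp(log_dens_posterior))
#> [1] Inf

# staying on the log scale keeps the result finite:
log_dens_prior - log_dens_posterior
#> [1] 798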

mattansb commented 11 months ago

This issue is due to {logspline} not supporting the return of log-densities. I have sent the maintainer a feature request. If they do not respond, I will build this support into bayestestR.

mattansb commented 11 months ago

From the maintainer of logspline:

I can do this. It will probably be another week or two before I have enough time to do this carefully, update helpfiles etc.


Once this is implemented, I will make the required changes in bayestestR.

Masharipov commented 11 months ago

Thanks!

mattansb commented 11 months ago

Leaving this open so I don't forget about it.

mattansb commented 11 months ago

Okay, so this actually wasn't an issue in {logspline} (that's what happens when I read GitHub issues on my phone...).

The "issue" is that the posterior probability for the defined null interval is infinitesimal - out of 16,000 posterior samples, not 1 falls within the null interval (except for M2 - M3 and M4 - M5):

library(bayestestR)
library(emmeans)
library(rstanarm) # for stan_glm()

Data <- read.csv("C:\\Users\\user\\Downloads\\Data.csv", sep = ";")
Data$Method <- C(factor(Data$Method), contr.equalprior_pairs)
Data$SNR <- C(factor(Data$SNR), contr.equalprior_pairs)

fit_bayes <- stan_glm(TPR_prc ~ Method + SNR,
                      data = Data,
                      iter = 5000, warmup = 1000, cores = 4)

pairs_bayes <- pairs(emmeans(fit_bayes, ~ Method))

# how many samples in the [-1, +1] interval?
insight::get_parameters(pairs_bayes) |> 
  sapply(\(x) sum(-1 < x & x < 1))
#> M1 - M2 M1 - M3 M1 - M4 M1 - M5 M2 - M3 M2 - M4 M2 - M5 M3 - M4 M3 - M5 M4 - M5 
#>       0       0       0       0   16000       0       0       0       0   16000 

Not only that, but even the sample closest to the null interval (the maximum, out of 16,000 samples!) is quite far from it:

insight::get_parameters(pairs_bayes) |> 
  sapply(max)
#>     M1 - M2     M1 - M3     M1 - M4     M1 - M5     M2 - M3     M2 - M4     M2 - M5     M3 - M4     M3 - M5     M4 - M5 
#>  -2.4825258  -2.2830532 -29.6783006 -29.3708621   0.6660124 -26.8417212 -26.4680872 -27.0671175 -26.7161429   0.7777120 

And overall the whole mass is in a narrow band:

describe_posterior(pairs_bayes, test = NULL)
#> Summary of Posterior Distribution
#> 
#> Parameter | Median |           95% CI
#> -------------------------------------
#> M1 - M2   |  -2.90 | [ -3.11,  -2.69]
#> M1 - M3   |  -2.66 | [ -2.86,  -2.45]
#> M1 - M4   | -30.13 | [-30.34, -29.92]
#> M1 - M5   | -29.76 | [-29.96, -29.55]
#> M2 - M3   |   0.24 | [  0.04,   0.45]
#> M2 - M4   | -27.23 | [-27.43, -27.02]
#> M2 - M5   | -26.86 | [-27.06, -26.65]
#> M3 - M4   | -27.47 | [-27.68, -27.26]
#> M3 - M5   | -27.10 | [-27.31, -26.89]
#> M4 - M5   |   0.37 | [  0.16,   0.57]

Thus, the posterior probability of that range is effectively 0:

# probability mass above -1 (the lower bound of the null interval)
insight::get_parameters(pairs_bayes) |> 
  sapply(\(x) {
    lsp <- logspline::logspline(x) 
    1 - logspline::plogspline(-1, lsp)
  })
#> M1 - M2 M1 - M3 M1 - M4 M1 - M5 M2 - M3 M2 - M4 M2 - M5 M3 - M4 M3 - M5 M4 - M5 
#>       0       0       0       0       1       0       0       0       0       1

Since this is not the case for the prior differences (i.e., $p_{prior} > 0$), which represent the tested hypotheses (which, by the way, look like this:

library(ggplot2)
library(ggdist)

# pairs_bayes_prior is the prior counterpart of the contrasts; one way to get it
# (an assumption about the omitted step) is to re-sample the model from its priors
# with bayestestR::unupdate() and re-run the same pairwise comparison:
pairs_bayes_prior <- pairs(emmeans(unupdate(fit_bayes), ~ Method))

pairs_bayes_prior |> 
  insight::get_parameters() |> 
  datawizard::data_to_long() |>
  ggplot(aes(value, name)) + 
  # facet_wrap(~name) + 
  stat_slab(aes(fill = after_stat(abs(x) < 1)), n = 2000) + 
  scale_fill_discrete("Hypothesis", labels = c("Alternative", "Null")) + 
  labs(y = "Difference")

[plot: prior distributions of the pairwise differences, with the |difference| < 1 null region highlighted]

Does this look like the hypotheses you wish to compare?)

... and since the (effectively zero) posterior probability of the null ends up in a denominator, you get BF = Inf (I'm skipping a step in the calculation, but the point is still true).
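With a made-up prior mass, just to make the arithmetic concrete (these are not values from this model):

post_null  <- 0      # posterior mass inside [-1, 1], effectively zero here
prior_null <- 0.02   # hypothetical prior mass inside [-1, 1]

post_odds  <- (1 - post_null) / post_null     # Inf: division by (effectively) zero
prior_odds <- (1 - prior_null) / prior_null
post_odds / prior_odds
#> [1] Inf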

So, assuming this is the correct null interval for your hypothesis (is it?) and that the extremely wide default priors adequately represent the hypotheses you're trying to test (I very much doubt that), there is no issue at all - you have extreme evidence against the null!
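For what it's worth, if the default priors don't reflect your hypotheses, one option (a sketch only; the prior scale here is arbitrary and would need justification) is to set the priors explicitly when fitting:

library(rstanarm)

# the scale of 5 is purely illustrative - pick one that reflects the
# differences you consider plausible a priori
fit_bayes_informed <- stan_glm(TPR_prc ~ Method + SNR,
                               data = Data,
                               prior = normal(location = 0, scale = 5),
                               iter = 5000, warmup = 1000, cores = 4)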


I am not aware of JASP providing such BFs for contrasts.