BCa bootstrap confidence intervals

bwiernik commented 3 years ago

For models that support them, it would be nice to include BCa intervals in addition to the currently supported percentile intervals. For completeness, we might also want Studentized or Basic bootstrap, but I see less value in those.

strengejacke commented 3 years ago

library(boot)
library(parameters)

model <- lm(Sepal.Length ~ Species * Petal.Width, data = iris)

set.seed(2)
bootstrap_parameters(model)
#> # Fixed Effects
#> 
#> Parameter                     | Coefficient |        95% CI |      p
#> --------------------------------------------------------------------
#> (Intercept)                   |        4.78 | [ 4.50, 5.00] | < .001
#> Speciesversicolor             |       -0.72 | [-1.62, 0.08] | 0.094 
#> Speciesvirginica              |        0.50 | [-0.67, 1.65] | 0.415 
#> Petal.Width                   |        0.91 | [ 0.22, 1.97] | 0.039 
#> Speciesversicolor:Petal.Width |        0.50 | [-0.66, 1.52] | 0.393 
#> Speciesvirginica:Petal.Width  |       -0.27 | [-1.36, 0.67] | 0.545

set.seed(2)
bootstrap_parameters(model, ci_method = "bci")
#> # Fixed Effects
#> 
#> Parameter                     | Coefficient |        95% CI |      p
#> --------------------------------------------------------------------
#> (Intercept)                   |        4.78 | [ 4.49, 4.99] | < .001
#> Speciesversicolor             |       -0.72 | [-1.66, 0.06] | 0.094 
#> Speciesvirginica              |        0.50 | [-0.71, 1.61] | 0.415 
#> Petal.Width                   |        0.91 | [ 0.29, 2.07] | 0.039 
#> Speciesversicolor:Petal.Width |        0.50 | [-0.78, 1.46] | 0.393 
#> Speciesvirginica:Petal.Width  |       -0.27 | [-1.39, 0.58] | 0.545

^{Created on 2021-05-27 by the reprex package (v2.0.0)}

IndrajeetPatil commented 3 years ago

😮

Daniel is the Flash of supporting feature requests! ⚡

tenor

strengejacke commented 3 years ago

Not sure how to adjust p-values, though.

mattansb commented 3 years ago

The current implantation is here:

https://github.com/easystats/parameters/blob/main/R/methods_base.R#L143-L157

But this method assumes translation equivariance: that shape of the sampling distribution is unaffected by the null being true or not.... Which is probably never really true anyway? A true p-value would require a permutation test... Otherwise, I would suggest dropping p-values entirely.

bwiernik commented 3 years ago

The approach I would use is to compute a wide range of confidence limits and interpolate at which limit the interval crosses zero. See e.g., https://www.stat.umn.edu/geyer/old/5601/examp/tests.html. This could be combined with optim() to find the precise limit if desired. This approach is consistent with the confidence distribution interpretation of p values.

Edit: example:

library(bootstrap)
set.seed(1234)
x <- mtcars$am
y <- mtcars$carb
print(theta.hat <- cor(x, y))
#> [1] 0.05753435

my.cor <- function(k) cor(x[k], y[k])
bca.out <- bcanon(seq(along = x), nboot = 10000,
                  theta = my.cor, alpha = seq(0.001, 0.999, 0.001))

plot(bca.out$confpoints[ , 1], bca.out$confpoints[ , 2],
     xlab = "alpha", ylab = "bca point")
abline(h = 0)

## lower-tailed test of theta = 0
ltpv <- approx(bca.out$confpoints[ , 2],
               bca.out$confpoints[ , 1], xout = 0)$y
#> Warning in regularize.values(x, y, ties, missing(ties), na.rm = na.rm):
#> collapsing to unique 'x' values
ltpv
#> [1] 0.378

## upper-tailed test of theta = 0
1 - ltpv
#> [1] 0.622

## upper-tailed test of theta = 0
2 * min(ltpv, 1 - ltpv)
#> [1] 0.756

^{Created on 2021-05-27 by the reprex package (v2.0.0)}

bwiernik commented 3 years ago

@strengejacke what's your thinking on calling the method "bci" instead of "bcai"?

strengejacke commented 3 years ago

@strengejacke what's your thinking on calling the method "bci" instead of "bcai"?

Just the three letters. ;-) We may also rename it into bcai(). @DominiqueMakowski any preferences here?

mattansb commented 3 years ago

Interesting...

Here's my version (that uses the bootstrapped distribution we already have in the output):

p_value_distrib <- function(x, null = 0){
  get_ci <- function(alpha, d) {
    alpha <- ifelse(alpha > 1, 1, 
                    ifelse(alpha < 0, 0,
                           alpha))
    q <- quantile(c(d, null), probs = c(alpha/2, 1 - alpha/2))
    min(abs(q - null))
  }

  optim(0.5, fn = get_ci, d = x)[["par"]]
}

set.seed(444)
dat <- data.frame(x = rnorm(20),
                  y = rnorm(20))
m <- lm(y ~ x, dat)
b <- parameters::bootstrap_parameters(m)

parameters::model_parameters(m)$p[2]
#> [1] 0.1572755
b$p[2]
#> [1] 0.1308691
attr(b, "boot_samples")[[2]] |> p_value_distrib()
#> [1] 0.17

^{Created on 2021-05-27 by the reprex package (v2.0.0)}

Does this make sense?

How would be support BCa and other methods this way?

strengejacke commented 3 years ago

@strengejacke what's your thinking on calling the method "bci" instead of "bcai"?

you can now use both "bci" and "bcai", and bcai() was added as alias.

easystats / parameters

BCa bootstrap confidence intervals #503