easystats / parameters

:bar_chart: Computation and processing of models' parameters
https://easystats.github.io/parameters/
GNU General Public License v3.0
440 stars 36 forks source link

Model-averaged estimates/intervals/distributions #768

Open bwiernik opened 2 years ago

bwiernik commented 2 years ago

There's now a convenient package for computing frequentist model-averaged CIs/CDs https://cran.r-project.org/web/packages/MATA/index.html

As well as for Bayesian model-averaging/stacking https://mc-stan.org/loo/reference/loo_model_weights.html

We should have a function to compute average estimates/intervals/densities. It should permit choice of weighting method, interval method, and be passable to {see} for plotting.

For frequentist averaging, that's usually done based on information criteria, so this would entail a soft dependency of {parameters} on {performance}.

vincentarelbundock commented 2 years ago

Do you see us being able to "extract" info from objects produced by these two packages (probably easy), or do you imagine us writing "wrappers" around those two packages (more maintenance)?

bwiernik commented 2 years ago

For both, they take lists of models and return results back, so wrapping isn't likely to be much effort. We already wrap around loo for a much of functions in performance, so we can probably create some unified code for easystats generally

mattansb commented 2 years ago

For Bayes, I wrote this function that does this exactly a while back (years???)

https://easystats.github.io/bayestestR/reference/weighted_posteriors.html

Currently used BFs for weights (personally, that's what makes the most sense to me as we are mixing the posteriors from the same data, but different priors), but should be expandable (:

bwiernik commented 2 years ago

There's a Vehtari, Gelman, et al paper discussing that the "stacking" method used in {loo} by default apparently has some important advantages over traditional Bayesian model averaging

strengejacke commented 2 years ago

what would be the model-average estimate? Here's an example for the intervals, I think it's easy to re-implement mata().

library(insight)
library(MATA)

set.seed(0)
n <- 20 # 'n' is assumed to be even
x1 <- c(rep(0, n / 2), rep(1, n / 2)) # two groups: x1=0, and x1=1
x2 <- rnorm(n, mean = 10, sd = 3)
y <- rnorm(n, mean = 3 * x1 + 0.1 * x2) # data generation

x1 <- factor(x1)
m1 <- glm(y ~ x1) # using 'glm' provides AIC values.
m2 <- glm(y ~ x1 + x2) # using 'lm' doesn't.
aic <- c(m1$aic, m2$aic)
delta.aic <- aic - min(aic)
model.weights <- exp(-0.5 * delta.aic) / sum(exp(-0.5 * delta.aic))

# see also
# performance::compare_performance(m1, m2)

residual.dfs <- c(insight::get_df(m1), insight::get_df(m2))

g1 <- insight::get_datagrid(m1)
g2 <- insight::get_datagrid(m2)

nd1 <- as.data.frame(lapply(g1, function(i) {
  if (is.factor(i)) {
    as.factor(levels(i)[1])
  } else {
    unique(i)[1]
  }
}))

nd2 <- as.data.frame(lapply(g2, function(i) {
  if (is.factor(i)) {
    as.factor(levels(i)[1])
  } else {
    unique(i)[1]
  }
}))

p1 <- get_predicted(m1, data = nd1, ci = .95)
p2 <- get_predicted(m2, data = nd2, , ci = .95)

theta.hats <- c(p1, p2)
se.theta.hats <- c(attributes(p1)$ci_data$SE, attributes(p2)$ci_data$SE)

#  95% MATA-Wald confidence interval for theta:
mata.wald(theta.hats, se.theta.hats, model.weights,
  mata.t = TRUE,
  residual.dfs = residual.dfs
)
#> [1] -0.3756852  1.5499646

Created on 2022-09-05 with reprex v2.0.2

strengejacke commented 2 years ago

I think it should be theta.hats, right?

DrJerryTAO commented 1 year ago

I have used {glmulti} for multimodel inference. I believe it implements frequentist model average by default (and by design???). It explores through millions of possible model formula specifications and retains the top ones of an arbitrary size, optionally using a genetic algorithm, which I have not seen provided in other packages. It also provides model-averaged model summary statistics including coef, SEs, CIs. MuMIn Package was mentioned along with {glmulti} in https://www.metafor-project.org/doku.php/tips:model_selection_with_glmulti_and_mumin and https://www.r-bloggers.com/2013/02/model-selection-and-multi-model-inference, but I have not tried it.

Do you see us being able to "extract" info from objects produced by these two packages (probably easy), or do you imagine us writing "wrappers" around those two packages (more maintenance)?

@vincentarelbundock Would be nice to have model-averaged marginal effects. It appears very few have done so, except this one https://www.ajordannafa.com/blog/2022/being-less-wrong where {marginaleffects} was useful. However, I am not sure how to implement model-averaged marginal effects if using frequentist and not Bayesian models. I assume it is to extract coefficient estimates and vcov() from model-averaged summary statistics, and supply it just like a single model to {marginaleffects}. Otherwise, many single models need to be supplied, and it sounds lots of predictions to do for {marginaleffects}.