bwiernik opened 2 years ago
Do you see us being able to "extract" info from objects produced by these two packages (probably easy), or do you imagine us writing "wrappers" around those two packages (more maintenance)?
For both, they take lists of models and return results, so wrapping isn't likely to be much effort. We already wrap loo for a number of functions in performance, so we can probably create some unified code for easystats generally
For Bayes, I wrote this function that does exactly this a while back (years ago?):
https://easystats.github.io/bayestestR/reference/weighted_posteriors.html
Currently it uses BFs for weights (personally, that's what makes the most sense to me, as we are mixing posteriors from the same data but different priors), but it should be expandable (:
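For reference, a minimal sketch of how that function is used today, assuming two rstanarm models fit to the same data (the diagnostic_file argument is needed so the Bayes factors can be computed via bridge sampling):

library(rstanarm)
library(bayestestR)

m1 <- stan_glm(mpg ~ wt, data = mtcars,
  diagnostic_file = file.path(tempdir(), "m1.csv"), refresh = 0)
m2 <- stan_glm(mpg ~ wt + cyl, data = mtcars,
  diagnostic_file = file.path(tempdir(), "m2.csv"), refresh = 0)

# Mix posterior draws in proportion to the models' Bayes factors
wp <- weighted_posteriors(m1, m2)
describe_posterior(wp)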
There's a paper by Vehtari, Gelman, and colleagues discussing how the "stacking" method used by default in {loo} apparently has some important advantages over traditional Bayesian model averaging
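For a sense of the {loo} side, a hedged sketch, using simulated pointwise log-likelihood matrices in place of real model fits (loo_model_weights() also accepts a list of loo objects):

library(loo)

set.seed(1)
# rows = posterior draws, columns = observations
ll1 <- matrix(rnorm(1000 * 50, mean = -1.0, sd = 0.1), nrow = 1000)
ll2 <- matrix(rnorm(1000 * 50, mean = -1.1, sd = 0.1), nrow = 1000)

# "stacking" is the default; "pseudobma" gives pseudo-BMA weights instead
loo_model_weights(list(ll1, ll2), method = "stacking")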
What would the model-averaged estimate be? Here's an example for the intervals; I think it would be easy to re-implement MATA's mata.wald()
library(insight)
library(MATA)
set.seed(0)
n <- 20 # 'n' is assumed to be even
x1 <- c(rep(0, n / 2), rep(1, n / 2)) # two groups: x1=0, and x1=1
x2 <- rnorm(n, mean = 10, sd = 3)
y <- rnorm(n, mean = 3 * x1 + 0.1 * x2) # data generation
x1 <- factor(x1)
m1 <- glm(y ~ x1) # using 'glm' provides AIC values; 'lm' doesn't
m2 <- glm(y ~ x1 + x2)
# Akaike weights: w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2)
aic <- c(m1$aic, m2$aic)
delta.aic <- aic - min(aic)
model.weights <- exp(-0.5 * delta.aic) / sum(exp(-0.5 * delta.aic))
# see also
# performance::compare_performance(m1, m2)
residual.dfs <- c(insight::get_df(m1), insight::get_df(m2)) # for the t-based interval
# Reduce each model's data grid to a single reference row
# (first factor level / first unique value of each predictor)
one_row <- function(grid) {
  as.data.frame(lapply(grid, function(i) {
    if (is.factor(i)) {
      as.factor(levels(i)[1])
    } else {
      unique(i)[1]
    }
  }))
}
nd1 <- one_row(insight::get_datagrid(m1))
nd2 <- one_row(insight::get_datagrid(m2))
p1 <- get_predicted(m1, data = nd1, ci = .95)
p2 <- get_predicted(m2, data = nd2, ci = .95)
theta.hats <- c(p1, p2) # per-model point estimates
se.theta.hats <- c(attributes(p1)$ci_data$SE, attributes(p2)$ci_data$SE) # matching SEs
# 95% MATA-Wald confidence interval for theta:
mata.wald(theta.hats, se.theta.hats, model.weights,
mata.t = TRUE,
residual.dfs = residual.dfs
)
#> [1] -0.3756852 1.5499646
Created on 2022-09-05 with reprex v2.0.2
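If we re-implement rather than depend on {MATA}, the limits are (as I understand Turek & Fletcher, 2012) the roots of a weighted mixture of the single-model tail areas. A rough sketch for the lower limit, reusing the objects from the reprex above:

# Hedged sketch: find theta.L such that
# sum_i w_i * pt((theta.hat_i - theta.L) / se_i, df_i) = 1 - alpha/2
mata_lower <- function(theta, se, w, df, alpha = 0.05) {
  f <- function(L) sum(w * pt((theta - L) / se, df)) - (1 - alpha / 2)
  uniroot(f, interval = range(theta) + c(-10, 10) * max(se))$root
}
mata_lower(theta.hats, se.theta.hats, model.weights, residual.dfs)
# should approximately reproduce the lower bound printed above,
# if I've understood the definition correctly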
I think it should be theta.hats, right?
I have used {glmulti} for multimodel inference. I believe it implements frequentist model averaging by default (and by design?). It can explore millions of possible model formula specifications and retain a top set of arbitrary size, optionally using a genetic algorithm, which I have not seen in other packages. It also provides model-averaged summary statistics, including coefficients, SEs, and CIs. The {MuMIn} package is mentioned alongside {glmulti} in https://www.metafor-project.org/doku.php/tips:model_selection_with_glmulti_and_mumin and https://www.r-bloggers.com/2013/02/model-selection-and-multi-model-inference, but I have not tried it.
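For comparison, a rough sketch of what the MuMIn workflow looks like, based on its documentation (hedged, since I haven't used it; dredge() requires na.action = na.fail on the global model):

library(MuMIn)

options(na.action = "na.fail")           # required by dredge()
full <- lm(mpg ~ wt + cyl + hp, data = mtcars)
ms <- dredge(full)                       # all-subsets selection by AICc
avg <- model.avg(ms, subset = delta < 4) # average models within 4 AICc units
summary(avg)                             # model-averaged coefficients and SEs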
@vincentarelbundock It would be nice to have model-averaged marginal effects. It appears very few have done this; one exception is https://www.ajordannafa.com/blog/2022/being-less-wrong, where {marginaleffects} was useful. However, I am not sure how to implement model-averaged marginal effects with frequentist rather than Bayesian models. I assume one would extract the coefficient estimates and vcov() from the model-averaged summary statistics and supply them to {marginaleffects} as if they came from a single model. Otherwise, many single models would need to be supplied, which sounds like a lot of predictions for {marginaleffects} to compute.
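One possible frequentist route, just a sketch of the "many single models" option with hypothetical AIC weights (not an existing {marginaleffects} feature): compute the average marginal effect per model, then combine with the model weights.

library(marginaleffects)

m1 <- glm(am ~ wt, data = mtcars, family = binomial)
m2 <- glm(am ~ wt + hp, data = mtcars, family = binomial)

# Akaike weights from AIC differences
aic <- c(AIC(m1), AIC(m2))
w <- exp(-0.5 * (aic - min(aic)))
w <- w / sum(w)

# Average marginal effect of 'wt' under each model
s1 <- avg_slopes(m1, variables = "wt")
s2 <- avg_slopes(m2, variables = "wt")

# AIC-weighted average of the two marginal effects
weighted.mean(c(s1$estimate, s2$estimate), w)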
There's now a convenient package for computing frequentist model-averaged CIs/CDs https://cran.r-project.org/web/packages/MATA/index.html
As well as for Bayesian model-averaging/stacking https://mc-stan.org/loo/reference/loo_model_weights.html
We should have a function to compute model-averaged estimates/intervals/densities. It should permit a choice of weighting method and interval method, and its output should be passable to {see} for plotting.
For frequentist averaging, that's usually done based on information criteria, so this would entail a soft dependency of {parameters} on {performance}.
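To make the proposal concrete, a purely hypothetical sketch of what the user-facing function could look like (all names are placeholders, not an existing easystats API):

# Placeholder implementation: AIC-based Akaike weights; coefficients of
# terms absent from a model are treated as 0 (the "full-model" averaging variant)
model_average <- function(..., ic = stats::AIC) {
  models <- list(...)
  ics <- vapply(models, ic, numeric(1))
  w <- exp(-0.5 * (ics - min(ics)))
  w <- w / sum(w)
  terms <- unique(unlist(lapply(models, function(m) names(coef(m)))))
  est <- vapply(terms, function(tm) {
    sum(w * vapply(models, function(m) {
      b <- coef(m)
      if (tm %in% names(b)) b[[tm]] else 0
    }, numeric(1)))
  }, numeric(1))
  data.frame(Parameter = terms, Coefficient = est, row.names = NULL)
}

# e.g. with the two glm() fits from the reprex above:
# model_average(m1, m2)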