Open DominiqueMakowski opened 2 months ago
I agree other than that I think we should keep LOOIC+SE (rather than ELPD). LOOIC is -2*ELPD, which puts it in the deviance metric that is also used in frequentist models with AIC, etc, and this allows us to show the stacking weights with our existing formatting functions
Fair So we'd go with ELPD_DIFF (SE); and LOOIC (weights) +SE? Or just the latter? is the SE for LOOIC super necessary since we have the weights?
Should we also add some details in the documentation about how to interpret the weights, because I've been asked by students and I wasn't too clear in my answer... any rules of thumb?
Moreover, in https://github.com/easystats/report/pull/419, we compute the z-transformed ELPD diff (i.e., normalized by SE). But I think we might as well directly convert it to a p-value given that its asymptotic distribution, it would be the most easy and "intuitive" to read (rather than the z-value). ~In order to avoid duplicated code, I suggest we add a parameters::parameters()
method for brms::loo_compare() outputs that would simply format and compute the columns we need, we could then use that in report as well as here. I'll open a PR~
EDIT: forget what I said about the duplicated code it wouldn't work
Here is a skeleton, maybe it would make more sense to add it as a method of test_performance()
rather than compare_performance()
(?)
test_performance()
for Bayesian models could be a watered-down version (mostly focusing on the diff) of compare_performance (which would return the IC values + R2 etc. etc.).test_performance_bayesian <- function(objects, criterion="loo") {
# Compute differences ----
if (criterion %in% c("loo", "looic")) {
diff <- brms::loo_compare(lapply(objects, loo::loo))
col <- "looic"
} else if (criterion == "waic") {
diff <- brms::loo_compare(lapply(objects, loo::waic))
col <- "waic"
} else {
stop("Criterion not supported")
}
# Format ----
out <- as.data.frame(diff)
out$Name <- rownames(out)
# Select columns
out <- out[, c("Name", col, "elpd_diff", "se_diff")]
# Compute p-value ----
# ELPD difference is asympatotically normal, hence the z-score is computed
z <- out$elpd_diff[-1] / out$se_diff[-1]
out$p <- c(NA, 2 * pnorm(-abs(z)))
# Return
out
}
m1 <- brms::brm(wt ~ mpg, data=mtcars, refresh=0)
#> Compiling Stan program...
#> Start sampling
m2 <- brms::brm(wt ~ mpg + hp, data=mtcars, refresh=0)
#> Compiling Stan program...
#> Start sampling
m3 <- brms::brm(wt ~ mpg + hp + qsec, data=mtcars, refresh=0)
#> Compiling Stan program...
#> Start sampling
objects <- insight::ellipsis_info(m1, m2, m3, only_models = TRUE)
.test_performance_bayesian(objects)
#> Warning: Found 1 observations with a pareto_k > 0.7 in model 'X[[i]]'. We
#> recommend to set 'moment_match = TRUE' in order to perform moment matching for
#> problematic observations.
#> Warning: Found 1 observations with a pareto_k > 0.7 in model 'X[[i]]'. We
#> recommend to set 'moment_match = TRUE' in order to perform moment matching for
#> problematic observations.
#> Name looic elpd_diff se_diff p
#> m3 m3 45.99388 0.000000 0.000000 NA
#> m1 m1 50.78136 -2.393738 2.318327 0.30182476
#> m2 m2 53.35469 -3.680406 2.208913 0.09568128
Created on 2024-04-23 with reprex v2.0.2
I suggest we do LOOIC (with weights) and SE. If folks want ELPD or WAIC, they can request those instead.
The SE is useful in the same way as for ELPD, whereas the weights are more like a transformed point estimate.
The current output would benefit from some streamlining.
Suggestions:
What do you think?