easystats / bayestestR

:ghost: Utilities for analyzing Bayesian models and posterior distributions
https://easystats.github.io/bayestestR/
GNU General Public License v3.0

Request for emmeans methods not to concatenate grouping variables into single column #661

Closed (qdread closed 2 months ago)

qdread commented 3 months ago

I think it is amazing that this package has methods for emmeans objects. One thing that would make it way more convenient and performant for me would be if the output of functions like describe_posterior() did not concatenate all the grouping variable columns into a single Parameter column. As it is now, I have to either write code to parse the Parameter column back out into separate columns, or pull the separate grouping variable columns from the emmeans object and cbind them back to the bayestestR output. I often need the columns separate for producing plots and tables. The workarounds I have come up with are okay but not easy to implement "at scale" if I have to fit many models on different datasets, with different grouping variables. So I'd like to request that the emmeans methods for functions like ci(), p_pointnull(), etc., retain the grouping variable columns from their input. Thanks for all your work on this package which has truly been a game changer for me!

Example:

library(brms)
library(emmeans)
library(bayestestR)

myfit <- brm(mpg ~ factor(gear) + factor(cyl), data = mtcars)
myemms <- emmeans(myfit, ~ gear + cyl)
mypost <- describe_posterior(myemms)

# Not ideal because the gear and cyl columns get squashed together, without indication of which is which
as.data.frame(mypost)

# My workaround
cbind(
  as.data.frame(myemms)[, c('gear', 'cyl')],
  mypost
)

mattansb commented 3 months ago

I tend to agree. We would need a function to get the grid info and then merge that with the results.

Here is a general solution (minus the formatting):

library(brms)
library(emmeans)
library(bayestestR)

myfit <- brm(mpg ~ factor(gear) + factor(cyl), data = mtcars)
myemms <- emmeans(myfit, pairwise ~ gear | cyl)

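# general function to pull grid info: all columns left of the estimate column
.get_emmeans_grid <- function(object) {
  s <- as.data.frame(object)  # summary data frame of the object passed in
  est_col <- which(colnames(s) == attr(s, "estName"))
  s[, seq_len(est_col - 1), drop = FALSE]  # drop = FALSE keeps a data frame even for one column
}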

describe_posterior.emmGrid <- function(posterior, ...) {
  .grid <- .get_emmeans_grid(posterior)
  results <- bayestestR:::describe_posterior.emmGrid(posterior, ...)
  cbind(.grid, results[,-1])
}

describe_posterior.emmGrid(myemms)
#>    cyl gear      contrast     Median   CI    CI_low   CI_high      pd ROPE_CI ROPE_low ROPE_high ROPE_Percentage
#> 1    4    3             . 25.5062620 0.95 21.507828 29.310093 1.00000    0.95     -0.1       0.1      0.00000000
#> 2    4    4             . 26.7326123 0.95 24.450108 28.932544 1.00000    0.95     -0.1       0.1      0.00000000
#> 3    4    5             . 26.9537215 0.95 23.430046 30.718033 1.00000    0.95     -0.1       0.1      0.00000000
#> 4    6    3             . 18.8184019 0.95 15.042371 22.331300 1.00000    0.95     -0.1       0.1      0.00000000
#> 5    6    4             . 20.0190237 0.95 16.981356 23.146698 1.00000    0.95     -0.1       0.1      0.00000000
#> 6    6    5             . 20.2889764 0.95 16.429401 24.261156 1.00000    0.95     -0.1       0.1      0.00000000
#> 7    8    3             . 14.8589153 0.95 12.953114 16.780434 1.00000    0.95     -0.1       0.1      0.00000000
#> 8    8    4             . 16.0788715 0.95 11.963037 20.400845 1.00000    0.95     -0.1       0.1      0.00000000
#> 9    8    5             . 16.3595230 0.95 12.577941 20.074604 1.00000    0.95     -0.1       0.1      0.00000000
#> 10   4    . gear3 - gear4 -1.1957204 0.95 -5.254543  2.767766 0.74175    0.95     -0.1       0.1      0.03578947
#> 11   4    . gear3 - gear5 -1.5443451 0.95 -5.284180  2.427025 0.79000    0.95     -0.1       0.1      0.03447368
#> 12   4    . gear4 - gear5 -0.2152978 0.95 -4.316286  3.552688 0.54375    0.95     -0.1       0.1      0.04210526
#> 13   6    . gear3 - gear4 -1.1957204 0.95 -5.254543  2.767766 0.74175    0.95     -0.1       0.1      0.03578947
#> 14   6    . gear3 - gear5 -1.5443451 0.95 -5.284180  2.427025 0.79000    0.95     -0.1       0.1      0.03447368
#> 15   6    . gear4 - gear5 -0.2152978 0.95 -4.316286  3.552688 0.54375    0.95     -0.1       0.1      0.04210526
#> 16   8    . gear3 - gear4 -1.1957204 0.95 -5.254543  2.767766 0.74175    0.95     -0.1       0.1      0.03578947
#> 17   8    . gear3 - gear5 -1.5443451 0.95 -5.284180  2.427025 0.79000    0.95     -0.1       0.1      0.03447368
#> 18   8    . gear4 - gear5 -0.2152978 0.95 -4.316286  3.552688 0.54375    0.95     -0.1       0.1      0.04210526

strengejacke commented 2 months ago

When we extract draws from Bayesian objects processed with emmeans, we use emmeans::as.mcmc.emmGrid() in insight::get_parameters(), which creates these column names. I agree we should have an argument that adds the names as separate columns.
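
The idea of keeping the grid columns next to the summary can be illustrated without fitting a model. The sketch below is not bayestestR's implementation: `add_grid_columns()` is a hypothetical helper, and both data frames are made-up stand-ins (one for the grid part of an emmeans summary, one for a result with a concatenated `Parameter` column). It assumes the two inputs have the same rows in the same order.

```r
# Toy stand-in for the grid part of as.data.frame(<emmGrid>): one row per
# combination of the grouping variables (values are illustrative only).
grid <- expand.grid(gear = c(3, 4, 5), cyl = c(4, 6), KEEP.OUT.ATTRS = FALSE)

# Toy stand-in for a summary whose grouping levels are squashed into one column.
results <- data.frame(
  Parameter = paste(grid$gear, grid$cyl),
  Median = c(25.5, 26.7, 27.0, 18.8, 20.0, 20.3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop the concatenated column and prepend the grid.
# Assumes both inputs have the same number of rows, in the same order.
add_grid_columns <- function(grid, results, drop = "Parameter") {
  stopifnot(nrow(grid) == nrow(results))
  keep <- setdiff(colnames(results), drop)
  cbind(grid, results[, keep, drop = FALSE])
}

out <- add_grid_columns(grid, results)
colnames(out)
```

Taking the column to drop as an argument, rather than hard-coding it, would let the same helper be reused for outputs of other summary functions, provided they also return one row per grid row.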

mattansb commented 2 months ago

See new output style in #672