easystats / bayestestR

👻 Utilities for analyzing Bayesian models and posterior distributions
https://easystats.github.io/bayestestR/
GNU General Public License v3.0

Usability improvements for pd() and pd_to_p() #665

Closed: bwiernik closed this issue 1 month ago

bwiernik commented 1 month ago

I've been using pd() a lot lately, and a few things make it a little awkward to integrate into my workflow. These changes would make it much smoother for me:

[ ] An as_vector argument that returns a simple vector of pd values rather than a data frame.

This would fit a workflow where I have a data frame of posterior draws (e.g., stored as a list column or as a posterior::rvar column) and want to compute pd as a new column.

(An alternative would be to make the rvar method always return a vector, but I don't like that inconsistency, and it would not help with similar data structures that don't use posterior, such as list columns.)

results |> transform(pd = pd(.epred))

[ ] A pd_to_p.p_direction() method that takes the data frame output of p_direction() and directly converts it to a data frame with p values instead. That would avoid this rather involved chain of steps:

results |> pd() |> transform(p = pd_to_p(pd)) |> subset(select = - pd)

[ ] Maybe it's a little too heretical, but we could also add an as_frequentist_p or as_p argument to pd() that returns the results scaled as p values in one step.
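A minimal base-R sketch of what these three requests could look like, assuming the "direct" definition of pd (the larger of the proportions of draws above and below zero) and the two-sided conversion p = 2 * (1 - pd), which is what pd_to_p() does by default. The helper names pd_direct(), pd_vec() and pd_df_to_p(), and the as_p argument, are hypothetical and not part of bayestestR; a list column stands in for posterior::rvar so the example needs no extra packages.

```r
# Hypothetical helpers sketching the requests above; none of these names
# are bayestestR functions.

# Direct pd: the larger of the shares of draws above and below zero.
pd_direct <- function(x) max(mean(x > 0), mean(x < 0))

# Requests 1 and 3: one value per element of a list of draws, optionally
# rescaled to a two-sided p value, p = 2 * (1 - pd).
pd_vec <- function(draws, as_p = FALSE) {
  pd <- vapply(draws, pd_direct, numeric(1))
  if (as_p) 2 * (1 - pd) else pd
}

# Request 2: swap the pd column of a data frame for a p column,
# leaving all other columns untouched.
pd_df_to_p <- function(x) {
  x$p <- 2 * (1 - x$pd)
  x$pd <- NULL
  x
}

set.seed(1)
results <- data.frame(term = c("a", "b"))
# List column with 1000 simulated draws per row
results$.epred <- list(rnorm(1000, mean = 0.5), rnorm(1000, mean = -1))

results$pd <- pd_vec(results$.epred)              # plain vector -> new column
results$p  <- pd_vec(results$.epred, as_p = TRUE) # same draws, p scale

pd_table <- data.frame(Parameter = c("x[1]", "x[2]"), pd = c(0.975, 0.5))
pd_df_to_p(pd_table)
```

Returning a plain numeric vector is what lets the result drop straight into a new column, while the data frame helper keeps every other column and only swaps pd for p.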

strengejacke commented 1 month ago

For the first point, I think @mattansb added an as.numeric() method for all functions.

mattansb commented 1 month ago

I think that was @DominiqueMakowski?

@bwiernik How about allowing the data.frame method to select an rvar column? (I think that's what I meant in #604.)

grid <- data.frame(
  A = letters[1:2],
  B = c(2, 3),
  val = posterior::rvar(array(rnorm(1200), dim = c(600, 2)))
)

# Pull rvar column:
bayestestR::p_direction(grid$val)
#> Probability of Direction
#> 
#> Parameter |     pd
#> ------------------
#> x[1]      | 53.50%
#> x[2]      | 52.67%

# Or pass the data frame and tell the function what column has rvars:
bayestestR::p_direction(grid, rvar_col = "val")
#>   A B          val        pd
#> 1 a 2 0.062 ± 0.99 0.5350000
#> 2 b 3 0.037 ± 1.00 0.5266667

# Original behavior is maintained when not specifying rvar_col:
bayestestR::p_direction(grid)
#> Probability of Direction
#> 
#> Parameter |   pd
#> ----------------
#> B         | 100%

This is implemented in #666 😈

bwiernik commented 1 month ago

I typically use pd() inside a call to mutate() along with several other transformations (e.g., taking an rvar column and computing its median, CI, and pd in one step).
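A hypothetical sketch of that one-step summary in base R, using a list column of draws in place of an rvar column. draw_summary() is an illustrative name, not a bayestestR or posterior function; it returns the median, an equal-tailed 95% interval from quantile(), and the direct pd for each element, so the result lines up row by row with the original data.

```r
# Hypothetical helper (not part of bayestestR or posterior): one row of
# summaries per element of a list of posterior draws.
draw_summary <- function(draws, ci = 0.95) {
  alpha <- (1 - ci) / 2
  data.frame(
    median  = vapply(draws, median, numeric(1)),
    # Equal-tailed interval bounds
    ci_low  = vapply(draws, function(x) unname(quantile(x, alpha)), numeric(1)),
    ci_high = vapply(draws, function(x) unname(quantile(x, 1 - alpha)), numeric(1)),
    # Direct pd: larger share of draws on one side of zero
    pd      = vapply(draws, function(x) max(mean(x > 0), mean(x < 0)), numeric(1))
  )
}

set.seed(2)
results <- data.frame(contrast = c("b - a", "c - a"))
# Simulated draws for two custom contrasts
results$draws <- list(rnorm(4000, mean = 1), rnorm(4000, mean = -2))

# Labels plus one column per summary quantity
summary_table <- cbind(results["contrast"], draw_summary(results$draws))
summary_table
```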

strengejacke commented 1 month ago

This sounds like something that can be done with model_parameters().

bwiernik commented 1 month ago

No, I'm working with vectors of predictions or custom contrasts.

DominiqueMakowski commented 1 month ago

I agree that there's room for improvement. I also sometimes find our functions fiddly to use within tidyverse pipelines.

strengejacke commented 1 month ago

> A pd_to_p.p_direction() method that takes the data frame output of p_direction() and directly converts it to a data frame with p values instead. That would avoid having to do this rather involved set of steps:

Should that return a data frame again?

bwiernik commented 3 weeks ago

> Should that return a data frame again?

Yes, I think that's what you implemented?

strengejacke commented 3 weeks ago

Yes. And we have as.numeric() and as.vector() methods that return a vector instead of a data frame.
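As noted here, bayestestR already provides as.numeric() and as.vector() methods for these outputs. The sketch below is not that code; it only illustrates the general S3 pattern on a hypothetical pd_table class, assuming a data frame result with a pd column. One detail worth knowing: in R, the S3 methods that as.numeric() dispatches to are written for as.double().

```r
# Hypothetical class "pd_table" standing in for a data-frame result such as
# the output of p_direction(); this is not bayestestR's implementation.
new_pd_table <- function(parameter, pd) {
  structure(
    data.frame(Parameter = parameter, pd = pd),
    class = c("pd_table", "data.frame")
  )
}

# as.numeric() dispatches to as.double() methods; this one drops the
# table structure and returns only the pd values.
as.double.pd_table <- function(x, ...) {
  as.double(x[["pd"]])
}

tab <- new_pd_table(c("x[1]", "x[2]"), c(0.535, 0.5267))
pd_values <- as.numeric(tab)  # plain numeric vector: 0.535 0.5267
```

With the real objects, the thread points to the same kind of call, e.g. as.numeric(p_direction(x)), whose result can then be assigned to a new column.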