IndrajeetPatil closed this issue 3 years ago
Apparently, it doesn't have any function that does that (https://discourse.mc-stan.org/t/obtaining-standardized-coefficients-from-rstanarm-package-in-r/3603).
Could broom itself do the standardization on the posterior samples? Or would that fall outside scope?
If we are doing stuff on posterior samples, then tidybayes might be better. I think of that as the Bayesian broom. (Sorry Alex 😅.)
What would we need to standardize the coefficients? Is there some magic we can do with a variance-covariance matrix? Or do we have to dig inside the model's data/matrix, take the SD of the numeric variables and do Gelman's thing to the binary variables?
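For concreteness, "Gelman's thing" on the data side would look something like the sketch below: divide numeric inputs by two standard deviations (so their coefficients are comparable to those of untouched binary predictors) before fitting. All names here are illustrative, not a proposed `broom` API:

```r
# Gelman-style 2-SD rescaling applied to the input data (illustrative only).
rescale_2sd <- function(x) {
  # Leave binary variables alone; divide numeric ones by 2 SDs.
  if (length(unique(x)) == 2) return(x)
  x / (2 * sd(x))
}

d <- mtcars
d$wt <- rescale_2sd(d$wt)  # numeric -> divided by 2 SDs
d$am <- rescale_2sd(d$am)  # binary (0/1) -> untouched

fit_std <- lm(mpg ~ wt + am, data = d)
coef(fit_std)
```

Rescaling a predictor by 1/(2 SD) multiplies its fitted slope by 2 SD, so this is equivalent to rescaling the coefficient after the fact for the numeric variables.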
Haha. `broom` actually no longer contains anything Bayesian; all relevant tidiers were moved to `broom.mixed`. That said, I don't really know Bayesian statistics that well, so I don't have much to add on how to standardize regression estimates from these models.
Yes, the latter sounds about right. Need to think more about the implementation.
Re: Bayes stuff: totally agree, `tidybayes` is the place for this.
I haven't really invested any effort into this in `broom` so far, mostly because the effort-reward ratio feels low, so I've watched the `dotwhisker` approach from a distance. There are basically two options: standardize the input data, or standardize the final design matrix. See the `dotwhisker` discussion for commentary; roughly, it seems like either would be fine.
If you want to standardize the input data, this happens pre-`fit()`, and the appropriate tool is `recipes`, in my opinion. If you want to standardize the predictors, I still think the appropriate tool is `recipes`, just with some additional steps.
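A minimal sketch of the pre-`fit()` route, assuming a simple `lm()` on `mtcars`. `recipe()`, `step_normalize()`, `prep()`, and `bake()` are real `recipes` functions; note that `step_normalize()` centers and scales by 1 SD, so a Gelman-style 2-SD version would need a custom step:

```r
library(recipes)

# Pre-fit standardization: center and scale the numeric predictor before
# the model ever sees it, leaving the binary predictor alone.
rec <- recipe(mpg ~ wt + am, data = mtcars) %>%
  step_normalize(wt)

prepped <- prep(rec, training = mtcars)
standardized <- bake(prepped, new_data = NULL)  # the processed training data

fit <- lm(mpg ~ wt + am, data = standardized)
```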
If you want to standardize coefficients after the fact, that means going to find the `terms` object in a given model, calling `model.frame()` / `model.matrix()`, and standardizing those. My experience is that dealing with bizarro edge cases based on idiosyncratic and/or partial support for formulas and `model.matrix()` is one of the more painful parts of the R universe. Especially when packages allow multiple forms of data specification (i.e. formula and x/y interfaces), the resulting model objects never have all the information you need to recreate the original model preprocessing.
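For the happy path (a plain `lm()` that retains its `terms` and data), the after-the-fact route can be sketched as rescaling each slope by the SD of its design-matrix column. The pain described above starts exactly when `model.matrix()` support gets partial or idiosyncratic:

```r
# Post-hoc standardization for the easy case: recover the design matrix
# and rescale each slope by 2 SDs of its column.
fit <- lm(mpg ~ wt + am, data = mtcars)

mm  <- model.matrix(fit)                      # needs the terms object + data
sds <- apply(mm[, -1, drop = FALSE], 2, sd)   # drop the intercept column

# Note: unlike Gelman's recipe, this rescales binary columns too; a faithful
# implementation would skip them.
coef_std <- coef(fit)[-1] * 2 * sds
```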
I see the appeal of a `standardize` argument and am happy to provide guidance if someone wants to take it on (in particular, I can point out several gotchas you might stumble onto), but I just wanted to express why I've previously been hesitant about this.
Additional thought: you might want to standardize after the fact so you could get both the standardized and original-scale coefficients without fitting the model twice. Again, `recipes` will support this type of thing in the future because it will allow undoing steps, so you could fit on the `recipes`-standardized data, extract the column scales with `tidy(step_scale)` (or perhaps a more immediate `step_undo_scaling()`-type operation) and recover the original-scale coefs that way.
Standardized regression coefficients are much easier to compare, interpret, visualize, etc. than unstandardized ones, so it would be nice if `broom` tidiers gained a `standardize` argument that decides whether the regression coefficients are to be standardized. A few other packages have tried to do this:
- The `dotwhisker` package uses Gelman's 2-SD method: https://www.rdocumentation.org/packages/dotwhisker/versions/0.5.0/topics/by_2sd (but it falters when interaction terms are present: https://github.com/fsolt/dotwhisker/issues/82)
- The `arm` package also uses Gelman's method but supports very few models: https://www.rdocumentation.org/packages/arm/versions/1.10-1/topics/standardize
- The `parameters` package (unclear method): https://easystats.github.io/parameters/reference/standardize.lm.html
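For reference, the `dotwhisker` approach pairs with `broom` output directly; assuming the documented `by_2sd(df, dataset)` signature, usage looks roughly like:

```r
library(broom)
library(dotwhisker)

fit <- lm(mpg ~ wt + am, data = mtcars)

# by_2sd() takes a tidy() data frame plus the original data and rescales
# the estimates and standard errors by 2 SDs of each predictor.
tidy_std <- by_2sd(tidy(fit), dataset = mtcars)
```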