chirunconf / chirunconf19

Discussion of potential projects for Chicago R Unconference, March 9-10, 2019

Add option to `broom` tidiers to standardize parameters #28

Closed: IndrajeetPatil closed this 3 years ago

IndrajeetPatil commented 5 years ago

Standardized regression coefficients are much easier to compare, interpret, and visualize than unstandardized ones. It would be nice if broom tidiers gained a `standardize` argument that controls whether the regression coefficients are standardized.

A few other packages have tried to do this.

wlandau commented 5 years ago

Maybe rstanarm too?

IndrajeetPatil commented 5 years ago

Apparently, it doesn't have any function that does that (https://discourse.mc-stan.org/t/obtaining-standardized-coefficients-from-rstanarm-package-in-r/3603).

wlandau commented 5 years ago

Could broom itself do the standardization on the posterior samples? Or would that fall outside scope?

tjmahr commented 5 years ago

If we are doing stuff on posterior samples, then tidybayes might be better. I think of that as the Bayesian broom. (Sorry Alex 😅.)

What would we need to standardize the coefficients? Is there some magic we can do with a variance-covariance matrix? Or do we have to dig inside the model's data/matrix, take the SD of the numeric variables and do Gelman's thing to the binary variables?

IndrajeetPatil commented 5 years ago

Haha. broom actually no longer contains anything Bayesian. All relevant tidiers were moved to broom.mixed. That said, I don't really know Bayesian statistics that well and so I don't have much to add on how to standardize regression estimates from these models.

Yes, the latter sounds about right. Need to think more about the implementation.
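For a plain `lm()` fit, the "latter" approach could look roughly like the sketch below. `standardize_coefs()` is a hypothetical helper, not part of broom: it recovers the SDs from the fitted model's own frame and design matrix and rescales the slopes.

```r
# Hypothetical helper (not broom's API): rescale coefficients of a fitted
# lm() using SDs recovered from its model frame / design matrix.
# Following Gelman (2008), one could instead divide by 2 * sd(x) so that
# binary predictors stay comparable to continuous ones.
standardize_coefs <- function(fit) {
  mf <- model.frame(fit)             # data actually used in the fit
  mm <- model.matrix(fit)            # design matrix, dummies included
  sd_y <- sd(model.response(mf))     # SD of the outcome
  sds_x <- apply(mm, 2, sd)          # SD of each design-matrix column
  std <- coef(fit) * sds_x / sd_y    # beta_std = beta * sd(x) / sd(y)
  std[names(std) != "(Intercept)"]   # intercept has no standardized analogue
}

fit <- lm(mpg ~ wt + hp, data = mtcars)
standardize_coefs(fit)
```

For a linear model this agrees with refitting on `scale()`-d variables, which is a handy sanity check for an implementation.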

alexpghayes commented 5 years ago

Re: Bayes stuff: totally agree, tidybayes is the place for this.

I haven't really invested any effort into this in broom so far mostly because the effort-reward ratio feels low, so I've watched the dotwhisker approach from a distance. There are basically two options: standardize the input data, or standardize the final design matrix. See the dotwhisker discussion for commentary -- roughly, it seems like either would be fine.

If you want to standardize the input data, that happens before fit() time, and the appropriate tool is recipes in my opinion. If you want to standardize the predictors, I still think that the appropriate tool is recipes, just with some additional steps.
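The recipes route sketched above might look like this (a minimal sketch assuming the recipes package with `step_normalize()`; the model is then fit on already-standardized predictors, so its slopes come out on the per-SD scale):

```r
# Sketch of the pre-fit() route via recipes: center and scale the
# predictors, then fit on the baked data.
library(recipes)

rec <- recipe(mpg ~ wt + hp, data = mtcars) |>
  step_normalize(all_numeric_predictors())   # center and scale predictors

baked <- bake(prep(rec, training = mtcars), new_data = NULL)
fit <- lm(mpg ~ wt + hp, data = baked)
coef(fit)   # slopes are per 1 SD of each predictor
```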

If you want to standardize coefficients after the fact, that means going to find the terms object in a given model and calling model.frame() / model.matrix() and standardizing those. My experience is that dealing with bizarro edge cases arising from idiosyncratic and/or partial support for formulas and model.matrix() is one of the more painful parts of the R universe. Especially when packages allow multiple forms of data specification (i.e. formula and x/y interfaces), the resulting model objects never have all the information you need to recreate the original model preprocessing.

I see the appeal of a standardize argument and am happy to provide guidance if someone wants to take it on (in particular I can point out several gotchas you might stumble onto), but just wanted to express why I've previously been hesitant about this.

alexpghayes commented 5 years ago

Additional thought: you might want to standardize after the fact so you could get both the standardized and original scale coefficients without fitting the model twice. Again, recipes will support this type of thing in the future because it will allow undoing steps, so you could fit on the recipes-standardized data, extract the column scales with tidy(step_scale) or perhaps a more immediate step_undo_scaling() type operation and recover the original scale coefs that way.
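The "undo" arithmetic itself is simple for a linear model, whatever machinery ends up doing it: if you fit on standardized predictors, dividing each standardized slope by that predictor's SD recovers the original-scale slope without refitting. A back-of-envelope sketch:

```r
# Fit on standardized predictors, then recover raw-scale slopes by
# dividing each slope by the predictor's SD (no second fit needed).
fit_std <- lm(mpg ~ scale(wt) + scale(hp), data = mtcars)
sds <- c(wt = sd(mtcars$wt), hp = sd(mtcars$hp))
orig_slopes <- coef(fit_std)[-1] / sds
orig_slopes

fit_orig <- lm(mpg ~ wt + hp, data = mtcars)  # sanity check via a refit
coef(fit_orig)[-1]                            # agrees with orig_slopes
```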