DeclareDesign / estimatr

estimatr: Fast Estimators for Design-Based Inference
https://declaredesign.org/r/estimatr

method for `augment` or `fortify`. #377

Open McCartneyAC opened 3 years ago

McCartneyAC commented 3 years ago

Hi guys,

I am trying to use {estimatr} with a lot of downstream packages (such as {lindia} or {report}, or to do added variable plots with my own package) that rely on a model matrix that we would usually get from broom::augment() or ggplot2::fortify().

I know this is in your heads because I found this issue for broom::fortify():

https://github.com/tidymodels/broom/issues/237

I'm just including it here as an issue because I took a first stab at trying to do it myself and got very stuck, very quickly.

nfultz commented 3 years ago

Reusing broom's lm method for augment almost works:

library(broom)
library(estimatr)
augment.lm_robust <- broom:::augment.lm
m <- lm_robust(extra ~ group, sleep)

But calling augment(m) without any data fails:

Error in eval(predvars, data, env) : object 'group' not found

Pop into the debugger, see the line it fails on:

df$.fitted <- predict(x, na.action = na.pass, ...) %>% unname()

This is because lm_robust doesn't retain the data set, so you need to provide newdata explicitly:

augment(m, newdata=sleep)

# A tibble: 20 x 5
   extra group ID    .fitted  .resid
 1   0.7 1     1       0.750 -0.0500
 2  -1.6 1     2       0.750 -2.35
 3  -0.2 1     3       0.750 -0.950
 4  -1.2 1     4       0.750 -1.95
 5  -0.1 1     5       0.750 -0.850
 6   3.4 1     6       0.750  2.65
 7   3.7 1     7       0.750  2.95
 8   0.8 1     8       0.750  0.05
 9   0   1     9       0.750 -0.750
10   2   1     10      0.750  1.25
11   1.9 2     1       2.33  -0.43
12   0.8 2     2       2.33  -1.53
13   1.1 2     3       2.33  -1.23
14   0.1 2     4       2.33  -2.23
15  -0.1 2     5       2.33  -2.43
16   4.4 2     6       2.33   2.07
17   5.5 2     7       2.33   3.17
18   1.6 2     8       2.33  -0.73
19   4.6 2     9       2.33   2.27
20   3.4 2     10      2.33   1.07
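If borrowing broom's unexported method feels too fragile, the same two columns can be built by hand. The following is only a minimal sketch: `augment_sketch` and its `outcome` argument are hypothetical names, not part of estimatr or broom, and it assumes that estimatr's predict method for lm_robust objects returns a plain vector of fitted values when given `newdata`.

```r
library(estimatr)

# Hypothetical helper (not part of estimatr or broom): append fitted values
# and residuals to a data frame that the caller supplies explicitly, since
# the lm_robust object does not carry its own data.
augment_sketch <- function(model, newdata, outcome) {
  out <- newdata
  # predict() needs newdata here because the model has no stored data set
  out$.fitted <- as.numeric(predict(model, newdata = newdata))
  # residual = observed outcome minus fitted value
  out$.resid <- out[[outcome]] - out$.fitted
  out
}

m <- lm_robust(extra ~ group, data = sleep)
aug <- augment_sketch(m, newdata = sleep, outcome = "extra")
head(aug)
```

The outcome column is passed by name so the helper does not have to rely on any particular field inside the model object.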

McCartneyAC commented 3 years ago

I got this to work! Let me play with it a bit in my use case but this is super helpful.

I guess the next issue is just dealing with the problem of standard errors given the heteroskedasticity issue. Have to make some judgment calls. Thanks.

lukesonnet commented 3 years ago

@nfultz, should we create our own augment method that requires newdata? Is that advisable?

nfultz commented 3 years ago

Since augment.lm isn't exported, I think it would be better to add augment.lm_robust in the broom package instead of here:

augment.lm_robust <- function(x, data, newdata, ...) {
  # force(newdata) or similar check
  # delegate to augment.lm(x, data, newdata, ...)
}

McCartneyAC commented 3 years ago

Correct me if I'm wrong, but isn't it current practice with {broom} that they're no longer adding methods for new model objects and things like this? Or am I thinking of a different ecosystem?

EDIT: From the {broom} main website:

However, for maintainability purposes, the broom package authors now ask that requests for new methods be first directed to the parent package (i.e. the package that supplies the model object) rather than to broom. New methods will generally only be integrated into broom in the case that the requester has already asked the maintainers of the model-owning package to implement tidier methods in the parent package.

nfultz commented 3 years ago

That makes sense. estimatr is a bad case, because its model objects are generally, but not always, compatible with other methods for lm. It doesn't have lm in its list of classes because that's a promise of compatibility that it generally can't keep. And since augment.lm is not exported, estimatr can't explicitly opt in to the feature from its own package namespace either.

Something like the below might work, but it's ugly as sin and will probably fail somewhere down the line if/when broom relies on other methods for lm that are similarly incompatible.

augment.lm_robust <- function(x, data, newdata, ...) {
  # force(newdata) or similar check
  class(x) <- "lm"
  augment(x, data, newdata, ...)
}
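
To make the dispatch problem concrete, here is a small base-R sketch. The generic `describe` and its methods are toy stand-ins invented for illustration; they are not broom's or estimatr's code. It shows that an object whose class vector lacks "lm" never reaches the lm method, and that relabelling a copy (as the workaround above does) changes which method runs.

```r
# Toy generic and methods standing in for augment() and its lm method.
describe <- function(x, ...) UseMethod("describe")
describe.lm <- function(x, ...) "handled by the lm method"
describe.default <- function(x, ...) "no specific method found"

# An object that, like estimatr's models, does not list "lm" among its classes.
obj <- structure(list(), class = "lm_robust")
res_before <- describe(obj)        # falls through to describe.default

# Relabel a copy as "lm"; R copies on modify, so obj itself is untouched.
relabelled <- obj
class(relabelled) <- "lm"
res_after <- describe(relabelled)  # now dispatches to describe.lm

print(c(before = res_before, after = res_after))
```

The relabelled object is then treated as a genuine lm by every other generic called on it, which is why the workaround can break if the borrowed method calls further lm methods that the object cannot satisfy.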