DeclareDesign / estimatr

estimatr: Fast Estimators for Design-Based Inference
https://declaredesign.org/r/estimatr

Explore possibility of interactions in lin estimator #131

Open acoppock opened 6 years ago

acoppock commented 6 years ago

There's a neat way of analyzing a 2*2 factorial design:

Demean both treatment indicators, then run

y ~ Z1_c + Z2_c + Z1_c*Z2_c

The coef on Z1_c is the ATE of factor 1, the coef on Z2_c is the ATE of factor 2, and the coef on the interaction has the usual interpretation.

Macartan notes the interesting connection with the lin estimator here and wonders if it's possible to use our machinery to accomplish this. I'm a little worried about unintended consequences, but just noting this idea here so it doesn't get lost!
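
A minimal sketch of the demeaning approach in base R, on simulated data. The sample size, effect sizes, and seed are made up for illustration, and none of this is estimatr code. Because the model is saturated, the coef on Z1_c should match the effect of factor 1 averaged over the observed shares of factor 2, and the interaction coef should match the difference between the two conditional effects.

```r
set.seed(131)
n  <- 400
# Two binary factors, each assigned to exactly half the sample (assumed design)
Z1 <- sample(rep(0:1, each = n / 2))
Z2 <- sample(rep(0:1, each = n / 2))
y  <- 1 + 0.5 * Z1 + 0.3 * Z2 + 0.4 * Z1 * Z2 + rnorm(n)

# Demean both treatment indicators, then fit the interacted model
Z1_c <- Z1 - mean(Z1)
Z2_c <- Z2 - mean(Z2)
fit  <- lm(y ~ Z1_c * Z2_c)

# Compare against quantities built directly from the four cell means
cell <- tapply(y, list(Z1 = Z1, Z2 = Z2), mean)
eff_at_Z2_0 <- cell["1", "0"] - cell["0", "0"]
eff_at_Z2_1 <- cell["1", "1"] - cell["0", "1"]
ate_Z1 <- mean(Z2 == 0) * eff_at_Z2_0 + mean(Z2 == 1) * eff_at_Z2_1

print(coef(fit))
print(c(ate_Z1 = ate_Z1, interaction = eff_at_Z2_1 - eff_at_Z2_0))
```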

nfultz commented 6 years ago

You can accomplish the same thing using sum-to-zero contrasts, I think.
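
A rough check of the sum-to-zero idea in base R, on simulated data with made-up effect sizes. One caveat this sketch assumes away: contr.sum weights the cells equally, while demeaning weights by the observed treatment shares, so the two agree here only because each factor is assigned to exactly half the sample. The coefs also need rescaling, since contr.sum codes the two levels as +1 and -1.

```r
set.seed(131)
n  <- 400
Z1 <- sample(rep(0:1, each = n / 2))  # exactly 50/50, so mean(Z1) = 0.5
Z2 <- sample(rep(0:1, each = n / 2))
y  <- 1 + 0.5 * Z1 + 0.3 * Z2 + 0.4 * Z1 * Z2 + rnorm(n)

# Demeaned-indicator version
Z1_c  <- Z1 - mean(Z1)
Z2_c  <- Z2 - mean(Z2)
fit_c <- lm(y ~ Z1_c * Z2_c)

# Sum-to-zero contrasts on the same factors
F1 <- factor(Z1)
F2 <- factor(Z2)
fit_s <- lm(y ~ F1 * F2,
            contrasts = list(F1 = "contr.sum", F2 = "contr.sum"))

# Level "0" is coded +1 and level "1" is coded -1, so a main-effect coef
# is minus half the effect, and the interaction coef is a quarter of it
rescaled <- c(Z1 = -2 * coef(fit_s)[["F11"]],
              Z2 = -2 * coef(fit_s)[["F21"]],
              interaction = 4 * coef(fit_s)[["F11:F21"]])

print(rescaled)
print(coef(fit_c)[-1])
```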

lukesonnet commented 6 years ago

I think even if you make it a 3x2, you lose the general interpretability of the interaction terms, so I'm not sure building it into lin (by taking the full set of interactions of all treatments passed to the function) makes sense. But I can definitely explore this.

macartan commented 6 years ago

The way I think of it, in a factorial each arm is a covariate for another treatment; if lin makes sense for a covariate, it makes sense for a treatment. The question here is: if you want to look at interactions between treatments, can we make it easy to center them (not the prior question of whether you want to look at interactions)? If in the 2*3 the second factor has an ordered interpretation ($0, $50, $100), one might be interested in interactions, and so interested in centering; if it is multinomial, then one might be interested in splitting T2 and creating new vars to allow T1*(T2==1) + T1*(T2==2); in that case one might want to center the new vars. Put another way, there is no reason why the Lin logic would break down just because a covariate was randomly assigned, is there?
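
A sketch of the multinomial case in base R, assuming a simulated 2*3 design with made-up effects. T2 is split into indicators, the new vars are centered, and each is interacted with T1. The names D1_c and D2_c are hypothetical. With a saturated model, the coef on T1 should equal the effect of T1 averaged over the observed shares of the T2 arms.

```r
set.seed(131)
n  <- 600
T1 <- sample(rep(0:1, each = n / 2))
T2 <- sample(rep(0:2, each = n / 3))  # three unordered arms (assumed)
y  <- 1 + 0.5 * T1 + 0.2 * (T2 == 1) + 0.6 * (T2 == 2) +
  0.3 * T1 * (T2 == 1) - 0.1 * T1 * (T2 == 2) + rnorm(n)

# Split T2 into indicators and center the new vars
D1_c <- (T2 == 1) - mean(T2 == 1)
D2_c <- (T2 == 2) - mean(T2 == 2)
fit  <- lm(y ~ T1 * (D1_c + D2_c))

# Effect of T1 within each T2 arm, averaged by the arm shares
cell  <- tapply(y, list(T1 = T1, T2 = T2), mean)
eff   <- cell["1", ] - cell["0", ]
share <- as.vector(prop.table(table(T2)))
avg_eff_T1 <- sum(share * eff)

print(coef(fit))
print(avg_eff_T1)
```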

lukesonnet commented 6 years ago

No, in this case nothing need be done differently other than passing the second treatment variable as a covariate. However, note that we do not currently center whatever treatment you put in the first formula.
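
To illustrate what leaving the first-formula treatment uncentered implies, here is a sketch in base R that mimics the Lin specification by hand on simulated data (effect sizes and seed are made up). With Z1 uncentered and Z2 centered, the coef on Z1 is still the average effect of Z1, but the coef on Z2_c is the effect of Z2 among units with Z1 = 0 rather than its ATE. The lm_lin call at the end is optional and runs only if estimatr is installed.

```r
set.seed(131)
n   <- 400
dat <- data.frame(Z1 = sample(rep(0:1, each = n / 2)),
                  Z2 = sample(rep(0:1, each = n / 2)))
dat$y <- with(dat, 1 + 0.5 * Z1 + 0.3 * Z2 + 0.4 * Z1 * Z2) + rnorm(n)

# Lin-style fit by hand: Z1 is the treatment, centered Z2 is the "covariate"
dat$Z2_c <- dat$Z2 - mean(dat$Z2)
fit <- lm(y ~ Z1 * Z2_c, data = dat)

cell <- with(dat, tapply(y, list(Z1 = Z1, Z2 = Z2), mean))
# Effect of Z2 only among units with Z1 = 0
eff_Z2_at_Z1_0 <- cell["0", "1"] - cell["0", "0"]
# Effect of Z1 averaged over the observed shares of Z2
ate_Z1 <- mean(dat$Z2 == 0) * (cell["1", "0"] - cell["0", "0"]) +
  mean(dat$Z2 == 1) * (cell["1", "1"] - cell["0", "1"])

print(coef(fit))

# Optional: the same specification through estimatr, if available
if (requireNamespace("estimatr", quietly = TRUE)) {
  fit_lin <- estimatr::lm_lin(y ~ Z1, covariates = ~ Z2, data = dat)
  print(fit_lin$coefficients)
}
```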

macartan commented 6 years ago

Yes, that is the issue. One might want to have Y ~ T1*T2 as the first formula and indicate that T1 and T2 should both be centered; currently, though, the first formula allows only one treatment term. Of course one can do one treatment at a time, but I am wondering whether we should think of a syntax in which the first formula contains the full specification and the second lists the items to be centered; e.g., rather than Y ~ Z, ~X, have Y ~ Z*X, list(X), meaning center X and then use the formula. In that case one could as easily have Y ~ Z1*Z2, list(Z1, Z2), meaning Z1 and Z2 should be recentered and the formula applied using data with the centered vars.
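
A sketch of what that "full formula plus list of vars to center" syntax might look like in base R. The helper lm_centered and its center argument are hypothetical, invented here for illustration; nothing like this is claimed to exist in estimatr. It recenters the named columns and then applies the formula unchanged.

```r
# Hypothetical helper: center the named columns, then fit the formula as given
lm_centered <- function(formula, center, data) {
  for (v in center) {
    data[[v]] <- data[[v]] - mean(data[[v]])
  }
  lm(formula, data = data)
}

set.seed(131)
n   <- 400
dat <- data.frame(Z1 = sample(rep(0:1, each = n / 2)),
                  Z2 = sample(rep(0:1, each = n / 2)))
dat$y <- with(dat, 1 + 0.5 * Z1 + 0.3 * Z2 + 0.4 * Z1 * Z2) + rnorm(n)

# Analogue of Y ~ Z1*Z2, list(Z1, Z2)
fit <- lm_centered(y ~ Z1 * Z2, center = c("Z1", "Z2"), data = dat)

# Same model with the centering done by hand, for comparison
dat$Z1_c <- dat$Z1 - mean(dat$Z1)
dat$Z2_c <- dat$Z2 - mean(dat$Z2)
fit_manual <- lm(y ~ Z1_c * Z2_c, data = dat)

print(coef(fit))
```

One design consequence is visible in the output: the coefs keep the original names (Z1, Z2, Z1:Z2) even though they are estimated on centered data, which a real implementation would probably want to flag.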

lukesonnet commented 6 years ago

I'd really like the first formula to only have treatment variables, as part of what's nice is that we center and interact everything in the second formula and save users that hassle in complicated cases with many covariates. In that case it's a simple tweak of the syntax to something like this:

Y ~ Z1*Z2, ~Z1+Z2+X

Where it would build the full set of the treatments on the left (Z1 + Z2 + Z1*Z2) and then treat the variables in the second formula as the "covariates" to center and interact with each of the three treatments. This would keep the syntax we have for simple cases and allow more complex specifications.

It also maintains this clarity of the second formula being your specification for covariate adjustment within each treatment, which I like.

We could also move the second formula to list(Z1, Z2, X), although I don't prefer this to the formula. My main preference is to make it easy to specify many covariates. This also keeps lm_lin from simply being a centering data pre-processor when it could also build the formula for beginners.

lukesonnet commented 6 years ago

For example, imagine the case with multiple covariates. Here's the difference in syntax:

Y~Z1*Z2*X1 + Z1*Z2*X2, list(Z1, Z2, X1, X2)

Vs

Y~Z1*Z2, ~Z1 + Z2 + X1 +X2

Or in a simple case

Y~Z*X1 + Z*X2, list(X1, X2)

Vs

Y~Z, ~X1+X2

I think the latter is more tightly linked to the idea of what we want to do (estimate (1) treatment effects by specifying a full set of interactions with (2) covariates) while also being shorter to write.
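
To make the comparison concrete, here is a sketch in base R of how the two-formula syntax could expand into the long form. The helper expand_lin_formula and the _c suffix for centered covariates are assumptions made for illustration, not estimatr's actual internals. The sketch covers only covariates in the second formula; the part of the proposal where Z1 and Z2 also appear there to request centering of the treatments is left out.

```r
# Hypothetical helper: interact every treatment term from the first formula
# with every (centered) covariate named in the second formula
expand_lin_formula <- function(formula, covariates) {
  outcome    <- all.vars(formula)[1]
  treatments <- attr(terms(formula), "term.labels")
  centered   <- paste0(all.vars(covariates), "_c")
  rhs <- sprintf("(%s) * (%s)",
                 paste(treatments, collapse = " + "),
                 paste(centered, collapse = " + "))
  as.formula(paste(outcome, "~", rhs))
}

# Simple case: Y~Z, ~X1+X2
simple <- expand_lin_formula(Y ~ Z, ~ X1 + X2)
print(attr(terms(simple), "term.labels"))

# Factorial case: Y~Z1*Z2 with two covariates
factorial <- expand_lin_formula(Y ~ Z1 * Z2, ~ X1 + X2)
print(attr(terms(factorial), "term.labels"))
```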

macartan commented 6 years ago

I think you are right on the simpler syntax and input; makes sense to me, assuming the behavior works out to be the same.

lukesonnet commented 6 years ago

@acoppock, could you chime in here re: our discussion about this changing the lin estimator?

graemeblair commented 6 years ago

I wanted to unstick our discussion by pointing out that this isn't just about whether there is a change to lm_lin: if we want to support, for example, factorial designs with two treatment vars in a lin-like estimator, we can either have that in lm_lin or, if that is confusing, just make a separate function.

lukesonnet commented 5 years ago

Similarly, Jake Bowers requests lm_lin for multi-valued treatments.