DeclareDesign / estimatr

estimatr: Fast Estimators for Design-Based Inference
https://declaredesign.org/r/estimatr

clustering above assignment level #278

Open macartan opened 5 years ago

macartan commented 5 years ago

Treatment needs to be nested within clusters for difference_in_means() but not for lm_robust():

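library(fabricatr)  # fabricate()
library(estimatr)   # lm_robust(), difference_in_means()

# Z alternates within each level of X, so treatment varies inside every cluster
data <- fabricate(12, Z = rep(0:1, 6), X = rep(0:1, each = 6), Y = rnorm(12))

lm_robust(Y ~ Z, cluster = X, data = data)
difference_in_means(Y ~ Z, cluster = X, data = data)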

Should we not have the same behavior for both?

> lm_robust(Y ~ Z, cluster = X, data = data)
              Estimate Std. Error    t value  Pr(>|t|)  CI Lower CI Upper DF
(Intercept) -0.5717880  0.6304287 -0.9069829 0.5310279 -8.582144 7.438568  1
Z            0.5346832  0.4728588  1.1307461 0.4609849 -5.473557 6.542923  1
> difference_in_means(Y ~ Z, cluster = X, data = data)
Error in difference_in_means_internal(condition1 = condition1, condition2 = condition2,  : 
  All units within a cluster must have the same treatment condition.
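
The error message describes a requirement that treatment be constant within each cluster. As a minimal base-R sketch of that kind of check (this is an illustration on the same layout as the example above, not estimatr's internal code; the names n_conditions and nested are made up here):

```r
# Same layout as the example above: Z alternates within each level of X
Z <- rep(0:1, 6)
X <- rep(0:1, each = 6)

# Number of distinct treatment values observed in each cluster
n_conditions <- tapply(Z, X, function(z) length(unique(z)))

# TRUE only if every cluster contains a single treatment condition
nested <- all(n_conditions == 1)

print(n_conditions)
print(nested)
```

Here both clusters contain treated and control units, so the check fails, which is the situation difference_in_means() rejects and lm_robust() accepts.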

lukesonnet commented 5 years ago

I'm happy to remove the constraint in difference_in_means(), because it just kicks to lm_robust() in the clustered case anyway.

I don't believe I have access to it today, but the simplification of the CR2 estimator for the case with a single binary predictor and equal-sized clusters (e.g., the commonly used se(SATE) estimator presented in GG, p. 83) requires that treatment be unique within cluster, that is, that every unit in a cluster has the same treatment condition.

If we removed this constraint, we should still match the GG estimator in the case with equal-sized clusters and unique treatment within cluster. We would simply be allowing another case that the standard clustered DiM estimator could not accommodate.
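
As a rough way to check the claim that the two functions agree when treatment is constant within cluster, one could compare their standard errors on simulated data. This is only a sketch: the data frame dat and its columns are invented for illustration, the seed is arbitrary, and it assumes estimatr is installed and that tidy() returns a std.error column for both fits.

```r
library(estimatr)

set.seed(1)

# Eight equal-sized clusters; treatment is assigned at the cluster level
dat <- data.frame(cl = rep(1:8, each = 5))
dat$Z <- as.numeric(dat$cl %% 2 == 0)
dat$Y <- rnorm(nrow(dat)) + 0.5 * dat$Z

dim_fit <- difference_in_means(Y ~ Z, clusters = cl, data = dat)
lm_fit  <- lm_robust(Y ~ Z, clusters = cl, se_type = "CR2", data = dat)

# Pull the standard error on Z from each fit
dim_tidy <- tidy(dim_fit)
lm_tidy  <- tidy(lm_fit)
se_dim <- dim_tidy$std.error[dim_tidy$term == "Z"]
se_lm  <- lm_tidy$std.error[lm_tidy$term == "Z"]

print(c(difference_in_means = se_dim, lm_robust = se_lm))
print(all.equal(se_dim, se_lm))
```

If the clustered difference_in_means() path does hand off to lm_robust() with CR2 standard errors, as described above, the two values should agree up to numerical tolerance.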

pmeiners commented 3 years ago

The behavior of difference_in_means() is still like this. I think this is confusing, especially since the documentation states that lm_robust() and difference_in_means() are the same in the clustered case.