benbhansen-stats / propertee

Prognostic Regression Offsets with Propagation of ERrors, for Treatment Effect Estimation (IES R305D210029).
https://benbhansen-stats.github.io/propertee/

`cluster` argument for `.get_b12()` and `.get_b11()` part 2 #95

Closed jwasserman2 closed 1 year ago

jwasserman2 commented 1 year ago

This PR improves the handling of NAs in unit of assignment columns. We follow the principle of clustering at the most granular level possible. Our canonical example uses data collected at the student level. Suppose treatment has been assigned at the school/classroom level, but rows for students in the auxiliary sample have only school IDs; the classroom IDs are unavailable, so the user has coded them as NA.

First, suppose the user fits their covariance adjustment model to the auxiliary sample only. In this case, `.get_b11()` would cluster at the school level, but `.get_b12()` would find no overlap, since rows in the covariance model dataset that truly overlap with the quasiexperimental sample are assumed to carry, along with the shared school ID, classroom IDs that allow exact matches to that sample.

Second, suppose the user fits their covariance adjustment model to the union of the quasiexperimental and auxiliary samples. In this case, rows sharing the same school/classroom IDs would be clustered together, and rows sharing the same school ID but with NA classroom IDs would be clustered together.
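
The clustering rule described above can be sketched in a few lines. This is a minimal, language-agnostic illustration written in Python, not the package's R code: the helpers `cluster_key`, `group_rows`, and `overlapping_keys`, the column names `school` and `classroom`, and the toy rows are all hypothetical, and `None` stands in for NA.

```python
from collections import defaultdict

ID_COLS = ("school", "classroom")  # hypothetical unit of assignment columns


def cluster_key(row, id_cols=ID_COLS):
    """Return the row's unit of assignment IDs as a tuple; None stands in for NA."""
    return tuple(row.get(col) for col in id_cols)


def group_rows(rows, id_cols=ID_COLS):
    """Group row indices by cluster key, so rows with NA classroom IDs group by school."""
    clusters = defaultdict(list)
    for i, row in enumerate(rows):
        clusters[cluster_key(row, id_cols)].append(i)
    return dict(clusters)


def overlapping_keys(q_rows, c_rows, id_cols=ID_COLS):
    """Cluster keys found in both samples, requiring exact matches on fully observed IDs."""
    def complete_keys(rows):
        keys = (cluster_key(r, id_cols) for r in rows)
        return {k for k in keys if None not in k}
    return complete_keys(q_rows) & complete_keys(c_rows)


# Toy data (hypothetical): quasiexperimental rows have classroom IDs, auxiliary rows do not.
quasi = [
    {"school": "A", "classroom": "A1"},
    {"school": "A", "classroom": "A1"},
    {"school": "A", "classroom": "A2"},
]
aux = [
    {"school": "A", "classroom": None},
    {"school": "A", "classroom": None},
    {"school": "B", "classroom": None},
]

# Covariance model fit to the auxiliary sample only: clusters form at the school level.
aux_clusters = group_rows(aux)

# No exact school/classroom match exists between the two samples.
overlap = overlapping_keys(quasi, aux)

# Covariance model fit to the union: school/classroom clusters plus school-only clusters.
union_clusters = group_rows(quasi + aux)
```

In this sketch, `aux_clusters` has one cluster per school, `overlap` is empty, and `union_clusters` keeps the NA-classroom rows of school A in their own cluster rather than merging them with classrooms A1 or A2.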