HC1 Estimates with `vcovDA`

jwasserman2 commented 1 year ago

Once PR #127 is merged, we can leverage some of that code to create a `.vcov_MB_CR1` estimator, as mentioned in #119. The HC1 adjustment to meat matrix estimates, as given in MacKinnon and White (1985), is $n / (n-k)$. In the presence of clustering, $n$ should be the number of clusters, and $k$ the number of rows (equivalently, columns) of the covariance matrix.
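A minimal sketch of the scaling factor described above, in plain Python rather than the package's own language. The function names and the toy data are hypothetical, and this is not the code from PR #127; it only illustrates how the factor changes when $n$ counts clusters instead of units of observation.

```python
def hc1_factor(n, k):
    """Return the HC1 scaling factor n / (n - k).

    n: number of independent contributors (clusters, if clustered).
    k: dimension of the covariance matrix.
    """
    if n <= k:
        raise ValueError("HC1 factor is undefined unless n > k")
    return n / (n - k)


def count_clusters(cluster_ids):
    """Count distinct clusters among the units of observation."""
    return len(set(cluster_ids))


# Hypothetical data: 10 units of observation nested in 5 clusters.
cluster_ids = ["a", "a", "b", "b", "c", "c", "d", "d", "e", "e"]
k = 2  # assumed dimension of the covariance matrix

unclustered = hc1_factor(len(cluster_ids), k)           # uses n = 10
clustered = hc1_factor(count_clusters(cluster_ids), k)  # uses n = 5
```

With few clusters the clustered factor is noticeably larger than the unclustered one, which is why the choice of $n$ matters.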

The HC1 adjustment may not be universally applicable for subgroup-specific treatment effects, however. If subgroup attributes are invariant within clusters, then some clusters may not contribute to certain subgroup-specific treatment effects. In that case, it will be necessary to have subgroup-specific HC1 adjustments. The code in PR #127 may be adapted specifically for that purpose.
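One way to picture the subgroup-specific adjustment is to count, for each subgroup, only the clusters that contain at least one of its units. The sketch below assumes that reading. The helper names, the per-subgroup parameter count `k_s`, and the data are all hypothetical and are not taken from PR #127.

```python
from collections import defaultdict


def contributing_clusters(cluster_ids, subgroups):
    """Map each subgroup to the number of distinct clusters containing it."""
    clusters_by_subgroup = defaultdict(set)
    for cluster, subgroup in zip(cluster_ids, subgroups):
        clusters_by_subgroup[subgroup].add(cluster)
    return {sg: len(cl) for sg, cl in clusters_by_subgroup.items()}


def subgroup_hc1_factors(cluster_ids, subgroups, k_s):
    """Return n_s / (n_s - k_s) per subgroup; k_s is an assumed constant."""
    factors = {}
    for sg, n_s in contributing_clusters(cluster_ids, subgroups).items():
        if n_s <= k_s:
            raise ValueError(f"subgroup {sg!r} has too few clusters")
        factors[sg] = n_s / (n_s - k_s)
    return factors


# Hypothetical data: subgroup membership is invariant within clusters,
# so each cluster contributes to exactly one subgroup.
cluster_ids = ["a", "a", "b", "b", "c", "c", "d", "d", "e", "e", "f", "f", "g", "g"]
subgroups = ["x"] * 6 + ["y"] * 8

counts = contributing_clusters(cluster_ids, subgroups)
factors = subgroup_hc1_factors(cluster_ids, subgroups, k_s=2)
```

Here only three of the seven clusters inform subgroup `x`, so a single pooled factor based on all seven would understate its adjustment.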

benthestatistician commented 1 year ago

A couple of potentially relevant papers: neymanScott48, NS problem.pdf and lancaster00, neyman scott problem review.pdf. The Lancaster review in particular struck me as likely to offer helpful connections and framing for potential Neyman-Scott adjustments to the HC1 estimators.

jwasserman2 commented 1 year ago

After reading MacKinnon and White (1985), Hinkley (1977) (the source MacKinnon and White (1985) cite for the original HC1 idea), and Section 6 of Cameron and Miller (2015), I find myself more and more intrigued by the idea of subgroup-specific degrees of freedom estimates of the form $n_{s}-2$, where $n_{s}$ represents the number of independent contributors to the estimate[^1] and the second degree of freedom is lost to estimating the subgroup mean.

Hinkley (1977) proposes $n-k$ as the correction because it reflects the degrees of freedom of the residual vector in a linear model with iid errors. There isn't any pooling across subgroup main effect estimates or subgroup-specific intervention effect estimates, so surely more appropriate inference would consider the vector of residuals of units of observation in subgroup $s$[^2] to have $n_{s} - 2$ rather than $n - 2k$ degrees of freedom, where $n=\sum_{s}n_{s}$ and $k$ represents the number of subgroups.
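To make the contrast concrete, here is a worked comparison. It assumes the correction keeps the ratio form of $n/(n-k)$ with the degrees of freedom above in the denominator; the thread itself only states the degrees of freedom, and the counts are hypothetical. Take $k = 2$ subgroups with $n_{1} = 5$ and $n_{2} = 45$, so $n = 50$:

$$
\frac{n}{n-2k} = \frac{50}{46} \approx 1.087, \qquad \frac{n_{1}}{n_{1}-2} = \frac{5}{3} \approx 1.667, \qquad \frac{n_{2}}{n_{2}-2} = \frac{45}{43} \approx 1.047
$$

Under these assumptions the pooled factor is far too small for the small subgroup and slightly too large for the big one.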

I've already run some simulation studies pertaining to other inference issues and posted the results in our offline doc, @benthestatistician. I intend to run further simulation studies addressing this issue and post those results there for us to discuss.

[^1]: I use this phrase to distinguish between the cases where there is and isn't clustering: if there's no clustering, units of observation are "independent" of one another and thus independent contributors to the estimate. If there is clustering, then units of observation within clusters are not independent of one another, but under the assumption that they're independent across clusters, clusters represent independent contributors to the estimate.

[^2]: Here, I do mean units of observation and not independent contributors to the estimate.

benbhansen-stats / propertee

HC1 Estimates with `vcovDA` #129