bcallaway11 / did

Difference in Differences with Multiple Periods, website: https://bcallaway11.github.io/did

Clustered standard errors #123

Closed: catabia closed this issue 2 years ago

catabia commented 2 years ago

I don't think this is necessarily a bug, but I need clarification on the clustervars argument in att_gt. The docs say that you may pass a vector of up to two variables for clustering, as long as one of them is the same as idname. However, when I try to cluster on both state and idname (idname is also the literal name of a column in my dataset, which I pass as the idname argument; sorry for the confusing variable names), I get the following error:

[Screenshot from 2022-04-07 13-12-15: error message, image not available]

I don't get an error when state is the only variable I pass. Is this because idname is already included in the clustering automatically?

bcallaway11 commented 2 years ago

Yes, you are right. I think the issue is with the documentation: att_gt automatically clusters by the "id" variable.

Brant

catabia commented 2 years ago

Hi Brant! Thanks so much for getting back to me quickly! I really appreciate how easy your package makes computing staggered DiDs. So to be absolutely sure that I understand this correctly, when I set clustervars='state', I am clustering on both idname and state? Not state instead of idname?

pedrohcgs commented 2 years ago

Hi Hannah,

Since state "nests" idname (each unit belongs to a single state), in this case you are effectively clustering at the state level.

Thanks
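
For readers who want to check whether a cluster variable nests idname in their own data, here is a minimal sketch in plain Python. It is not part of the did package: the function name `nests` and the toy data are hypothetical, and "nested" is taken to mean that every unit appears with exactly one value of the cluster variable.

```python
def nests(ids, clusters):
    """Return True if every id appears with exactly one cluster value."""
    seen = {}
    for unit, cluster in zip(ids, clusters):
        # setdefault stores the first cluster seen for this unit and
        # returns the stored value on every later row for that unit.
        if seen.setdefault(unit, cluster) != cluster:
            return False
    return True


# Toy panel (made up): three units observed in two years.
ids = [1, 1, 2, 2, 3, 3]
state = ["TN", "TN", "TN", "TN", "GA", "GA"]
year = [2003, 2004, 2003, 2004, 2003, 2004]

print(nests(ids, state))  # True: each unit sits in one state
print(nests(ids, year))   # False: each unit appears in several years
```

When the check returns True, as for state here, the answer above applies and clustering is effectively at the level of the coarser variable.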

catabia commented 2 years ago

Thank you!

catabia commented 2 years ago

Another quick question. Say that idname and the other chosen cluster variable are not nested. For instance, say I set clustervars='year'. Since they are not nested, how is the cluster-robust variance matrix calculated? Is it this method:

[Screenshot from 2022-04-11 11-40-00: formula, image not available]

That is to say, adding the variance matrix for idname clusters to the variance matrix for year clusters, then subtracting the variance matrix for idname interacted with year?
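
Written out in symbols, the calculation described in words above would look roughly like this. This is a reconstruction from the verbal description only, and the notation (the hats and the subscript labels) is an assumption rather than a copy of the original formula:

```latex
\widehat{V}_{\text{two-way}}
  = \widehat{V}_{\text{idname}}
  + \widehat{V}_{\text{year}}
  - \widehat{V}_{\text{idname} \times \text{year}}
```

Here each term on the right is a one-way cluster-robust variance matrix, computed with clusters defined by idname, by year, and by the interaction of the two.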

bcallaway11 commented 2 years ago

The way we calculate clustered standard errors (in both the nested and non-nested cases) is by using the multiplier bootstrap and making the "same" draw of the weights for units in the same cluster. I am not sure if this is roughly analogous to the formula that you sent.

@pedrohcgs, any comments?
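
To make the "same draw of the weights for units in the same cluster" idea concrete, here is a minimal sketch in plain Python. It is not the did package's implementation: the function names are hypothetical, the per-unit `influence` values are made-up toy numbers standing in for influence-function values, Rademacher (+1/-1) weights are assumed for simplicity, and the standard error is taken as the plain standard deviation of the bootstrap draws.

```python
import random
import statistics


def cluster_multiplier_bootstrap(influence, clusters, n_boot=999, seed=0):
    """Return bootstrap draws of the perturbed mean of `influence`.

    Within a draw, every unit in the same cluster is multiplied by the
    same random weight, so within-cluster dependence is preserved.
    """
    rng = random.Random(seed)
    labels = sorted(set(clusters))
    n = len(influence)
    draws = []
    for _ in range(n_boot):
        # One Rademacher weight (+1 or -1) per cluster (assumed distribution).
        weight = {label: rng.choice((-1.0, 1.0)) for label in labels}
        perturbed = sum(weight[c] * psi for psi, c in zip(influence, clusters))
        draws.append(perturbed / n)
    return draws


def bootstrap_se(draws):
    """Standard error estimate: standard deviation of the bootstrap draws."""
    return statistics.pstdev(draws)


# Toy data (made up): 20 clusters of 5 units, identical values within a cluster.
influence = [1.0 if g % 2 == 0 else -1.0 for g in range(20) for _ in range(5)]
by_cluster = [g for g in range(20) for _ in range(5)]
by_unit = list(range(100))  # every unit is its own cluster

se_cluster = bootstrap_se(cluster_multiplier_bootstrap(influence, by_cluster))
se_unit = bootstrap_se(cluster_multiplier_bootstrap(influence, by_unit))
print(se_cluster, se_unit)
```

Because the toy values are perfectly correlated within each cluster, sharing one weight per cluster should give a noticeably larger standard error than drawing a separate weight for every unit. That is the dependence the clustered bootstrap is meant to capture.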