Open cmjoyce opened 1 year ago
Hi Caroline, does the error remain when you use only dist_id as the cluster variable?
Thanks
On Thu, May 18, 2023 at 15:49 cmjoyce @.***> wrote:
Hi there,
I'm using the did package and need to account for clustering at the district level, which is different from my idname (individuals residing in these clusters). Based on the existing documentation, I've accounted for individual and district level clustering. The code and error message are as follows:
att_gt(yname = "outcome",
       tname = "year",
       gname = "g",
       idname = "id",
       xformla = ~ 1,
       data = df,
       panel = FALSE,
       weightsname = "weight_adj",
       clustervars = c("id", "dist_id"),
       control_group = "notyettreated",
       print_details = TRUE,
       bstrap = TRUE,
       cband = FALSE)
Error in mboot(inffunc, DIDparams = dp, pl = pl, cores = cores) :
  can't handle that many cluster variables
I've tried making a vector of these variables and using that as my clustervars, but that just errors out.
Is there a way to get around this error and account for both clustering variables?
Thanks very much, Caroline
--
Pedro H. C. Sant'Anna https://psantanna.com
Hi, thanks for the quick response!
There's no error when I use just dist_id as my clustering variable, though I end up with some very large (confusingly so) standard errors for some treatment groups, especially when I include individual-level covariates. But if clustering only on district gives correctly calculated standard errors, I will assume the issue is on my end.
Caroline
@cmjoyce, sorry for the delayed response. I am surprised that you got an error with the first version that you sent. I am marking that as a bug as I think it should work.
That being said, by default we already cluster at the unit level (in your case "id"), so listing both "id" and "dist_id" is redundant. Clustering on "dist_id" alone will not fix the large standard errors, but those are the standard errors I think you were trying to get from the beginning.
Yes, I think my clustering on two variables was redundant -- I tweaked some things and got it working. I limited my clustering to one variable to avoid the error message. Thanks for the awesome package!
Ok, great!
Note to self: I am going to leave this open, as I think this could be confusing for users. Need to think about what the behavior should be if a user includes "id" among the clustering variables.
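For anyone landing here with the same error, the redundancy described above can be handled on the user side before calling att_gt. The following is only a minimal sketch: drop_unit_cluster is a hypothetical helper, not part of the did package, and it is one possible behavior rather than what the package does.

```r
# Hypothetical helper (NOT part of the did package): remove the unit id from
# the requested cluster variables, because unit-level clustering is already
# applied by default.
drop_unit_cluster <- function(clustervars, idname) {
  redundant <- clustervars %in% idname
  if (any(redundant)) {
    message("Dropping '", idname,
            "' from clustervars: unit-level clustering is already the default.")
  }
  kept <- clustervars[!redundant]
  # Return NULL when nothing is left, meaning "cluster on the unit only"
  if (length(kept) == 0) NULL else kept
}

drop_unit_cluster(c("id", "dist_id"), idname = "id")  # keeps only "dist_id"
drop_unit_cluster("id", idname = "id")                # NULL
```

The returned value can then be passed as clustervars, which in the first example gives the district-only clustering that ran without error earlier in this thread.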
Hi, I am having a similar issue related to this post. I have balanced panel data where I want to cluster at the group and time level. I am using the individual id variable in clustervars instead of the group variable, as per the documentation. I have 3 time periods (years), 3,000 observations per period, and 1,000 per group, which amounts to 9,000 observations in total. Below is my code and the error:
csdid_out <- att_gt(yname = "Y2it",
                    tname = "year",
                    gname = "first.treat",
                    idname = "id",
                    est_method = "reg",
                    data = data,
                    panel = TRUE,
                    clustervars = c("id", "year"),
                    control_group = "notyettreated",
                    bstrap = TRUE,
                    cband = FALSE)
Error in mboot(inffunc, DIDparams = dp, pl = pl, cores = cores) :
can't handle time-varying cluster variables
I will appreciate any help on this.
Time should not be used as a cluster variable in a DiD procedure with fixed T.
You can't make inference with 3 observations…
Pedro H. C. Sant'Anna https://psantanna.com
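The error message above refers to cluster variables that change over time within a unit. A quick way to check a candidate variable before passing it to clustervars is sketched below. This is only an illustration: is_time_invariant is a hypothetical helper (not a did function), and the toy data frame and its dist_id column are made up for the example.

```r
# Toy panel: 3 units observed in 3 years; dist_id is fixed within each unit
df <- data.frame(
  id      = rep(1:3, each = 3),
  year    = rep(2001:2003, times = 3),
  dist_id = rep(c(10, 10, 20), each = 3)
)

# Hypothetical helper (NOT part of the did package): TRUE when the candidate
# cluster variable takes exactly one value within every unit.
is_time_invariant <- function(data, idname, clustervar) {
  n_values <- tapply(data[[clustervar]], data[[idname]],
                     function(v) length(unique(v)))
  all(n_values == 1)
}

is_time_invariant(df, "id", "dist_id")  # TRUE: constant within each id
is_time_invariant(df, "id", "year")     # FALSE: varies within each id
```

A variable such as year fails this check by construction, which is consistent with the error reported above.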
Thanks for your quick feedback. In fact, what I meant is clustering at the group*period (intersection) level. What is the best way to cluster at that level? Thanks
Just use the id.
Thanks
Pedro H. C. Sant'Anna https://psantanna.com
Thanks