Closed (avila-a closed this issue 2 years ago)
Thanks for pointing this out, Antonio. It might take me a few days to follow up on this; our semester is starting up this week, so I’m quite busy at the moment, but I’ll follow up soon.
Hi Antonio,
Thanks, this is a really good catch. I think I have fixed this issue now. I ran it through some of the test code that I have, and it appears to work. See one example below, and please let me know whether the new code fixes your issue.
Thanks, Brant
For simplicity, let's just look at the case with no covariates and use the never-treated comparison group, so that we can compute treatment effects easily "by hand". Here are by-hand calculations for one pre-treatment period and one post-treatment period, both relative to the base period 2006:
# compare to pre-treatment
mean(subset(mpdta, first.treat==2007 & year==2005)$lemp - subset(mpdta, first.treat==2007 & year==2006)$lemp) -
mean(subset(mpdta, first.treat==0 & year==2005)$lemp - subset(mpdta, first.treat==0 & year==2006)$lemp)
#> [1] 0.03108712
# compare to post-treatment
mean(subset(mpdta, first.treat==2007 & year==2007)$lemp - subset(mpdta, first.treat==2007 & year==2006)$lemp) -
mean(subset(mpdta, first.treat==0 & year==2007)$lemp - subset(mpdta, first.treat==0 & year==2006)$lemp)
#> [1] -0.02605441
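The subtraction above relies on the two subsets listing the same counties in the same order. A minimal sketch of the same 2x2 calculation that matches units by id instead is shown here; the helper name `did_by_hand` and the `toy` data frame are hypothetical and only mimic the column names of mpdta, so this is an illustration rather than the package's internal code.

```r
# Toy panel (hypothetical values): two units first treated in 2007,
# two never-treated units (first.treat == 0), observed in 2005-2007.
toy <- data.frame(
  countyreal  = rep(1:4, each = 3),
  year        = rep(2005:2007, times = 4),
  first.treat = rep(c(2007, 2007, 0, 0), each = 3),
  lemp        = c(1.0, 1.2, 1.1,
                  2.0, 2.1, 2.3,
                  3.0, 3.1, 3.3,
                  4.0, 4.3, 4.4)
)

# Hypothetical helper: 2x2 difference-in-differences for group g in period t,
# relative to a fixed base period, with the never-treated group as comparison.
did_by_hand <- function(data, g, t, base) {
  now  <- data[data$year == t,    c("countyreal", "first.treat", "lemp")]
  ref  <- data[data$year == base, c("countyreal", "lemp")]
  # Inner join on the unit id, so only units observed in both periods are used
  both <- merge(now, ref, by = "countyreal", suffixes = c("_t", "_base"))
  change <- both$lemp_t - both$lemp_base   # outcome in t minus outcome in base
  mean(change[both$first.treat == g]) - mean(change[both$first.treat == 0])
}

pre  <- did_by_hand(toy, g = 2007, t = 2005, base = 2006)
post <- did_by_hand(toy, g = 2007, t = 2007, base = 2006)
```

On the toy data this gives 0.05 for the pre-treatment period and -0.10 for the post-treatment period. Swapping `t` and `base` in the pre-treatment call flips the sign, which is the kind of inversion reported in this issue.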
Now compare to the output from did
(note that previously I was getting wrong-sign issues in pre-treatment periods, as you reported above):
devtools::load_all("~/Dropbox/did")
#> ℹ Loading did
library(BMisc)
# after code changes
out_balanced_new <- att_gt(yname = "lemp",
gname = "first.treat",
idname = "countyreal",
tname = "year",
xformla = ~1,
data = mpdta,
base_period="universal"
)
out_balanced_new
#>
#> Call:
#> att_gt(yname = "lemp", tname = "year", idname = "countyreal",
#> gname = "first.treat", xformla = ~1, data = mpdta, base_period = "universal")
#>
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>
#>
#> Group-Time Average Treatment Effects:
#> Group Time ATT(g,t) Std. Error [95% Simult. Conf. Band]
#> 2004 2003 0.0000 NA NA NA
#> 2004 2004 -0.0105 0.0254 -0.0770 0.0560
#> 2004 2005 -0.0704 0.0305 -0.1502 0.0093
#> 2004 2006 -0.1373 0.0374 -0.2352 -0.0394 *
#> 2004 2007 -0.1008 0.0348 -0.1920 -0.0096 *
#> 2006 2003 -0.0038 0.0328 -0.0896 0.0821
#> 2006 2004 0.0028 0.0214 -0.0534 0.0589
#> 2006 2005 0.0000 NA NA NA
#> 2006 2006 -0.0046 0.0178 -0.0511 0.0419
#> 2006 2007 -0.0412 0.0194 -0.0920 0.0096
#> 2007 2003 0.0033 0.0249 -0.0618 0.0684
#> 2007 2004 0.0338 0.0213 -0.0219 0.0895
#> 2007 2005 0.0311 0.0183 -0.0169 0.0791
#> 2007 2006 0.0000 NA NA NA
#> 2007 2007 -0.0261 0.0189 -0.0757 0.0236
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#>
#> P-value for pre-test of parallel trends assumption: 0.16812
#> Control Group: Never Treated, Anticipation Periods: 0
#> Estimation Method: Doubly Robust
out_unbalanced_new <- att_gt(yname = "lemp",
gname = "first.treat",
idname = "countyreal",
tname = "year",
xformla = ~1,
data = mpdta,
base_period="universal",
allow_unbalanced_panel=TRUE
)
out_unbalanced_new
#>
#> Call:
#> att_gt(yname = "lemp", tname = "year", idname = "countyreal",
#> gname = "first.treat", xformla = ~1, data = mpdta, allow_unbalanced_panel = TRUE,
#> base_period = "universal")
#>
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>
#>
#> Group-Time Average Treatment Effects:
#> Group Time ATT(g,t) Std. Error [95% Simult. Conf. Band]
#> 2004 2003 0.0000 NA NA NA
#> 2004 2004 -0.0105 0.0237 -0.0756 0.0545
#> 2004 2005 -0.0704 0.0333 -0.1619 0.0210
#> 2004 2006 -0.1373 0.0383 -0.2425 -0.0320 *
#> 2004 2007 -0.1008 0.0372 -0.2031 0.0014
#> 2006 2003 -0.0038 0.0317 -0.0908 0.0833
#> 2006 2004 0.0028 0.0191 -0.0498 0.0553
#> 2006 2005 0.0000 NA NA NA
#> 2006 2006 -0.0046 0.0186 -0.0558 0.0466
#> 2006 2007 -0.0412 0.0208 -0.0984 0.0159
#> 2007 2003 0.0033 0.0248 -0.0647 0.0713
#> 2007 2004 0.0338 0.0209 -0.0235 0.0911
#> 2007 2005 0.0311 0.0176 -0.0173 0.0795
#> 2007 2006 0.0000 NA NA NA
#> 2007 2007 -0.0261 0.0170 -0.0727 0.0206
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#>
#> P-value for pre-test of parallel trends assumption: 0.16812
#> Control Group: Never Treated, Anticipation Periods: 0
#> Estimation Method: Doubly Robust
They look right now in both cases.
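As a quick sanity check, the point estimates printed in the two tables above can be compared with each other and with the by-hand numbers. The sketch below hard-codes the ATT(g,t) columns as printed (rounded to four decimals); the variable names are chosen here for illustration. With the fitted objects at hand, one could presumably compare their stored estimates directly instead.

```r
# Group and time labels in the order the tables print them
groups <- rep(c(2004, 2006, 2007), each = 5)
times  <- rep(2003:2007, times = 3)

# ATT(g,t) columns copied from the two printed tables above
att_balanced <- c( 0.0000, -0.0105, -0.0704, -0.1373, -0.1008,
                  -0.0038,  0.0028,  0.0000, -0.0046, -0.0412,
                   0.0033,  0.0338,  0.0311,  0.0000, -0.0261)
att_unbalanced <- c( 0.0000, -0.0105, -0.0704, -0.1373, -0.1008,
                    -0.0038,  0.0028,  0.0000, -0.0046, -0.0412,
                     0.0033,  0.0338,  0.0311,  0.0000, -0.0261)

# By-hand results reported earlier in this comment
by_hand <- c(pre = 0.03108712, post = -0.02605441)

# Do the balanced and unbalanced code paths agree on every point estimate?
same_points <- isTRUE(all.equal(att_balanced, att_unbalanced))

# Look up one printed estimate by group and time
pick <- function(g, t) att_balanced[groups == g & times == t]

# Do the printed estimates match the by-hand values after rounding?
matches_by_hand <- c(
  pre  = isTRUE(all.equal(pick(2007, 2005), round(by_hand[["pre"]], 4))),
  post = isTRUE(all.equal(pick(2007, 2007), round(by_hand[["post"]], 4)))
)
```

Both checks pass for the numbers shown. Only the standard errors and confidence bands differ between the two tables.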
Good afternoon,
I just wanted to report a strange behaviour in the did package. The reported pre-treatment period parameters for the "universal" base period seem to be "inverted" when allow_unbalanced_panel = FALSE, but look consistent when allow_unbalanced_panel = TRUE.
I show below a practical example of that happening (not easily reproducible without the data):
1. The data:
Panel data of ~30 years of GDP per capita for 113 European regions. Only one region has missing data, covering its final 5 years. The same dataset is used for all commands.
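Since the original data are not attached, a panel with the same shape can be simulated for anyone who wants to try reproducing the behaviour. This is only a sketch: the column names, the year range, the treatment timing, the number of treated regions, and the outcome process are all assumptions, and only the dimensions (113 regions, 30 years, one treated region missing its final 5 years) follow the description above.

```r
set.seed(1)

n_regions <- 113
years     <- 1990:2019   # 30 years; the actual start year is an assumption

# Start from a fully balanced region-by-year panel
panel <- expand.grid(region = seq_len(n_regions), year = years)

# Hypothetical timing: regions 1-20 first treated in 2005, the rest never (0)
panel$first_treat <- ifelse(panel$region <= 20, 2005, 0)

# Hypothetical outcome: a common linear trend plus noise, no true effect
panel$gdp_pc <- 10 + 0.02 * (panel$year - min(years)) +
  rnorm(nrow(panel), sd = 0.1)

# Drop the final 5 years for one treated region, making the panel unbalanced
panel <- panel[!(panel$region == 1 & panel$year > max(years) - 5), ]

# Observations per region: 25 for region 1, 30 for all others
obs_per_region <- table(panel$region)
```

A data frame like this could then be passed to att_gt twice, once with allow_unbalanced_panel = FALSE and once with TRUE, to compare the pre-treatment estimates under base_period = "universal".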
2. The code and various results
(One treated unit with missing data)
did code
And its plots:
I suspect the results with allow_unbalanced_panel = TRUE are correct, while the ones with allow_unbalanced_panel = FALSE have some sort of coding error that inverts the pre-treatment estimates.
Finally, many thanks for developing this package. I hope this report helps improve it.
Best,
Antonio Avila