Closed by jtorcasso 2 years ago
Yes, I think weightsname accomplishes what you are looking for here. Let me show you what happens in an extreme example with some simulated data. I think this answers your question, but if I am misunderstanding anything, just let me know!
Brant
library(did)
# this code is hard to understand...but it generates
# data where, in post treatment periods,
# the average treatment effect is equal to 1 for group 2,
# 3 for group 3, and 5 for group 4; and does not vary
# over time/length of exposure to the treatment
time.periods <- 4
sp <- reset.sim(time.periods=time.periods)
sp$te.bet.ind <- c(0,2,3,4)
sp$te <- -1
data <- build_sim_dataset(sp)
# no weights
res <- att_gt(yname="Y",
tname="period",
idname="id",
gname="G",
xformla=~X,
data=data)
res
#>
#> Call:
#> att_gt(yname = "Y", tname = "period", idname = "id", gname = "G",
#> xformla = ~X, data = data)
#>
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>
#>
#> Group-Time Average Treatment Effects:
#> Group Time ATT(g,t) Std. Error [95% Simult. Conf. Band]
#> 2 2 1.0392 0.0918 0.7914 1.2869 *
#> 2 3 1.0210 0.0880 0.7834 1.2587 *
#> 2 4 0.9572 0.0916 0.7099 1.2044 *
#> 3 2 0.0712 0.0672 -0.1102 0.2526
#> 3 3 3.0722 0.1325 2.7147 3.4298 *
#> 3 4 3.1789 0.1272 2.8357 3.5222 *
#> 4 2 0.0287 0.0712 -0.1634 0.2209
#> 4 3 0.0256 0.0713 -0.1667 0.2179
#> 4 4 4.9024 0.1681 4.4486 5.3563 *
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#>
#> P-value for pre-test of parallel trends assumption: 0.63659
#> Control Group: Never Treated, Anticipation Periods: 0
#> Estimation Method: Doubly Robust
# now add weights; for simplicity, they
# are constant within group, but they
# put the vast majority of weight on group 4
data$w <- ifelse(data$G==4, .999, .001)
data$w <- data$w / mean(data$w)
# include weights
res2 <- att_gt(yname="Y",
tname="period",
idname="id",
gname="G",
xformla=~X,
data=data,
weightsname="w")
res2 # essentially no effect on ATT(g,t) estimates, since weights are constant within group
#>
#> Call:
#> att_gt(yname = "Y", tname = "period", idname = "id", gname = "G",
#> xformla = ~X, data = data, weightsname = "w")
#>
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>
#>
#> Group-Time Average Treatment Effects:
#> Group Time ATT(g,t) Std. Error [95% Simult. Conf. Band]
#> 2 2 1.0392 0.0841 0.8103 1.2681 *
#> 2 3 1.0210 0.0836 0.7935 1.2486 *
#> 2 4 0.9572 0.0877 0.7184 1.1959 *
#> 3 2 0.0712 0.0653 -0.1066 0.2490
#> 3 3 3.0722 0.1324 2.7117 3.4328 *
#> 3 4 3.1789 0.1292 2.8272 3.5306 *
#> 4 2 0.0287 0.0671 -0.1539 0.2112
#> 4 3 0.0257 0.0741 -0.1759 0.2274
#> 4 4 4.9017 0.1803 4.4110 5.3924 *
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#>
#> P-value for pre-test of parallel trends assumption: 0.63646
#> Control Group: Never Treated, Anticipation Periods: 0
#> Estimation Method: Doubly Robust
# but now group 4 dominates "group" aggregation
aggte(res2, type="group")
#>
#> Call:
#> aggte(MP = res2, type = "group")
#>
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>
#>
#>
#> Overall summary of ATT's based on group/cohort aggregation:
#> ATT Std. Error [ 95% Conf. Int.]
#> 4.8963 0.1712 4.5608 5.2317 *
#>
#>
#> Group Effects:
#> Group Estimate Std. Error [95% Simult. Conf. Band]
#> 2 1.0058 0.0774 0.8127 1.1989 *
#> 3 3.1256 0.1178 2.8319 3.4193 *
#> 4 4.9017 0.1881 4.4326 5.3708 *
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#>
#> Control Group: Never Treated, Anticipation Periods: 0
#> Estimation Method: Doubly Robust
# also dominates "dynamic" aggregation in first period
# though note that "overall" ATT in this case still
# just evenly averages across all post-treatment periods
aggte(res2, type="dynamic")
#>
#> Call:
#> aggte(MP = res2, type = "dynamic")
#>
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>
#>
#>
#> Overall summary of ATT's based on event-study/dynamic aggregation:
#> ATT Std. Error [ 95% Conf. Int.]
#> 2.6504 0.0747 2.504 2.7968 *
#>
#>
#> Dynamic Effects:
#> Event time Estimate Std. Error [95% Simult. Conf. Band]
#> -2 0.0287 0.0708 -0.1476 0.2049
#> -1 0.0258 0.0692 -0.1467 0.1982
#> 0 4.8962 0.1667 4.4811 5.3114 *
#> 1 2.0978 0.0794 1.9001 2.2955 *
#> 2 0.9572 0.0811 0.7551 1.1592 *
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#>
#> Control Group: Never Treated, Anticipation Periods: 0
#> Estimation Method: Doubly Robust
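As a rough check on the two "overall" numbers above, here is a minimal base-R sketch that recomputes them from the printed estimates. It assumes (this is not stated in the output) that the three treated groups have equal numbers of units, so each group's share of the total weight is proportional to its per-unit weight; because the simulated group sizes are not exactly equal, the group-aggregation result should land close to, but not exactly on, the reported 4.8963.

```r
# Group-level estimates copied from the aggte(res2, type="group") output.
group_att <- c(`2` = 1.0058, `3` = 3.1256, `4` = 4.9017)

# Per-unit weights before normalization (from the example).
# ASSUMPTION: equal group sizes, so weight shares are proportional to these.
unit_w <- c(`2` = 0.001, `3` = 0.001, `4` = 0.999)
share  <- unit_w / sum(unit_w)

# Weighted average of group effects; group 4 carries almost all of the weight.
overall_group <- sum(share * group_att)
print(round(overall_group, 3))

# Event-study estimates for non-negative event times, copied from
# the aggte(res2, type="dynamic") output.
dyn <- c(`0` = 4.8962, `1` = 2.0978, `2` = 0.9572)

# The overall dynamic ATT is the simple average across post-treatment event times.
overall_dynamic <- mean(dyn)
print(round(overall_dynamic, 4))
```

The second calculation reproduces the reported 2.6504, which is consistent with the comment that the overall dynamic ATT averages evenly across post-treatment event times even though group 4 dominates event time 0.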
Great example. Looks like by setting weightsname I can get my desired effect. Thanks!
:+1: , glad this worked!
Is it possible to aggregate across ATT(g,t) by weighting with something other than the number of treated units? For instance, if I had panel data on zip codes, could I weight by total population in the group (the total population of treated zips in the group)? Is this something that supplying weightsname accomplishes already? Or does weightsname only affect how the individual ATT(g,t) are estimated?
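To make the distinction in this question concrete, here is a small base-R sketch comparing the two candidate sets of aggregation weights: each group's share of treated units versus its share of treated population. The data frame and its columns (zip, G, pop) are hypothetical and purely illustrative; the sketch only computes the shares and does not call any did functions.

```r
# Hypothetical zip-level data: one row per treated zip code, with its
# treatment cohort G and a time-invariant population count.
zips <- data.frame(
  zip = c("A", "B", "C", "D", "E"),
  G   = c(2, 2, 3, 3, 3),
  pop = c(50000, 30000, 1000, 2000, 1000)
)

# Share of treated units in each group (aggregation by number of units).
unit_share <- prop.table(table(zips$G))

# Share of treated population in each group (aggregation by population).
pop_by_group <- tapply(zips$pop, zips$G, sum)
pop_share    <- pop_by_group / sum(pop_by_group)

print(round(unit_share, 3))
print(round(pop_share, 3))
```

In this toy data, group 3 has the majority of zip codes but only a small fraction of the treated population, so the two weighting schemes would aggregate the same ATT(g,t) values very differently.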