bcallaway11 / did

Difference in Differences with Multiple Periods, website: https://bcallaway11.github.io/did

Aggregate across ATT(g,t) with other weights? #94

Closed by jtorcasso 2 years ago

jtorcasso commented 2 years ago

Is it possible to aggregate across ATT(g,t) by weighting with something other than the number of treated units? For instance, if I had panel data on zip codes, could I weight by total population in the group (the total population of treated zips in the group)? Is this something that supplying weightsname accomplishes already? Or does weightsname only affect how the individual ATT(g,t) are estimated?

bcallaway11 commented 2 years ago

Yes, I think weightsname accomplishes what you are looking for here. Let me just show you what happens in an extreme example with some simulated data. I think this answers your question, but if I am misunderstanding anything, just let me know!

Brant

library(did)
# the simulation setup below is terse, but it generates
# data where, in post-treatment periods,
# the average treatment effect is equal to 1 for group 2,
# 3 for group 3, and 5 for group 4, and does not vary
# over time or length of exposure to the treatment
time.periods <- 4
sp <- reset.sim(time.periods=time.periods)
sp$te.bet.ind <- c(0,2,3,4)
sp$te <- -1
data <- build_sim_dataset(sp)

# no weights
res <- att_gt(yname="Y",
              tname="period",
              idname="id",
              gname="G",
              xformla=~X,
              data=data)
res
#> 
#> Call:
#> att_gt(yname = "Y", tname = "period", idname = "id", gname = "G", 
#>     xformla = ~X, data = data)
#> 
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
#> 
#> Group-Time Average Treatment Effects:
#>  Group Time ATT(g,t) Std. Error [95% Simult.  Conf. Band]  
#>      2    2   1.0392     0.0918        0.7914      1.2869 *
#>      2    3   1.0210     0.0880        0.7834      1.2587 *
#>      2    4   0.9572     0.0916        0.7099      1.2044 *
#>      3    2   0.0712     0.0672       -0.1102      0.2526  
#>      3    3   3.0722     0.1325        2.7147      3.4298 *
#>      3    4   3.1789     0.1272        2.8357      3.5222 *
#>      4    2   0.0287     0.0712       -0.1634      0.2209  
#>      4    3   0.0256     0.0713       -0.1667      0.2179  
#>      4    4   4.9024     0.1681        4.4486      5.3563 *
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#> 
#> P-value for pre-test of parallel trends assumption:  0.63659
#> Control Group:  Never Treated,  Anticipation Periods:  0
#> Estimation Method:  Doubly Robust

# now add weights; for simplicity, they
# are constant within group, but they
# put the vast majority of weight on group 4
data$w <- ifelse(data$G==4, .999, .001)
# rescale so the weights average to 1
data$w <- data$w / mean(data$w)

# include weights
res2 <- att_gt(yname="Y",
               tname="period",
               idname="id",
               gname="G",
               xformla=~X,
               data=data,
               weightsname="w")

res2 # ATT(g,t) point estimates are essentially unchanged
#> 
#> Call:
#> att_gt(yname = "Y", tname = "period", idname = "id", gname = "G", 
#>     xformla = ~X, data = data, weightsname = "w")
#> 
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
#> 
#> Group-Time Average Treatment Effects:
#>  Group Time ATT(g,t) Std. Error [95% Simult.  Conf. Band]  
#>      2    2   1.0392     0.0841        0.8103      1.2681 *
#>      2    3   1.0210     0.0836        0.7935      1.2486 *
#>      2    4   0.9572     0.0877        0.7184      1.1959 *
#>      3    2   0.0712     0.0653       -0.1066      0.2490  
#>      3    3   3.0722     0.1324        2.7117      3.4328 *
#>      3    4   3.1789     0.1292        2.8272      3.5306 *
#>      4    2   0.0287     0.0671       -0.1539      0.2112  
#>      4    3   0.0257     0.0741       -0.1759      0.2274  
#>      4    4   4.9017     0.1803        4.4110      5.3924 *
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#> 
#> P-value for pre-test of parallel trends assumption:  0.63646
#> Control Group:  Never Treated,  Anticipation Periods:  0
#> Estimation Method:  Doubly Robust
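
The point estimates barely move because the weights here are constant within each group, and a constant weight cancels out of any within-group weighted average. Below is a minimal base-R sketch of that cancellation. It rests on two assumptions: the outcome changes `dy_treated` and `dy_control` are made up, and a plain weighted difference in means stands in for the doubly robust estimator that `att_gt` actually uses.

```r
set.seed(1)
# hypothetical outcome changes for one treated group and its comparison group
dy_treated <- rnorm(200, mean = 5)
dy_control <- rnorm(300, mean = 0)

# weights that are constant within each group, as in the example above
w_treated <- rep(0.999, length(dy_treated))
w_control <- rep(0.001, length(dy_control))

unweighted <- mean(dy_treated) - mean(dy_control)
weighted   <- weighted.mean(dy_treated, w_treated) -
  weighted.mean(dy_control, w_control)

# a constant weight cancels in each weighted mean, so the two estimates agree
all.equal(unweighted, weighted)
```

If the weights varied across units within a group (for example, zip-code population), the two numbers would generally differ.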

# but now group 4 dominates "group" aggregation
aggte(res2, type="group")
#> 
#> Call:
#> aggte(MP = res2, type = "group")
#> 
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
#> 
#> 
#> Overall summary of ATT's based on group/cohort aggregation:  
#>     ATT    Std. Error     [ 95%  Conf. Int.]  
#>  4.8963        0.1712     4.5608      5.2317 *
#> 
#> 
#> Group Effects:
#>  Group Estimate Std. Error [95% Simult.  Conf. Band]  
#>      2   1.0058     0.0774        0.8127      1.1989 *
#>      3   3.1256     0.1178        2.8319      3.4193 *
#>      4   4.9017     0.1881        4.4326      5.3708 *
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#> 
#> Control Group:  Never Treated,  Anticipation Periods:  0
#> Estimation Method:  Doubly Robust
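
As a rough check on how the "group" aggregation is assembled, the sketch below recomputes it from the printed numbers above. Each group effect is the simple average of that group's post-treatment ATT(g,t). The overall ATT is then taken as a weighted average of the group effects. The weights used here assume that groups 2, 3, and 4 contain the same number of units, which the output does not show, so the result only approximates the reported 4.8963.

```r
# post-treatment ATT(g,t) estimates copied from the res2 output above
att <- list(
  g2 = c(1.0392, 1.0210, 0.9572),  # t = 2, 3, 4
  g3 = c(3.0722, 3.1789),          # t = 3, 4
  g4 = c(4.9017)                   # t = 4
)

# each group effect is the plain average over that group's post-treatment periods
group_effects <- sapply(att, mean)
round(group_effects, 4)

# ASSUMPTION: equal group sizes, so each group's share of the total weight
# is proportional to the per-unit weight assigned above (.001, .001, .999)
share <- c(g2 = 0.001, g3 = 0.001, g4 = 0.999)
overall <- sum(group_effects * share) / sum(share)
round(overall, 3)
```

With nearly all of the weight on group 4, the overall number sits just below group 4's own estimate rather than near the middle of the three group effects.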

# group 4 also dominates the "dynamic" aggregation at event time 0,
# though note that the "overall" ATT in this case still
# just averages evenly across all post-treatment event times
aggte(res2, type="dynamic")
#> 
#> Call:
#> aggte(MP = res2, type = "dynamic")
#> 
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
#> 
#> 
#> Overall summary of ATT's based on event-study/dynamic aggregation:  
#>     ATT    Std. Error     [ 95%  Conf. Int.]  
#>  2.6504        0.0747      2.504      2.7968 *
#> 
#> 
#> Dynamic Effects:
#>  Event time Estimate Std. Error [95% Simult.  Conf. Band]  
#>          -2   0.0287     0.0708       -0.1476      0.2049  
#>          -1   0.0258     0.0692       -0.1467      0.1982  
#>           0   4.8962     0.1667        4.4811      5.3114 *
#>           1   2.0978     0.0794        1.9001      2.2955 *
#>           2   0.9572     0.0811        0.7551      1.1592 *
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#> 
#> Control Group:  Never Treated,  Anticipation Periods:  0
#> Estimation Method:  Doubly Robust
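
The note about the "overall" dynamic ATT can be checked from the printed table: averaging the three post-treatment event-time estimates with equal weight reproduces the reported 2.6504. This is a small sketch that uses only the numbers shown and does not call the did package.

```r
# event-time estimates for e = 0, 1, 2 copied from the output above
dynamic_effects <- c(e0 = 4.8962, e1 = 2.0978, e2 = 0.9572)

# equal-weight average across post-treatment event times
overall_dynamic <- mean(dynamic_effects)
round(overall_dynamic, 4)
#> [1] 2.6504
```

Only event time 0 includes group 4 (treated in the final period), which is why the weights pull that single estimate toward 5 while later event times are unaffected.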

jtorcasso commented 2 years ago

Great example. Looks like by setting weightsname I can get my desired effect. Thanks!

bcallaway11 commented 2 years ago

👍 Glad this worked!