bcallaway11 / did

Difference in Differences with Multiple Periods, website: https://bcallaway11.github.io/did

Fewer time periods than groups #89

Open bcallaway11 opened 2 years ago

bcallaway11 commented 2 years ago

This generalizes the issue reported in #56. An example would be biennial data (as in the NLSY) where treatment status is observed more frequently than the outcome data, so some groups are first treated in periods that do not appear in the data.

library(did)
time.periods <- 6
sp <- reset.sim(time.periods=time.periods)
sp$te <- 0
sp$te.e <- 1:time.periods
data <- build_sim_dataset(sp)
data <- subset(data, !(period %in% c(2,5)))

res_dr <- att_gt(yname="Y", xformla=~X, data=data, tname="period", idname="id",
                 gname="G", est_method="dr")
# computing ATT(g,t) works fine
res_dr
#> 
#> Call:
#> att_gt(yname = "Y", tname = "period", idname = "id", gname = "G", 
#>     xformla = ~X, data = data, est_method = "dr")
#> 
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
#> 
#> Group-Time Average Treatment Effects:
#>  Group Time ATT(g,t) Std. Error [95% Simult.  Conf. Band]  
#>      2    3   2.0777     0.0747        1.8587      2.2968 *
#>      2    4   2.9353     0.0693        2.7321      3.1385 *
#>      2    6   4.9018     0.0732        4.6872      5.1165 *
#>      3    3   0.9992     0.0774        0.7722      1.2262 *
#>      3    4   2.0257     0.0738        1.8091      2.2423 *
#>      3    6   3.9543     0.0803        3.7188      4.1898 *
#>      4    3   0.1223     0.0767       -0.1026      0.3473  
#>      4    4   0.8950     0.0793        0.6624      1.1276 *
#>      4    6   2.8671     0.0828        2.6244      3.1098 *
#>      5    3   0.0952     0.0778       -0.1329      0.3233  
#>      5    4  -0.0904     0.0737       -0.3066      0.1258  
#>      5    6   1.9551     0.0775        1.7279      2.1823 *
#>      6    3   0.0942     0.0786       -0.1365      0.3248  
#>      6    4  -0.0906     0.0826       -0.3329      0.1518  
#>      6    6   0.9923     0.0754        0.7713      1.2132 *
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#> 
#> P-value for pre-test of parallel trends assumption:  0.68298
#> Control Group:  Never Treated,  Anticipation Periods:  0
#> Estimation Method:  Doubly Robust

# dynamic aggregation seems to work fine too
dyn_agg <- aggte(res_dr, type="dynamic")
dyn_agg
#> 
#> Call:
#> aggte(MP = res_dr, type = "dynamic")
#> 
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
#> 
#> 
#> Overall summary of ATT's based on event-study/dynamic aggregation:  
#>     ATT    Std. Error     [ 95%  Conf. Int.]  
#>  2.9484        0.0423     2.8654      3.0313 *
#> 
#> 
#> Dynamic Effects:
#>  Event time Estimate Std. Error [95% Simult.  Conf. Band]  
#>          -3   0.0942     0.0792       -0.1144      0.3027  
#>          -2   0.0023     0.0482       -0.1247      0.1293  
#>          -1   0.0133     0.0456       -0.1068      0.1334  
#>           0   0.9640     0.0384        0.8628      1.0651 *
#>           1   2.0196     0.0443        1.9029      2.1364 *
#>           2   2.9021     0.0528        2.7630      3.0412 *
#>           3   3.9543     0.0743        3.7587      4.1500 *
#>           4   4.9018     0.0787        4.6945      5.1092 *
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#> 
#> Control Group:  Never Treated,  Anticipation Periods:  0
#> Estimation Method:  Doubly Robust

# group aggregation returns NA for groups whose first-treatment period
# is one of the missing time periods -- that seems like a bug!
group_agg <- aggte(res_dr, type="group")
group_agg
#> 
#> Call:
#> aggte(MP = res_dr, type = "group")
#> 
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
#> 
#> 
#> Overall summary of ATT's based on group/cohort aggregation:  
#>  ATT    Std. Error     [ 95%  Conf. Int.] 
#>   NA            NA         NA          NA 
#> 
#> 
#> Group Effects:
#>  Group Estimate Std. Error [95% Simult.  Conf. Band]  
#>      2       NA         NA            NA          NA  
#>      3   2.3264     0.0548        2.1942      2.4586 *
#>      4   1.8811     0.0733        1.7045      2.0577 *
#>      5       NA         NA            NA          NA  
#>      6   0.9923     0.0798        0.7999      1.1846 *
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#> 
#> Control Group:  Never Treated,  Anticipation Periods:  0
#> Estimation Method:  Doubly Robust

# calendar aggregation does not compute at all in this case
# low priority to fix this
cal_agg <- aggte(res_dr, type="calendar")
#> Error in rowSums(sapply(keepers, function(k) {: 'x' must be an array of at least two dimensions
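
A quick way to see which groups are affected is to compare the set of treatment groups with the set of observed periods. The sketch below uses a small toy panel rather than the simulated data above; the `toy` data frame and its values are made up for illustration, and it assumes never-treated units are coded as `G = 0`.

```r
# Toy panel mirroring the example: periods 2 and 5 are unobserved,
# but units can be first treated in any of periods 2..6 (0 = never treated).
toy <- expand.grid(id = 1:6, period = c(1, 3, 4, 6))
toy$G <- c(0, 2, 3, 4, 5, 6)[toy$id]

observed_periods <- sort(unique(toy$period))
treated_groups   <- sort(unique(toy$G[toy$G > 0]))

# Groups whose first-treatment period never appears in the data
unmatched_groups <- setdiff(treated_groups, observed_periods)
print(unmatched_groups)
```

In this toy panel the unmatched groups are 2 and 5, the same groups that come back as NA in the group aggregation above.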

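A workaround for this (at least what I think is standard operating procedure) is to recode each affected group to a period for which data is available. In particular, using the same data as before, if we recode groups 2 and 5 to the next available periods (3 and 6), things "work" again. Note that this does introduce some bias: units first treated in period 2 are now handled as if they were first treated in period 3, so their second period of exposure is counted as their first (compare the event-time 0 estimate of about 1.44 below with about 0.96 above). That said, I am not sure if this is an issue worth fixing.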

data$G <- dplyr::recode(data$G, "2"=3, "5"=6)

res_dr <- att_gt(yname="Y", xformla=~X, data=data, tname="period", idname="id",
                 gname="G", est_method="dr")
# computing ATT(g,t) works fine
res_dr
#> 
#> Call:
#> att_gt(yname = "Y", tname = "period", idname = "id", gname = "G", 
#>     xformla = ~X, data = data, est_method = "dr")
#> 
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
#> 
#> Group-Time Average Treatment Effects:
#>  Group Time ATT(g,t) Std. Error [95% Simult.  Conf. Band]  
#>      3    3   1.5172     0.0635        1.3478      1.6867 *
#>      3    4   2.5611     0.0664        2.3841      2.7382 *
#>      3    6   4.6435     0.0667        4.4654      4.8216 *
#>      4    3   0.0982     0.0745       -0.1006      0.2970  
#>      4    4   0.9504     0.0726        0.7566      1.1441 *
#>      4    6   3.0124     0.0746        2.8133      3.2115 *
#>      6    3   0.0221     0.0674       -0.1577      0.2020  
#>      6    4  -0.0232     0.0667       -0.2011      0.1547  
#>      6    6   1.6069     0.0647        1.4342      1.7795 *
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#> 
#> P-value for pre-test of parallel trends assumption:  0.57549
#> Control Group:  Never Treated,  Anticipation Periods:  0
#> Estimation Method:  Doubly Robust

# dynamic aggregation runs
dyn_agg <- aggte(res_dr, type="dynamic")
dyn_agg
#> 
#> Call:
#> aggte(MP = res_dr, type = "dynamic")
#> 
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
#> 
#> 
#> Overall summary of ATT's based on event-study/dynamic aggregation:  
#>    ATT    Std. Error     [ 95%  Conf. Int.]  
#>  2.915        0.0474      2.822       3.008 *
#> 
#> 
#> Dynamic Effects:
#>  Event time Estimate Std. Error [95% Simult.  Conf. Band]  
#>          -3   0.0221     0.0674       -0.1532      0.1975  
#>          -2  -0.0232     0.0666       -0.1965      0.1501  
#>          -1   0.0982     0.0769       -0.1019      0.2982  
#>           0   1.4430     0.0358        1.3498      1.5363 *
#>           1   2.5611     0.0640        2.3945      2.7277 *
#>           2   3.0124     0.0758        2.8152      3.2097 *
#>           3   4.6435     0.0716        4.4572      4.8299 *
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#> 
#> Control Group:  Never Treated,  Anticipation Periods:  0
#> Estimation Method:  Doubly Robust

# group aggregation runs
group_agg <- aggte(res_dr, type="group")
group_agg
#> 
#> Call:
#> aggte(MP = res_dr, type = "group")
#> 
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
#> 
#> 
#> Overall summary of ATT's based on group/cohort aggregation:  
#>     ATT    Std. Error     [ 95%  Conf. Int.]  
#>  2.1962         0.038     2.1216      2.2708 *
#> 
#> 
#> Group Effects:
#>  Group Estimate Std. Error [95% Simult.  Conf. Band]  
#>      3   2.9073     0.0594        2.7718      3.0428 *
#>      4   1.9814     0.0646        1.8339      2.1289 *
#>      6   1.6069     0.0658        1.4568      1.7570 *
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#> 
#> Control Group:  Never Treated,  Anticipation Periods:  0
#> Estimation Method:  Doubly Robust

# calendar aggregation works now
cal_agg <- aggte(res_dr, type="calendar")
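
The recoding above is hard-coded for groups 2 and 5. A minimal sketch of a more general version, in base R, maps each group to the first observed period at or after it. The helper `recode_to_next_period` is hypothetical and not part of the did package; it assumes never-treated units are coded as 0 and leaves groups that fall after the last observed period unchanged.

```r
# Hypothetical helper (not part of did): move each group to the first
# observed period that is >= its original first-treatment period.
recode_to_next_period <- function(g, periods) {
  periods <- sort(unique(periods))
  vapply(g, function(gi) {
    if (gi == 0) return(0)             # assumption: 0 codes never-treated units
    later <- periods[periods >= gi]
    if (length(later) == 0) return(gi) # no later observed period: leave as-is
    min(later)
  }, numeric(1))
}

# Observed periods from the example (2 and 5 were dropped)
observed_periods <- c(1, 3, 4, 6)
recoded <- recode_to_next_period(c(0, 2, 3, 4, 5, 6), observed_periods)
print(recoded)
```

Applied to the example, this would be used as `data$G <- recode_to_next_period(data$G, data$period)`, which should reproduce the mapping of group 2 to 3 and group 5 to 6. The same caveat about bias applies, since the recoded units are still treated earlier than their new group label suggests.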