bcallaway11 / did

Difference in Differences with Multiple Periods, website: https://bcallaway11.github.io/did

[Bug?] Impacts in pre-period on outcome that is 0 before event. #126

Closed: jtorcasso closed this issue 2 years ago

jtorcasso commented 2 years ago

I'm not sure whether this is a bug, but sometimes when I estimate effects on an outcome with no pre-event variation (here it is exactly 0 in every period before the event), I still get nonzero estimates in the pre-period. Shouldn't these estimates be exactly 0? I can try to produce a minimal working example (MWE) with data I can share, but in the meantime, any thoughts?

library(did)

test <- read.csv("test_data.csv")
# The outcome is 0 in every pre-event period:
max(test[test$t < test$event_date_int, ]$sd_gv)
#> 0
min(test[test$t < test$event_date_int, ]$sd_gv)
#> 0
# There are no never-treated units (event_date_int == 0):
dim(test[test$event_date_int == 0, ])
#> 0 138
did.est <- att_gt(
    yname = "sd_gv",
    tname = "t",
    idname = "zip",
    gname = "event_date_int",
    xformla = ~1,
    data = test,
    est_method = "reg", # only used if covariates are included
    control_group = "notyettreated",
    clustervars = c("zip"),
    weightsname = "ops_2018",
    bstrap = TRUE,
    cband = TRUE,
    base_period = "universal",
    allow_unbalanced_panel = FALSE
)
aggte(did.est, type = "dynamic", min_e = -12, max_e = 12)
Call:
aggte(MP = did.est, type = "dynamic", min_e = -12, max_e = 12)

Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 

Overall summary of ATT's based on event-study/dynamic aggregation:  
    ATT    Std. Error     [ 95%  Conf. Int.]  
 0.0296         0.001     0.0277      0.0314 *

Dynamic Effects:
 Event time Estimate Std. Error [95% Simult.  Conf. Band]  
        -12   0.0076     0.0005        0.0062      0.0091 *
        -11   0.0076     0.0006        0.0061      0.0091 *
        -10   0.0076     0.0005        0.0062      0.0090 *
         -9   0.0076     0.0005        0.0062      0.0091 *
         -8   0.0076     0.0005        0.0062      0.0091 *
         -7   0.0076     0.0005        0.0062      0.0091 *
         -6   0.0076     0.0005        0.0062      0.0090 *
         -5   0.0075     0.0005        0.0061      0.0089 *
         -4   0.0067     0.0006        0.0052      0.0082 *
         -3   0.0058     0.0005        0.0044      0.0072 *
         -2   0.0040     0.0005        0.0026      0.0055 *
         -1   0.0000         NA            NA          NA  
          0   0.0154     0.0015        0.0115      0.0193 *
          1   0.0262     0.0010        0.0235      0.0289 *
          2   0.0252     0.0008        0.0229      0.0274 *
          3   0.0286     0.0010        0.0259      0.0314 *
          4   0.0317     0.0011        0.0289      0.0345 *
          5   0.0338     0.0010        0.0310      0.0365 *
          6   0.0337     0.0011        0.0308      0.0366 *
          7   0.0344     0.0013        0.0308      0.0379 *
          8   0.0297     0.0013        0.0262      0.0332 *
          9   0.0268     0.0014        0.0231      0.0304 *
         10   0.0271     0.0014        0.0234      0.0309 *
         11   0.0408     0.0011        0.0379      0.0437 *
         12   0.0309     0.0012        0.0277      0.0342 *
---
Signif. codes: `*' confidence band does not cover 0

Control Group:  Not Yet Treated,  Anticipation Periods:  0
Estimation Method:  Outcome Regression
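Why the answer to the question should be yes: with a universal base period, each pre-period ATT(g,t) compares the change in Y between the base period g - 1 and period t for group g against the same change for the comparison group. If Y is 0 whenever a unit is untreated (no anticipation), and every comparison unit is untreated in both t and g - 1, both changes are 0, so every pre-period ATT(g,t) is exactly 0, and so is any weighted average of them, such as the event-study estimates above. The base-R sketch below illustrates this on a made-up toy panel; the cohorts, periods, and helper names (`mean_change`, `pre_att`, `clean_rule`, `loose_rule`) are hypothetical and not part of the did package. The `loose_rule` contrast, which lets units treated between t and g - 1 into the comparison group, is only one hypothetical way a nonzero pre-period estimate could arise; it is not a description of the actual bug in did.

```r
# Hypothetical toy panel (not the issue's data): 40 units observed in periods 1..6.
# Cohorts are first treated in period 3, 4, or 5; cohort 7 is not yet treated by the end of the panel.
# As in the issue, Y is exactly 0 in every period before a unit's treatment starts.
set.seed(1)
periods <- 1:6
unit_G <- rep(c(3, 4, 5, 7), each = 10)
panel <- expand.grid(id = 1:40, period = periods)
panel$G <- unit_G[panel$id]
panel$Y <- ifelse(panel$period >= panel$G, 1 + runif(nrow(panel)), 0)

# Mean change in Y from period `base` to period `t` across the given unit ids.
# Rows within each period are ordered by id, so the two subsets line up unit by unit.
mean_change <- function(ids, t, base) {
  y_t    <- panel$Y[panel$period == t    & panel$id %in% ids]
  y_base <- panel$Y[panel$period == base & panel$id %in% ids]
  mean(y_t - y_base)
}

# Pre-period ATT(g, t) with a universal base period g - 1 (no anticipation assumed).
# `control_rule` decides which cohorts form the comparison group.
pre_att <- function(g, t, control_rule) {
  base  <- g - 1
  ids_g <- which(unit_G == g)
  ids_c <- which(control_rule(unit_G, g, t, base))
  mean_change(ids_g, t, base) - mean_change(ids_c, t, base)
}

# Comparison units untreated in both t and g - 1, excluding cohort g itself.
clean_rule <- function(G, g, t, base) G > max(t, base) & G != g
# Hypothetical contrast: only require comparison units to be untreated at t.
loose_rule <- function(G, g, t, base) G > t & G != g

# All pre-period cells t < g - 1 for the treated cohorts.
cells <- expand.grid(g = c(3, 4, 5), t = periods)
cells <- cells[cells$t < cells$g - 1, ]
cells$clean <- mapply(function(g, t) pre_att(g, t, clean_rule), cells$g, cells$t)
cells$loose <- mapply(function(g, t) pre_att(g, t, loose_rule), cells$g, cells$t)
print(cells)
```

In this toy panel every `clean` entry is exactly 0, while the `loose` column turns positive wherever the comparison group includes a cohort that is already treated by period g - 1 (here, every cell except g = 3, t = 1).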
bcallaway11 commented 2 years ago

Hi Jake,

Thanks for sending this. I think this was a bug affecting the case with a universal base period, and I think it is fixed now if you update to the new version on GitHub. Here is some demo code:

library(did)
sp <- reset.sim(time.periods=10)
data <- build_sim_dataset(sp)
data <- subset(data, G != 0) # drop never treated
data <- subset(data, G > 6) # keep only groups first treated after period 6
data <- subset(data, period > 5) # keep only periods 6 onward
data$Y[(data$period < data$G)] <- 0 # set pre-treatment = 0
res <- att_gt(yname="Y",
              tname="period",
              idname="id",
              gname="G",
              data=data,
              control_group = "notyettreated",
              base_period="universal")
#> No pre-treatment periods to test
res
#> 
#> Call:
#> att_gt(yname = "Y", tname = "period", idname = "id", gname = "G", 
#>     data = data, control_group = "notyettreated", base_period = "universal")
#> 
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
#> 
#> Group-Time Average Treatment Effects:
#>  Group Time ATT(g,t) Std. Error [95% Simult.  Conf. Band]  
#>      7    6   0.0000         NA            NA          NA  
#>      7    7  13.7827     0.3654       12.9246     14.6408 *
#>      7    8  14.9262     0.4144       13.9530     15.8995 *
#>      7    9  15.9908     0.4357       14.9675     17.0142 *
#>      8    6   0.0000         NA            NA          NA  
#>      8    7   0.0000         NA            NA          NA  
#>      8    8  15.0873     0.4157       14.1111     16.0635 *
#>      8    9  16.1022     0.4640       15.0125     17.1919 *
#>      9    6   0.0000         NA            NA          NA  
#>      9    7   0.0000         NA            NA          NA  
#>      9    8   0.0000         NA            NA          NA  
#>      9    9  16.4631     0.4750       15.3474     17.5787 *
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#> 
#> Control Group:  Not Yet Treated,  Anticipation Periods:  0
#> Estimation Method:  Doubly Robust
jtorcasso commented 2 years ago

Awesome. Thanks for looking into this. Will update and follow up if necessary.