bcallaway11 / did

Difference in Differences with Multiple Periods, website: https://bcallaway11.github.io/did

Discrepancy between DiD 2.0.0 and DiD 2.1.2 #161

Closed gsaabogado closed 1 year ago

gsaabogado commented 1 year ago

Hello Brantly and Pedro,

I am seeing an inconsistency between the estimates and standard errors produced by versions 2.0.0 and 2.1.2 of the package.

We would like to understand these differences because our study is at the R&R stage and we need to explain why the coefficients vary between versions.

The code we use for both versions is exactly the same (if necessary, we can also provide our data set):

For the 2.0.0 version:

#### Clear the space ####
rm(list = ls());gc()

#### Load packages ####
#devtools::install_version("did", version = "2.0.0", repos = "http://cran.us.r-project.org")
library(did)
library(tidyverse)
library(data.table)

#### Load the data set ####
pol = read_rds("02_generated_data/PolYearlyReg.rds")

#### Run the CS-DD algorithm ####
estimate = att_gt(yname = "pm10", gname = "treat_cohort", idname = "station",
                  tname = "Period", xformla = ~1, data = pol$Raw,
                  allow_unbalanced_panel = TRUE, control_group = "nevertreated",
                  clustervars = "mun", est_method = "dr")

### Aggregate the point estimates ####
summary(did::aggte(estimate, type = "simple", na.rm = TRUE))

This leads to the following result:

Call:
did::aggte(MP = x, type = "simple", na.rm = T)

Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Forthcoming at the Journal of Econometrics <https://arxiv.org/abs/1803.09015>, 2020. 

Overall ATT:  
     ATT Std. Error     [95%  Conf. Int.]  
 -1.7768     0.4739   -2.7056     -0.8481 *

---
Signif. codes: `*' confidence band does not cover 0

Control Group:  Never Treated,  Anticipation Periods:  0
Estimation Method:  Doubly Robust
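Because the install line in the script above is commented out, it may be worth confirming which version of `did` is actually loaded before comparing outputs. The following is only a minimal sketch: `installed_version()` is a hypothetical helper written for this post, not a function from `did`, and it relies solely on base R's `requireNamespace()` and `utils::packageVersion()`.

```r
# Hypothetical helper (not part of did): return the installed version of a
# package as a string, or NA if the package is not found in the library paths.
installed_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

# Record the version next to the estimates so each output can be traced back.
did_version <- installed_version("did")
message("did version used for this run: ", did_version)
```

Running this at the top of each script would make it explicit which release produced each set of estimates.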

With the 2.1.2 version we run:

#### Clear the space ####
rm(list = ls());gc()

#### Load packages ####
library(did)
library(tidyverse)
library(data.table)

#### Load the data set ####
pol = read_rds("02_generated_data/PolYearlyReg.rds")

#### Run the CS-DD algorithm ####
estimate = att_gt(yname = "pm10", gname = "treat_cohort", idname = "station",
                  tname = "Period", xformla = ~1, data = pol$Raw,
                  allow_unbalanced_panel = TRUE, control_group = "nevertreated",
                  clustervars = "mun", est_method = "dr")

### Aggregate the point estimates ####
aggte(estimate, type = "simple", na.rm = TRUE)

And we get the following result:

Call:
aggte(MP = estimate, type = "simple", na.rm = T)

Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 

     ATT    Std. Error     [ 95%  Conf. Int.]  
 -1.8027         0.308    -2.4064      -1.199 *

---
Signif. codes: `*' confidence band does not cover 0

Control Group:  Never Treated,  Anticipation Periods:  0
Estimation Method:  Doubly Robust

Can you help us understand the reasoning behind this slight adjustment in the DiD algorithm?
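To put numbers on the gap, here is a rough sketch that uses only the values printed in the two summaries above. It computes the change in the point estimate and in the standard error, and checks whether the reported intervals are consistent with ATT ± z × SE. The use of the normal critical value `qnorm(0.975)` is an assumption made for this check, not something taken from the package internals.

```r
# Values copied from the two printed summaries above.
res <- data.frame(
  version = c("2.0.0", "2.1.2"),
  att     = c(-1.7768, -1.8027),
  se      = c(0.4739, 0.3080),
  ci_lo   = c(-2.7056, -2.4064),
  ci_hi   = c(-0.8481, -1.1990)
)

# Absolute change in the point estimate and relative change in the standard error.
att_diff <- res$att[2] - res$att[1]
se_ratio <- res$se[2] / res$se[1]

# Rebuild pointwise 95% intervals as ATT +/- z * SE
# (assumption: normal critical value, used here only as a consistency check).
z <- qnorm(0.975)
res$lo_check <- res$att - z * res$se
res$hi_check <- res$att + z * res$se

print(res)
cat("ATT difference (2.1.2 - 2.0.0):", round(att_diff, 4), "\n")
cat("SE ratio (2.1.2 / 2.0.0):", round(se_ratio, 3), "\n")
```

On these numbers the point estimate moves by about 0.026, while the standard error in 2.1.2 is roughly two thirds of the 2.0.0 value, so most of the discrepancy is in the standard errors rather than in the ATT itself.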

bcallaway11 commented 1 year ago

Hi Luis, I think there is a good chance that it is the same issue as discussed in #124. Will you take a look there and let me know if that answers your question?

gsaabogado commented 1 year ago

Hi Brantly,

Yes, issue #124 clarifies our problem.

Thank you for taking the time to answer.

Best