bcallaway11 / did

Difference in Differences with Multiple Periods, website: https://bcallaway11.github.io/did

Multicollinearity issue? #173

Closed Ales-G closed 9 months ago

Ales-G commented 1 year ago

Dear all, first of all, thank you very much for your amazing work! Not only have you revolutionized DiD, but you have also made it simple to implement.

I have a question related to estimation with unbalanced panels.

I have a panel that looks like this. The panel is unbalanced because, roughly speaking, the data generating process requires data to be collected every couple of years.

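[image: Screenshot 2023-05-02 at 17 33 14] https://user-images.githubusercontent.com/59447419/235728425-be7bee66-3f3d-4e5a-9ff1-5ed6d06b4792.png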

I am trying to estimate the ATT. If I use the DR approach, the command fails to estimate the model.

est_did = did::att_gt(yname = "auditrating_pass",
+                       gname = "gname",
+                       idname = "prod",
+                       tname = "yr",
+                       xformla = ~rsph+audcmp+auditann+audseq,
+                       est_method="dr",
+                       control_group="notyettreated",
+                       panel=T,
+                       allow_unbalanced_panel = T,
+                       pl=T,
+                       cores = 24,
+                       alp=0.05,
+                       data = data %>% filter(anytrai==1))
Error in DRDID::drdid_rc(y = Y, post = post, D = G, covariates = covariates,  : 
  Outcome regression model coefficients have NA components. 
 Multicollinearity (or lack of variation) of covariates is a likely reason.
In addition: Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred 
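The two messages above describe different symptoms: NA coefficients in the outcome regression (collinear covariates or covariates without variation) and fitted propensity scores at 0 or 1 (covariates that almost perfectly predict treatment). The following is a minimal sketch that reproduces each symptom on simulated data using base R only. The data frame `d` and the covariates `x1`, `x2`, `x3` are hypothetical and are not the variables from this issue; the checks inside DRDID may differ in detail.

```r
set.seed(1)
n <- 200
treated <- rep(c(0, 1), each = n / 2)
x1 <- rnorm(n)

# Hypothetical data: x2 is an exact multiple of x1, and the sign of x3
# perfectly separates treated from untreated units.
d <- data.frame(
  treated = treated,
  x1 = x1,
  x2 = 2 * x1,
  x3 = ifelse(treated == 1, 1, -1) * runif(n, min = 1, max = 10),
  y = 1 + x1 + rnorm(n)
)

# Symptom 1: with collinear covariates, lm() cannot identify every
# coefficient and reports NA for the aliased ones.
or_fit <- lm(y ~ x1 + x2, data = d)
na_coefs <- names(which(is.na(coef(or_fit))))
print(na_coefs)

# Symptom 2: with perfect separation, the logit propensity score is pushed
# towards 0 or 1. Collect the warnings instead of printing them.
ps_warnings <- character(0)
ps_fit <- withCallingHandlers(
  glm(treated ~ x3, family = binomial, data = d),
  warning = function(w) {
    ps_warnings <<- c(ps_warnings, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
ps <- fitted(ps_fit)

# Share of units whose fitted propensity score is essentially 0 or 1.
extreme_share <- mean(ps < 1e-6 | ps > 1 - 1e-6)
print(ps_warnings)
print(extreme_share)
```

Because `att_gt` estimates a separate model for each group and time period, running checks like these on the subset of data used for a given group-time pair can show which covariate is responsible.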

Conversely, if I use IPW, the command estimates the model but does not calculate SEs for several event times or for the overall ATT.

 est_did = did::att_gt(yname = "auditrating_pass",
+                       gname = "gname",
+                       idname = "prod",
+                       tname = "yr",
+                       xformla = ~rsph+audcmp+auditann+audseq,
+                       est_method="ipw",
+                       control_group="notyettreated",
+                       panel=T,
+                       allow_unbalanced_panel = T,
+                       pl=T,
+                       cores = 24,
+                       alp=0.05,
+                       data = data %>% filter(anytrai==1))
agg_ipw_d <- did::aggte(est_did, type = "dynamic", na.rm=T)
> agg_ipw_d

Call:
did::aggte(MP = est_did, type = "dynamic", na.rm = T)

Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 

Overall summary of ATT's based on event-study/dynamic aggregation:  
    ATT    Std. Error     [ 95%  Conf. Int.] 
 0.1816            NA         NA          NA 

Dynamic Effects:
 Event time Estimate Std. Error [95% Simult.  Conf. Band]  
         -4   0.2158     0.0957       -0.0139      0.4456  
         -3  -0.1269     0.0522       -0.2522     -0.0016 *
         -2  -0.0011     0.0438       -0.1063      0.1042  
         -1  -0.0438         NA            NA          NA  
          0   0.0843         NA            NA          NA  
          1   0.1218     0.0772       -0.0635      0.3071  
          2   0.1859         NA            NA          NA  
          3  -0.0160         NA            NA          NA  
          4   0.5317         NA            NA          NA  
---
Signif. codes: `*' confidence band does not cover 0

Control Group:  Not Yet Treated,  Anticipation Periods:  0
Estimation Method:  Inverse Probability Weighting

Do you know what may be causing the problem, and do you have any suggestions on how I could overcome this issue?

Surprisingly, I am able to estimate the model using the Sun and Abraham (2020) approach, but I would like to include your estimation approach as well.

Thanks a lot

pedrohcgs commented 1 year ago

Can you run the model without covariates?

Also, how do you run SA with covariates? That is not covered in their paper, so the implementation is arguably very different from ours.

Thanks

Ales-G commented 1 year ago

Dear Pedro, first of all thank you very much for your prompt reply, this is amazingly generous of you.

You are right, it is the controls that are causing the problem. Once I remove them, the models are computed correctly. However, I really need to include them, both for theoretical reasons and to be able to convincingly claim that my parallel trends assumption holds. Do you have any suggestions on how I can deal with the issue?

> # dr
> est_did_dr = did::att_gt(yname = "auditrating_pass",
+                       gname = "gname",
+                       idname = "prod",
+                       tname = "yr",
+                       #xformla = ~rsph+audcmp+auditann+audseq,
+                       est_method="dr",
+                       control_group="notyettreated",
+                       panel=T,
+                       allow_unbalanced_panel = T,
+                       pl=T,
+                       cores = 24,
+                       alp=0.05,
+                       data = data %>% filter(anytrai==1))
> agg_dr <- did::aggte(est_did_dr, type = "dynamic", na.rm=T)
> agg_dr

Call:
did::aggte(MP = est_did_dr, type = "dynamic", na.rm = T)

Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 

Overall summary of ATT's based on event-study/dynamic aggregation:  
    ATT    Std. Error     [ 95%  Conf. Int.]  
 0.1363        0.0588     0.0211      0.2515 *

Dynamic Effects:
 Event time Estimate Std. Error [95% Simult.  Conf. Band]  
         -4   0.1553     0.0844       -0.0711      0.3816  
         -3  -0.0349     0.0419       -0.1473      0.0776  
         -2  -0.0002     0.0351       -0.0943      0.0938  
         -1  -0.0892     0.0355       -0.1843      0.0059  
          0   0.1518     0.0364        0.0542      0.2493 *
          1   0.1330     0.0515       -0.0050      0.2710  
          2   0.1584     0.0575        0.0043      0.3125 *
          3   0.0807     0.0966       -0.1783      0.3396  
          4   0.1577     0.1925       -0.3587      0.6742  
---
Signif. codes: `*' confidence band does not cover 0

Control Group:  Not Yet Treated,  Anticipation Periods:  0
Estimation Method:  Doubly Robust
> 
> # ipw
> est_did_ipw = did::att_gt(yname = "auditrating_pass",
+                          gname = "gname",
+                          idname = "prod",
+                          tname = "yr",
+                          #xformla = ~rsph+audcmp+auditann+audseq,
+                          est_method="ipw",
+                          control_group="notyettreated",
+                          panel=T,
+                          allow_unbalanced_panel = T,
+                          pl=T,
+                          cores = 24,
+                          alp=0.05,
+                          data = data %>% filter(anytrai==1))
> agg_ipw <- did::aggte(est_did_ipw, type = "dynamic", na.rm=T)
> agg_ipw

Call:
did::aggte(MP = est_did_ipw, type = "dynamic", na.rm = T)

Overall summary of ATT's based on event-study/dynamic aggregation:  
    ATT    Std. Error     [ 95%  Conf. Int.]  
 0.1363        0.0579     0.0228      0.2498 *

Dynamic Effects:
 Event time Estimate Std. Error [95% Simult.  Conf. Band]  
         -4   0.1553     0.0856       -0.0788      0.3893  
         -3  -0.0349     0.0437       -0.1543      0.0845  
         -2  -0.0002     0.0371       -0.1015      0.1010  
         -1  -0.0892     0.0334       -0.1805      0.0021  
          0   0.1518     0.0349        0.0564      0.2472 *
          1   0.1330     0.0494       -0.0020      0.2681  
          2   0.1584     0.0567        0.0035      0.3133 *
          3   0.0807     0.0952       -0.1796      0.3409  
          4   0.1577     0.2006       -0.3905      0.7060  
---
Signif. codes: `*' confidence band does not cover 0

Control Group:  Not Yet Treated,  Anticipation Periods:  0
Estimation Method:  Inverse Probability Weighting

With regard to Sun and Abraham, you are right that these are very different approaches, and I love the flexibility of your methodology. For reference, I use the eventstudyinteract command in Stata, but one can also use fixest with the sunab() function in R. Both commands allow additional controls to be included. I hope I am not missing something big.
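For readers unfamiliar with the fixest route mentioned above, here is a minimal sketch of a Sun and Abraham style regression with one additional control. It uses the `base_stagg` example data shipped with fixest rather than the data from this issue, and `x1` stands in for the controls; as noted earlier in the thread, how covariates enter here is very different from how `att_gt` uses them.

```r
# Runs only if fixest is installed.
if (requireNamespace("fixest", quietly = TRUE)) {
  library(fixest)
  data("base_stagg", package = "fixest")

  # sunab(cohort, period) builds the cohort-by-relative-period interactions;
  # x1 is an additional control, with unit and year fixed effects.
  res_sa <- feols(
    y ~ x1 + sunab(year_treated, year) | id + year,
    data = base_stagg
  )

  # Aggregate the interaction-weighted estimates into a single ATT.
  print(summary(res_sa, agg = "att"))
}
```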

thanks a lot

bcallaway11 commented 10 months ago

@Ales-G, apologies for the delayed response. I think the first error message that you reported is likely correct, and that the overlap condition is likely violated. That said, if you want things to "just run" while including the same options, your best bet is likely to set est_method="reg".
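A minimal sketch of that suggestion follows. It uses the `mpdta` example data that ships with the did package, with `lpop` as a stand-in covariate, since the data from this issue are not available. For the original call, the only change would be `est_method = "reg"`, keeping the `yname`, `gname`, `idname`, `tname` and `xformla` arguments from the first post. Outcome regression does not estimate a propensity score, so the `glm.fit` warning should not arise, but collinear covariates could still cause problems.

```r
# Runs only if the did package is installed.
if (requireNamespace("did", quietly = TRUE)) {
  data("mpdta", package = "did")

  est_did_reg <- did::att_gt(
    yname = "lemp",
    gname = "first.treat",
    idname = "countyreal",
    tname = "year",
    xformla = ~lpop,                  # stand-in for the covariates in this issue
    est_method = "reg",               # outcome regression instead of "dr" or "ipw"
    control_group = "notyettreated",
    data = mpdta
  )

  # Event-study aggregation, as in the calls earlier in the thread.
  agg_reg <- did::aggte(est_did_reg, type = "dynamic", na.rm = TRUE)
  print(agg_reg)
}
```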