adjustment for covariates

bcallaway11 / did

Difference in Differences with Multiple Periods, website: https://bcallaway11.github.io/did

adjustment for covariates #180

Closed: YutingYale closed this issue 1 year ago

YutingYale commented 1 year ago

I was wondering if anyone has used the Sant'Anna did package to adjust for baseline covariates in the repeated cross-sectional setting? I got the error "Error in qr.solve((crossprod(wold.x.post.treat,int.cov)/n): singular matrix 'a' in solve". Is it caused by an empirical overlap problem? Also, it seems that the package cannot adjust for state fixed effects in the repeated cross-sectional setting. Is that right?

shrabasteebanerjee commented 1 year ago

I am having the same issue with panel data, and so is one of my students. The covariates are time-invariant for each unit and have no NA values, but we still get this error :( Any ideas for fixes?

bcallaway11 commented 1 year ago

@YutingYale This error message is due to some of the covariates being perfectly collinear. You can try dropping some of the covariates.

@shrabasteebanerjee Adjusting for baseline covariates is the default behavior of our approach, so, in principle, this should work off the shelf. If you are getting the same error message, it is the same issue: some of the covariates are perfectly collinear.

Also, all of our estimates are done at the group-time level (to allow for treatment effect heterogeneity), so these could be "local" versions of perfect collinearity, that is, collinearity that only shows up within the data used for a particular group-time estimate.

Hope this helps! Brant
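To make the "local" collinearity point concrete, here is a minimal base R sketch on made-up data. The data frame, the variable names (`first_treat`, `age`, `other`) and the helper `rank_check` are all hypothetical, and the thread does not spell out exactly which rows the package uses for each group-time estimate, so subsetting to a single treated group is only an assumption for illustration. The idea is that a design matrix can have full rank on the whole sample and still lose rank on a subset.

```r
# Hypothetical repeated cross-section; names and values are made up.
set.seed(1)
n <- 600
dat <- data.frame(
  first_treat = sample(c(0, 2000, 2002), n, replace = TRUE),  # 0 = never treated
  age         = rnorm(n, 40, 10),
  other       = rbinom(n, 1, 0.3)
)
# Force the dummy to be constant (all zeros) inside one treated group.
dat$other[dat$first_treat == 2000] <- 0

# Compare the number of columns of the covariate design matrix
# (including the intercept) with its numerical rank on a set of rows.
rank_check <- function(d, formula = ~ age + other) {
  X <- model.matrix(formula, data = d)
  c(columns = ncol(X), rank = qr(X)$rank)
}

full_check  <- rank_check(dat)                             # whole sample
group_check <- rank_check(dat[dat$first_treat == 2000, ])  # one group only

print(full_check)
print(group_check)
```

When `rank` is smaller than `columns` on a subset, a solve step on that subset can fail with a singular-matrix error even though the same covariates are fine on the full data.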

YutingYale commented 1 year ago

> @YutingYale This error message is due to some of the covariates being perfectly collinear. You can try dropping some of the covariates.
>
> @shrabasteebanerjee adjusting for baseline covariates is the default behavior of our approach, so, in principle, this should work off the shelf. If you are getting the same error message, it is the same issue: some of the covariates are perfectly collinear.
>
> Also, all of our estimates are done at the group-time level (to allow for treatment effect heterogeneity), so these could be "local" versions of perfect collinearity
>
> Hope this helps! Brant

Thank you so much for your response. I really appreciate it. However, I have checked the correlations between the covariates, and they are not perfectly collinear (the highest correlation coefficient is 0.6, between some of the race variables). Would you mind advising?
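One general caveat about this kind of check: pairwise correlations only compare two variables at a time, so they can look moderate even when a set of three or more columns is exactly linearly dependent. The sketch below uses made-up data with hypothetical category labels and is not a claim about what happens in the data from this thread. Three mutually exclusive and exhaustive dummies sum to the intercept column, so the design matrix is rank deficient although no pairwise correlation is close to 1 in absolute value.

```r
# Hypothetical data: each row belongs to exactly one of three categories.
set.seed(2)
n <- 300
race <- sample(c("white", "black", "other"), n, replace = TRUE)

# Intercept plus one dummy per category: the dummies sum to the intercept.
X <- cbind(
  intercept = 1,
  white = as.numeric(race == "white"),
  black = as.numeric(race == "black"),
  other = as.numeric(race == "other")
)

# Largest absolute pairwise correlation among the three dummies.
cors <- cor(X[, -1])
max_abs_cor <- max(abs(cors[upper.tri(cors)]))

# Numerical rank of the full design matrix (4 columns).
x_rank <- qr(X)$rank

print(max_abs_cor)
print(x_rank)
```

Comparing `qr(X)$rank` with `ncol(X)` is a more direct test for perfect collinearity than inspecting a correlation matrix.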

bcallaway11 commented 1 year ago

Yes, do you mind pasting in the code that you ran here?

YutingYale commented 1 year ago

> Yes, do you mind pasting in the code that you ran here?

Sure. Please see the code below:

    attgt_all <- att_gt(
      yname = "anyhelp",
      tname = "year",
      gname = "first_treat",
      xformla = ~ age + edu + male + married + childnumber +
        non_hispanicwhite + non_hispanicblack + non_hispanicother,
      data = data,
      panel = FALSE,
      clustervars = id,
      control_group = "notyettreated"
    )

The error shows up when I add non_hispanicother. There is no perfect collinearity in the overall data, but this dummy (non_hispanicother) equals 0 for every observation in one particular treated group (the first_treat==2000 group) in a few years. Would this be an issue?

Thank you so much in advance for your help with the issue!

pedrohcgs commented 1 year ago

Yes, that could be an issue. Within the subset made up of units with first_treat==2000 and units not yet treated by time t, no treated unit has non_hispanicother=1, so if I saw non_hispanicother=1 I could tell with certainty that the unit is in the control group. It may also be the case that, in the subset used for some other group, non_hispanicother=1 perfectly predicts being in the treated group.

Does this make sense?
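A simple way to look for this kind of perfect prediction is to cross-tabulate the dummy against treatment status within one comparison subset. The sketch below uses made-up data; the data frame `sub` and its columns `treated` and `other` are hypothetical stand-ins for "units with first_treat==2000 versus not-yet-treated units" and for non_hispanicother. A zero cell in the table means that one value of the covariate occurs in only one arm.

```r
# Hypothetical comparison subset for a single group-time cell.
set.seed(3)
n <- 400
sub <- data.frame(
  treated = rbinom(n, 1, 0.4),   # 1 = treated group, 0 = not yet treated
  other   = rbinom(n, 1, 0.25)   # stand-in for the dummy covariate
)
# Mimic the situation in the thread: the dummy is 0 for all treated units.
sub$other[sub$treated == 1] <- 0

# Rows are covariate values, columns are treatment status.
tab <- table(other = sub$other, treated = sub$treated)
print(tab)

# TRUE when some covariate value appears in only one arm.
perfectly_predicts <- any(tab == 0)
print(perfectly_predicts)
```

Running such a table for each treated group and period would show which cells are affected, at the cost of one table per covariate and cell.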

YutingYale commented 1 year ago

Got it. This makes sense. I really appreciate your clarification on this issue. Thanks again!

iamlcc commented 9 months ago

@YutingYale could you please share what you did after that?

YutingYale commented 9 months ago

> @YutingYale could you please share what you did after that?

I dropped the covariates that may cause the collinearity issue.
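The thread does not say how the covariates to drop were identified. One possible approach, sketched here on made-up data with hypothetical column names, is to take the QR decomposition of the design matrix on the problematic subset and read the redundant columns off the pivot order. Base R's default `qr()` moves columns it finds linearly dependent to the end of the pivot, so the entries after position `rank` point at candidates to drop.

```r
# Hypothetical subset in which one dummy is constant (all zeros).
set.seed(4)
n <- 200
d <- data.frame(
  age   = rnorm(n, 40, 10),
  other = 0,                     # no variation within this subset
  male  = rbinom(n, 1, 0.5)
)

X <- model.matrix(~ age + other + male, data = d)
q <- qr(X)

# Pivot entries beyond the numerical rank index the dependent columns.
redundant <- colnames(X)[q$pivot[seq_len(ncol(X)) > q$rank]]
kept      <- setdiff(colnames(X), redundant)

print(redundant)
print(kept)
```

Which column gets flagged depends on column order when several columns are jointly dependent, so the result is a suggestion of what to drop rather than a unique answer.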