bcallaway11 / did

Difference in Differences with Multiple Periods, website: https://bcallaway11.github.io/did

Error in solve default: system is computational singular #64

Open KarolinaHelena opened 3 years ago

KarolinaHelena commented 3 years ago

Hello,

Thanks for a great package!

I have a question about an error I get when using did. The package works well for my other outcome variables, but this outcome variable has quite a few missing values, which my other data do not. The error appears when I use the DR estimation with covariates. It does not appear when I use DR with no covariates, or when I use the REG or IPW methods (both with covariates). I have tried the DR estimation with different combinations of covariates, and whether I use just one or any combination of them, I still get the error.

The error I get: Error in solve.default(crossprod(wols.x.pre.treat, int.cov)/n) : system is computationally singular: reciprocal condition number = 3.77735e-23

In the estimation I allow for an unbalanced panel. If I do not allow for this, the estimation runs but cannot estimate any ATTs (the overlap condition is violated in every period except the very first year in the data).

My question is whether I can solve this and if so how?

Thanks again for a great package!

bcallaway11 commented 3 years ago

Hi Karolina,

Sorry for the slow response here. If you are still experiencing any issues related to this, let me know.

Brant

gorkembostanci commented 3 years ago

Hi Brant,

I am having the same issue. I believe it comes from the solve function in the following line.

W <- n*t(preatt)%*%solve(preV)%*%preatt

Best,

bcallaway11 commented 3 years ago

Yes, that is likely the line where this error comes up.

A few questions for you:

Brant

gorkembostanci commented 3 years ago

It crashes the code, so no output from att_gt.

By the way, I also get the `small groups' warning in my estimations, although that itself does not create a crash with 'reg' or 'ipw' options.

Best,

bcallaway11 commented 3 years ago

Ok, I'm not sure what is causing this.

Would you be up for sending me an example that can reproduce this? If you'd rather email about this than post here, my email is brantly.callaway@uga.edu

Brant

gorkembostanci commented 3 years ago

Hi Brant, just sent you the files.

Best,

bcallaway11 commented 3 years ago

Hi Gorkem,

I think I've gotten to the bottom of this now. There is a longer answer here, but the short version is that we use the DRDID package internally to compute ATT(g,t) for particular values of (g,t). With an unbalanced panel, our call to DRDID is for the efficient doubly robust estimator from Sant'Anna and Zhao (2020). This involves estimating a regression using treated observations. But, in your case, the number of treated observations in some groups is quite small, and, in practice, the error comes from trying to run a regression of two observations on five covariates.
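To illustrate the failure mode described above, here is a minimal base-R sketch. It is not the DRDID code itself: the variable names and the simulated covariates are made up for illustration, and it only assumes that the estimator inverts a cross-product of the treated units' design matrix. With two observations and five covariates plus an intercept, that matrix is 6 x 6 but has rank at most 2, so it cannot be inverted.

```r
# Hypothetical illustration, not DRDID internals: two treated observations,
# five covariates plus an intercept column.
set.seed(1)
n_treated <- 2
n_covs <- 5
X <- cbind(1, matrix(rnorm(n_treated * n_covs), nrow = n_treated))

# X'X is 6 x 6, but its rank cannot exceed the number of rows of X (2).
XtX <- crossprod(X)
design_rank <- qr(X)$rank

# Inverting a rank-deficient matrix is what is expected to make solve() fail;
# tryCatch captures the condition instead of stopping the script.
inv_attempt <- tryCatch(solve(XtX / n_treated), error = function(e) e)

cat("columns:", ncol(X), " rank:", design_rank, "\n")
if (inherits(inv_attempt, "error")) {
  cat("solve() failed:", conditionMessage(inv_attempt), "\n")
}
```

Comparing the rank with the number of columns is the more reliable check here, since the exact wording of the solve() error depends on how the rounding happens to fall.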

On the other hand, if you use est_method="reg" or est_method="ipw", there is no regression using treated units. This is the reason you don't run into the error there.

We will have to update the code in order to provide an actual solution (there are alternative doubly robust approaches that don't require the regression using treated observations and this is what would work best in your case). I'll update here once we have it, but it may take me a little bit of time.

Brant

gorkembostanci commented 3 years ago

I see, that clarifies a lot. Thank you for the explanation!

Best of luck with the updates going forward.

bcallaway11 commented 2 years ago

An update on this one (also tagging @pedrohcgs):

Here is some related code that can reproduce this error with simulated data.

library(did)
sp <- reset.sim(time.periods=3)
data <- build_sim_dataset(sp, panel=FALSE)
data$X2 <- rnorm(nrow(data))
data$X3 <- rnorm(nrow(data))
dropids <- unique(subset(data, G==2 & period==2)$id)
dropids <- dropids[-c(1,2,3)] # keep three observations from group 2
data <- subset(data, !(id %in% dropids))

# internally this calls drdid_rc
# and crashes
res_dr <- att_gt(yname="Y", xformla=~X+X2+X3, data=data, tname="period", idname="id",
                 gname="G", est_method="dr", panel=FALSE)
#> Error in solve.default(crossprod(wols.x.post.treat, int.cov)/n): system is computationally singular: reciprocal condition number = 3.52816e-17

# setting `est_method=DRDID::drdid_imp_rc1` will run in this case though
res_dr <- att_gt(yname="Y", xformla=~X+X2+X3, data=data, tname="period", idname="id",
                 gname="G", est_method=DRDID::drdid_imp_rc1, panel=FALSE)
res_dr
#> 
#> Call:
#> att_gt(yname = "Y", tname = "period", idname = "id", gname = "G", 
#>     xformla = ~X + X2 + X3, data = data, panel = FALSE, est_method = DRDID::drdid_imp_rc1)
#> 
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna.  "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> 
#> 
#> Group-Time Average Treatment Effects:
#>  Group Time ATT(g,t) Std. Error [95% Simult.  Conf. Band]  
#>      2    2   2.1771     0.2853        1.4943      2.8598 *
#>      2    3   1.4091     0.1567        1.0341      1.7841 *
#>      3    2  -0.0701     0.1712       -0.4798      0.3395  
#>      3    3   1.2482     0.1640        0.8558      1.6405 *
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#> 
#> P-value for pre-test of parallel trends assumption:  0.67106
#> Control Group:  Never Treated,  Anticipation Periods:  0

taiwoakinyemi commented 2 years ago

Hi Brant, I am having a similar error. First, my data is cross-sectional, and using the never-treated group gives this error:

Error in pre_process_did(yname = yname, tname = tname, idname = idname, : There is no available never-treated group

To bypass this error I substituted "nevertreated" with "notyet", and the resulting error is below:

Error in solve.default(crossprod(wols.x.post.treat, int.cov)/n) : system is computationally singular: reciprocal condition number = 2.22857e-23
In addition: Warning message:
In pre_process_did(yname = yname, tname = tname, idname = idname, : dropped 1908 rows from original data due to missing data

I then tried the code provided in this thread:

res_dr <- att_gt(yname="wealthscore", xformla=xformla, data=nigeria, tname="pyear", idname="state_num", control_group="notyet", gname="syear", est_method=DRDID::drdid_imp_rc1, panel=FALSE)
res_dr

Please find the accompanying error below:

Error in wols_rc(y, post, D, int.cov, ps.fit, i.weights, pre = TRUE, treat = FALSE) : Outcome regression model coefficients have NA components. Multicollinearity (or lack of variation) of covariates is a likely reason
In addition: Warning messages:
1: In pre_process_did(yname = yname, tname = tname, idname = idname, : dropped 1908 rows from original data due to missing data
2: In pscore.cal(D, int.cov, i.weights = i.weights, n = n) : Used glm algorithm to estimate propensity score as trust and IPT method did not converge
3: In pscore.cal(D, int.cov, i.weights = i.weights, n = n) : Used glm algorithm to estimate propensity score as trust and IPT method did not converge

I would appreciate it if you could help me look into this.

pedrohcgs commented 2 years ago

Please check that you have variation in covariate values, so that you have good overlap for each cohort. Also, make sure you don't have missing values.

If there is no variation in covariate values for a given cohort (say, almost all units in that cohort have similar covariates and nobody in the comparison group has similar values), there is not much we can do, as that is a problem with the design matrix (and overlap).
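One way to run the checks suggested above before calling att_gt is sketched below in base R. The helper `check_cells` is hypothetical (it is not part of did or DRDID), and the toy data and column names are assumptions for illustration. For each cohort and period it drops rows with missing values, counts what is left, and compares the rank of the covariate design matrix with its number of columns. It does not assess overlap between a cohort and its comparison group, which needs a separate look at the covariate distributions.

```r
# Hypothetical helper (not part of did or DRDID): for each cohort/period cell,
# count complete rows and compare the rank of the covariate design matrix
# with its number of columns.
check_cells <- function(data, gname, tname, xformla) {
  covs <- all.vars(xformla)
  keep <- stats::complete.cases(data[, c(gname, tname, covs)])
  complete <- data[keep, ]
  cells <- split(complete, list(complete[[gname]], complete[[tname]]),
                 drop = TRUE)
  out <- lapply(cells, function(cell) {
    X <- stats::model.matrix(xformla, data = cell)
    data.frame(group = cell[[gname]][1], period = cell[[tname]][1],
               n_obs = nrow(X), n_cols = ncol(X), rank = qr(X)$rank)
  })
  res <- do.call(rbind, out)
  res$full_rank <- res$rank == res$n_cols
  rownames(res) <- NULL
  res
}

# Toy data (assumed): cohort 2 has two rows per period, cohort 3 has twenty.
set.seed(1)
toy <- data.frame(
  G = rep(c(2, 3), times = c(4, 40)),
  period = c(rep(1:2, each = 2), rep(1:2, each = 20)),
  X = rnorm(44), X2 = rnorm(44), X3 = rnorm(44)
)
report <- check_cells(toy, gname = "G", tname = "period",
                      xformla = ~ X + X2 + X3)
print(report)
```

Cells where `full_rank` is FALSE have fewer usable observations, or less covariate variation, than the outcome regression needs. Dropping covariates or pooling small cohorts are possible responses, depending on the design.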

