albertostefanelli opened this issue 5 years ago.
I am encountering the same problem. Did you manage to fix it?
Unfortunately not. My guess is that setx draws a higher number of values when imputing the counterfactual outcome.
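The guess above, that the simulation draws a different number of values, can be illustrated outside Zelig. The sketch below uses base R only and made-up data. The helper `sim_first_diff()` is hypothetical, and it assumes coefficients are drawn from a multivariate normal centred on the OLS estimates. That is a common way to simulate quantities of interest, but it has not been checked against Zelig's internals. With the seed fixed, changing only the number of draws shifts the simulated mean first difference slightly.

```r
# Hypothetical data: W is a binary treatment, x a single covariate.
set.seed(1)
n <- 300
x <- rnorm(n)
W <- rbinom(n, 1, plogis(x))            # treatment more likely when x is high
y <- 2 + 0.7 * W + 0.5 * x + rnorm(n)   # true effect of W is 0.7
fit <- lm(y ~ W + x)

# Hypothetical helper (assumption, not Zelig's code): draw coefficient vectors
# from N(beta_hat, vcov(fit)) and average the draws of the W coefficient.
# In this linear model, the first difference W = 1 vs W = 0 is the W coefficient.
sim_first_diff <- function(fit, n_sims, seed) {
  set.seed(seed)
  beta_hat <- coef(fit)
  R <- chol(vcov(fit))                        # upper triangular, vcov = t(R) %*% R
  z <- matrix(rnorm(n_sims * length(beta_hat)), nrow = n_sims)
  draws <- sweep(z %*% R, 2, beta_hat, "+")   # each row is one simulated coefficient vector
  mean(draws[, match("W", names(beta_hat))])
}

fd_1000 <- sim_first_diff(fit, n_sims = 1000, seed = 42)
fd_2000 <- sim_first_diff(fit, n_sims = 2000, seed = 42)
print(c(fd_1000 = fd_1000, fd_2000 = fd_2000, point = unname(coef(fit)["W"])))
```

In this toy example both simulated values stay close to the point estimate. The gap between them is Monte Carlo noise, not a different estimator.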
Thank you. I also tried it with my own data, and I agree with your guess. I am now trying PSM in Stata.
In reply to Alberto Stefanelli (Thursday, February 6, 2020, 9:24 AM), Re: [IQSS/Zelig] impossible to reproduce ATT() results with the setx() and sim() (#328)
Although the differences are small, the results of the ATT() function in the Zelig package cannot be reproduced with Zelig's setx() and sim() functions. The example uses a matching design built with the MatchIt package. The outcome of interest is support for social spending (servicies_res), and the treatment variable W is race. The R script and data are attached.
(Screenshot of the data frame.)
Running Zelig's native ATT() function, the mean difference between the units actually treated (exposed) and their counterfactuals is 0.7060121.
Next, I tried to obtain the ATT with the parametric method suggested in the MatchIt package. Here the ATT (the so-called first difference in the sim() output) equals 0.7049261. The predicted mean of y is 3.796565 for the control group and 4.501491 for the treated group.

As far as I can tell from the syntax, the estimation should be the same as what the ATT() function does, since in both cases the model is fitted on the entire sample with the treatment variable (race) as a predictor. However, even with the same seed, the results differ slightly, so the ATT() function may use a different approach. The MatchIt package suggests (1) running two separate models, one for the control group and one for the treatment group, (2) simulating the mean value of y by imputing the covariate values of the reference group, and (3) taking the mean difference y_treatment - y_control.
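A minimal base-R sketch of the pooled approach described above: one model on the whole sample with the treatment as a predictor, and the ATT as the mean, over treated units, of the predicted outcome with W = 1 minus the predicted outcome with W = 0. The data and variable names are hypothetical stand-ins, not the attached data set. The sketch uses point predictions rather than Zelig's simulation.

```r
# Hypothetical data: W is a binary treatment, x a single covariate.
set.seed(123)
n <- 500
x <- rnorm(n)
W <- rbinom(n, 1, plogis(0.5 * x))
y <- 2 + 0.7 * W + 0.5 * x + rnorm(n)
dat <- data.frame(y = y, W = W, x = x)

# One model on the whole sample, with the treatment as a predictor.
fit_pooled <- lm(y ~ W + x, data = dat)

# Predict each treated unit twice: as treated (W = 1) and as if untreated (W = 0).
treated <- subset(dat, W == 1)
y1_hat <- predict(fit_pooled, newdata = transform(treated, W = 1))
y0_hat <- predict(fit_pooled, newdata = transform(treated, W = 0))
att_pooled <- mean(y1_hat - y0_hat)
print(c(att_pooled = att_pooled, coef_W = unname(coef(fit_pooled)["W"])))
```

In a linear model without interactions, every treated unit gets the same predicted difference, so this point estimate equals the coefficient on W. Under that assumption, any remaining gap between two simulation-based results would come from the simulation step rather than from the model.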
First, the coefficients estimated on the control group are combined (imputed) with the covariate values of the treated units. Here the expected value of y is 3.797495.

Second, the coefficients estimated on the treatment group are combined (imputed) with the covariate values of the control units. In this case the predicted mean of y is 4.497277.
Third, the difference between the two means of y is our ATT, which in this case equals 0.6997821. If I understood correctly, the small difference from the parametric estimate above arises because here the models are estimated on the control and treatment groups separately rather than on the entire sample. However, this estimate differs from both the parametric example (0.7049261) and the Zelig ATT() function (0.7060121). The difference of means was computed as: (mean(att_control$sim.out[[1]][1][[1]][[1]]) - mean(att_treatment$sim.out[[1]][1][[1]][[1]]))
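The separate-models idea can be sketched in base R with point predictions instead of setx()/sim(). This is a hedged illustration with hypothetical data. It assumes the treated units are the reference group, so both group-specific models are evaluated at the treated units' covariates. The second step described in the issue instead applies the treatment-group model to the control units; changing `newdata` for `y1_hat` to `control` would reproduce that variant.

```r
# Hypothetical data where the effect of W varies with x.
set.seed(123)
n <- 500
x <- rnorm(n)
W <- rbinom(n, 1, plogis(0.5 * x))
y <- 2 + 0.7 * W + 0.5 * x + 0.3 * W * x + rnorm(n)
dat <- data.frame(y = y, W = W, x = x)
treated <- subset(dat, W == 1)
control <- subset(dat, W == 0)

# Step 1: fit one model per group.
fit_control <- lm(y ~ x, data = control)
fit_treated <- lm(y ~ x, data = treated)

# Step 2: evaluate both models at the treated units' covariates (assumed reference group).
y0_hat <- predict(fit_control, newdata = treated)   # imputed counterfactual outcomes
y1_hat <- predict(fit_treated, newdata = treated)

# Step 3: take the mean difference.
att_separate <- mean(y1_hat) - mean(y0_hat)
print(att_separate)
```

Because an OLS model with an intercept reproduces the mean outcome of the sample it was fitted on, `mean(y1_hat)` here equals the observed treated mean. Only the control model's imputation therefore moves the point estimate.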
In addition, the MatchIt documentation suggests using conditional prediction (that is, using the observed values) in setx(). Switching the argument cond from TRUE to FALSE gives the same results.

Attachments: github example_r_script.txt, github_exemple.RData.txt
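One possible, unconfirmed reason why cond = TRUE and cond = FALSE agree can be checked with a point-estimate sketch. In OLS with the treatment dummy W as a regressor, the residuals of the treated units sum to zero. Using their observed outcomes (conditional) or their fitted outcomes (unconditional) therefore gives the same ATT. This is base R with hypothetical data and makes no claim about how Zelig's cond argument is implemented.

```r
# Hypothetical data: W is a binary treatment, x a single covariate.
set.seed(7)
n <- 500
x <- rnorm(n)
W <- rbinom(n, 1, plogis(x))
y <- 2 + 0.7 * W + 0.5 * x + rnorm(n)
dat <- data.frame(y = y, W = W, x = x)
fit <- lm(y ~ W + x, data = dat)
treated <- subset(dat, W == 1)

# Counterfactual (untreated) predictions for the treated units.
y0_hat <- predict(fit, newdata = transform(treated, W = 0))

# "Conditional": use the treated units' observed outcomes.
att_cond <- mean(treated$y - y0_hat)
# "Unconditional": use the model's fitted outcomes for the treated units.
att_uncond <- mean(predict(fit, newdata = treated) - y0_hat)
print(c(att_cond = att_cond, att_uncond = att_uncond))
```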