IQSS / Zelig

A statistical framework that serves as a common interface to a large range of models
http://zeligproject.org

impossible to reproduce ATT() results with the setx() and sim() #328

Open albertostefanelli opened 5 years ago

albertostefanelli commented 5 years ago

Although the differences are small, the results of the ATT() function in the Zelig package cannot be reproduced with Zelig's setx() and sim() functions. The example uses a matching design built with the MatchIt package. The outcome of interest is support for social spending (servicies_res); the treatment variable W is race. R script and data are attached.

Snapshot of the data frame.

r$> head(dput(github_exemple,"github_exemple.txt"))                         
   age race edu income gender number_childers servicies_res                 
2   26    0  13     17      1               0             4
4   58    0   9     20      1               0             3
6   60    0  14      1      1               1             3                 
8   56    0   9     28      2               0             2
10  30    0  10     19      2               2             3
11  40    0   7     11      2               0             4   

Running the native Zelig ATT() function: the mean difference between the units actually treated (or exposed) and their counterfactuals is 0.7060121.

library(Zelig)
library(magrittr) # provides %>%

z.out1 <- zelig(servicies_res ~ age + race,
  data = matched, # matched data set
  model = "ls")

set.seed(12)
z.att_treat <- z.out1 %>%
  ATT(treatment = "race", treat = 1) %>%
  get_qi(qi = "ATT", xvalue = "TE")

mean(z.att_treat)

Let's now try to get the ATT using the parametric method suggested in the MatchIt documentation. Now the ATT (the so-called first difference in the sim() output) equals 0.7049261. The predicted mean of y is 3.796565 for the control group and 4.501491 for the treated group. As far as I can tell from the syntax, the estimate should match what the ATT() function produces, since in both cases the model is estimated on the entire sample with the treatment variable (race) as a predictor. However, even with the same seed, the results are slightly different.

z.out1 <- zelig(servicies_res ~ age + race,
  data = matched,
  model = "ls")

set.seed(12)
x.out1 <- setx(z.out1, race=0) #majority 
set.seed(12)
x.out2 <- setx1(z.out1, race=1) #minority
s.out1 <- Zelig::sim(z.out1, x = x.out1, x1=x.out2)
summary(s.out1)
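One possible reason the two numbers differ even with the same seed can be illustrated in base R. This is a sketch on synthetic data, not the attached data set, and it rests on an assumption that is not confirmed anywhere in this thread: that ATT() averages the observed outcome of the treated units minus their simulated expected value with the treatment switched off, while the first difference from sim() compares two simulated expected values at fixed covariates. All variable names and values below are hypothetical.

```r
# Base R sketch on synthetic data (all names and values are hypothetical).
set.seed(12)
n    <- 400
age  <- round(runif(n, 20, 70))
race <- rbinom(n, 1, 0.4)
y    <- 3 + 0.01 * age + 0.7 * race + rnorm(n)
dat  <- data.frame(y, age, race)

fit <- lm(y ~ age + race, data = dat)

# Draw coefficient vectors from N(beta_hat, vcov(fit)).
# chol() returns upper-triangular R with t(R) %*% R = vcov(fit).
n_sims <- 1000
R      <- chol(vcov(fit))
z      <- matrix(rnorm(n_sims * length(coef(fit))), nrow = n_sims)
betas  <- sweep(z %*% R, 2, coef(fit), "+")   # n_sims x 3 matrix of draws
colnames(betas) <- names(coef(fit))

# (a) First difference at fixed covariates: in a linear model without
#     interactions this is simply the simulated race coefficient.
fd <- betas[, "race"]

# (b) ATT-style quantity (ASSUMED mechanism): observed y of the treated
#     minus their simulated expected value with race set to 0.
treated <- dat[dat$race == 1, ]
X0      <- cbind(1, treated$age, 0)           # counterfactual design matrix
ev0     <- betas %*% t(X0)                    # n_sims x n_treated
att     <- mean(treated$y) - rowMeans(ev0)

# At the point estimate the two quantities coincide exactly, because OLS
# residuals sum to zero within the treated group.
att_point <- mean(treated$y) - mean(X0 %*% coef(fit))

c(coef = coef(fit)[["race"]], att_point = att_point,
  fd = mean(fd), att = mean(att))
```

Under this assumption both simulated means estimate the same coefficient, but each draw of (a) uses only the race coefficient while each draw of (b) uses the intercept and the age coefficient, so the two Monte Carlo averages differ by simulation noise even when the seed is identical.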

It might be that the ATT() function uses a different approach. The MatchIt documentation suggests (1) fitting two separate models, one for the control group and one for the treatment group, (2) simulating the mean value of y after imputing the covariate values of the reference group, and (3) taking the mean difference y_treatment - y_control.

First, the coefficients estimated from the control group are combined (imputed) with the values of the covariates of the treated units. Here the expected value of y is 3.797495.

# fit the model on the control group
z.out1 <- zelig(servicies_res ~ age, 
  data = control_majority,
  model = "ls")
summary(z.out1)

# set X to the covariate values of the treated group
# to impute the counterfactual outcome for the treated group
set.seed(12)
x.out1 <- setx(z.out1, data=treated_minority, cond=FALSE)

# simulate y as if the control group had the treated group's values
# for all the independent variables
# betas * X for all scenarios
att_treatment <- sim(z.out1, x = x.out1)
summary(att_treatment)

Second, the coefficients estimated from the treatment group are combined (imputed) with the values of the covariates of the control units. In this case the predicted mean value of y is 4.497277.

# fit the model on the treated group
z.out2 <- zelig(servicies_res ~ age, 
  data = treated_minority,
  model = "ls")
summary(z.out2)

# set X to the covariate values of the control group
# to impute the counterfactual outcome for the control group
set.seed(12)
x.out2 <- setx(z.out2, data=control_majority, cond=TRUE)

# simulate y as if the treated group had the control group's values
# for all the independent variables
# betas * X for all scenarios
att_control <- sim(z.out2, x = x.out2)
summary(att_control)

Third, the difference between the two mean values of y is our ATT, which in this case equals 0.6997821. If I understand correctly, the small difference from the parametric estimate detailed above arises because here the models are estimated on the control and treatment groups separately rather than on the entire sample. However, the estimate differs from both the parametric example (0.7049261) and the Zelig ATT() function (0.7060121).

(mean(att_control$sim.out[[1]][1][[1]][[1]]) - mean(att_treatment$sim.out[[1]][1][[1]][[1]]))

In addition, the MatchIt documentation suggests using conditional prediction (that is, using the observed values) in setx(). Switching the argument cond from TRUE to FALSE leads to the same results.
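The three steps of the two-model approach can also be sketched in base R, which makes it easier to see why the result need not match the pooled model. This is only an illustration on synthetic data with hypothetical names and values; it uses point predictions from lm() and predict() rather than Zelig's simulation machinery, so it shows the structure of the calculation and not the exact numbers above.

```r
# Base R sketch on synthetic data (all names and values are hypothetical).
set.seed(12)
n    <- 400
age  <- round(runif(n, 20, 70))
race <- rbinom(n, 1, 0.4)
y    <- 3 + 0.01 * age + 0.7 * race + rnorm(n)
dat  <- data.frame(y, age, race)

control <- dat[dat$race == 0, ]
treated <- dat[dat$race == 1, ]

# (1) one model per group
fit_c <- lm(y ~ age, data = control)
fit_t <- lm(y ~ age, data = treated)

# (2) impute: each model predicts at the covariates of the other group
y_c_at_t <- predict(fit_c, newdata = treated)  # control model, treated covariates
y_t_at_c <- predict(fit_t, newdata = control)  # treated model, control covariates

# (3) difference between the mean predictions
two_model <- mean(y_t_at_c) - mean(y_c_at_t)

# Pooled model with race as a regressor, for comparison
pooled <- coef(lm(y ~ age + race, data = dat))[["race"]]

c(two_model = two_model, pooled = pooled)
```

The separate models each estimate their own intercept and age slope, and the two predictions are averaged over different covariate distributions, so the two-model difference generally does not equal the race coefficient of the pooled model.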

github example_r_script.txt github_exemple.RData.txt

hezhichao1991 commented 4 years ago

I am encountering the same problem. Did you manage to fix it?

albertostefanelli commented 4 years ago

Unfortunately not. My guess is that setx draws a higher number of values when imputing the counterfactual outcome.
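How much the number of draws matters can be checked in isolation. The sketch below is plain base R with a hypothetical point estimate and standard error (neither is taken from the attached data, and nothing here calls Zelig): it repeats a simulation many times and measures how much the reported mean moves from run to run for two simulation sizes.

```r
set.seed(12)
beta_hat <- 0.70   # hypothetical point estimate
se       <- 0.10   # hypothetical standard error

# Mean of n_sims simulated draws, as a simulation summary would report it
mc_mean <- function(n_sims) mean(rnorm(n_sims, mean = beta_hat, sd = se))

# Spread of that mean across 200 repeated runs, for two simulation sizes
spread <- sapply(c(100, 10000), function(k) sd(replicate(200, mc_mean(k))))
names(spread) <- c("n_sims_100", "n_sims_10000")
spread
```

The run-to-run spread shrinks roughly with the square root of the number of draws, so two procedures that consume a different number of random values can report slightly different means even when both start from the same seed.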

hezhichao1991 commented 4 years ago

Thank you. I also tried with my own data, and I agree with your idea. I am now trying PSM in Stata.
