One dummy vs two dummies

vob2 commented 4 years ago

Hi Jie and Vibhuti,

I wanted to ask you about the specification we are using with two dummies (\gamma_1 D + \gamma_2 NotD) vs. the more traditional one (\gamma_1 + \gamma_2 D).

What are the downsides? E.g., if both coefficients are statistically significant, can we say that D produces an effect different from NotD? In other words, can we tell whether the two dummy variables have different effects?

Thanks,

Vlad

vibhuti6 commented 4 years ago

Hi Vlad, the benefit of including two dummies is that it allows us to directly get the main effect for each category. Here's a simple example where, say, we have two categories: high school graduates and college graduates.

Case 1: One dummy

Earnings = \gamma_1 + \gamma_2 CollegeGraduate

HighSchool earnings = \gamma_1

CollegeGraduate earnings = \gamma_1 + \gamma_2

Now, \gamma_2 tells us how much more college graduates earn than high school graduates. Overall earnings of college graduates are \gamma_1 + \gamma_2.

Here, if \gamma_2 is significant, we can say that college graduates earn significantly more than high school graduates.

Case 2: Two dummies

Earnings = \gamma_1 HighSchool + \gamma_2 CollegeGraduate

HighSchool earnings = \gamma_1

CollegeGraduate earnings = \gamma_2

Significance of \gamma_1 and \gamma_2 here simply tells us that the estimate for each group (high school or college graduates) is statistically different from zero. It does not tell us whether the two groups differ from each other. If we want to check that the difference in earnings between the two groups is statistically significant, we have to run a separate hypothesis test of \gamma_1 = \gamma_2.
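To make the two cases concrete, here is a minimal sketch in plain Python. The earnings numbers are made up for illustration, and `ols_two_regressors` is a hypothetical helper written for this example (it is not from our code base). It fits both specifications on the same data and shows that they are reparameterizations of each other: the one-dummy \gamma_2 equals the difference between the two coefficients in the two-dummy specification.

```python
# Hypothetical earnings data (in $1000s); the numbers are invented for illustration.
high_school = [30.0, 32.0, 34.0]
college = [45.0, 50.0, 55.0]

earnings = high_school + college
is_college = [0.0] * len(high_school) + [1.0] * len(college)
is_high_school = [1.0 - d for d in is_college]
constant = [1.0] * len(earnings)


def ols_two_regressors(x1, x2, y):
    """OLS of y on exactly two regressors, solved via the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    t1 = sum(a * c for a, c in zip(x1, y))
    t2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * t1 - s12 * t2) / det
    b2 = (s11 * t2 - s12 * t1) / det
    return b1, b2


# Case 1: intercept plus one dummy.
g1_one, g2_one = ols_two_regressors(constant, is_college, earnings)

# Case 2: two dummies and no intercept.
g1_two, g2_two = ols_two_regressors(is_high_school, is_college, earnings)

print("one dummy:  ", g1_one, g2_one)   # baseline mean and the college premium
print("two dummies:", g1_two, g2_two)   # the two group means
print("difference: ", g2_two - g1_two)  # matches the one-dummy gamma_2
```

With these numbers, Case 1 gives 32 and 18, while Case 2 gives 32 and 50. The fit is identical; only the quantity that each coefficient (and therefore each reported t-statistic) refers to changes.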

Please let me know if you have any comments or suggestions regarding this. Thanks.

JNing0 commented 4 years ago

To add to Vibhuti's answer, using two dummies rather than a baseline and a dummy, i.e., the traditional approach, has an expository advantage. In the traditional approach, the baseline treatment effect is measured with respect to the control, whereas the estimate for the dummy is measured with respect to the baseline treatment effect. I feel that it is clearer and more logically consistent if all estimates are simply treatment effects with respect to their own control.
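As a rough sketch of this point, the two parameterizations can be written side by side. The symbols here are assumptions for illustration only: T is a treatment indicator, D is the group dummy, and controls and fixed effects are omitted, so this is not our exact regression.

```latex
% Traditional (baseline + dummy): \delta is measured relative to the baseline effect \beta
y = \alpha + \beta \, T + \delta \, (T \times D) + \varepsilon

% Two dummies: each coefficient is a treatment effect relative to its own control
y = \alpha + \beta_{0} \, \bigl(T \times (1 - D)\bigr) + \beta_{1} \, (T \times D) + \varepsilon

% The two are linked by
\beta_{0} = \beta, \qquad \beta_{1} = \beta + \delta
```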

As for whether the estimates are significantly different, a quick check would be to compare the point estimates, taking their estimated standard errors into account.
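A more formal version of this check is the standard t (Wald) statistic for the difference between two coefficients from the same regression. It is written here in generic notation rather than taken from our code, and it needs the estimated covariance between the two estimates, not just the two standard errors:

```latex
t = \frac{\hat{\gamma}_2 - \hat{\gamma}_1}
         {\sqrt{\widehat{\mathrm{Var}}(\hat{\gamma}_1)
              + \widehat{\mathrm{Var}}(\hat{\gamma}_2)
              - 2\,\widehat{\mathrm{Cov}}(\hat{\gamma}_1, \hat{\gamma}_2)}}
```

Because the one-dummy specification is a reparameterization of the same model, this should coincide with the t-statistic reported for \gamma_2 in that specification, provided the same variance estimator is used in both.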

vob2 commented 4 years ago

Thank you very much for the very clear explanations! One thing is still unclear to me. Don't we want to be able to tell (using Vibhuti's example) that CollegeGrad earnings > HighSchool earnings? In our case, that means showing that the effect of treatment is greater if a firm receives financial assistance than if it does not.

If we have to run a separate test to establish significance, does this obscure our message? Normally readers would be able to glean this from the significance of the coefficients in a table, but now they would have to run a test or look up a separate table.

vibhuti6 commented 4 years ago

Thanks, Vlad and Jie.

Vlad: To your point, I was actually thinking the same while writing this yesterday. The two-dummy specification would have been ideal if our estimate for “non-financed” projects was insignificant. That way, the coefficients on the dummies would have given us the respective treatment effects, and also fit our argument. So, for example, we could have said that Quickpay only delayed contracts that received financing.

However, our results show that there is always a baseline effect of Quickpay, regardless of other project characteristics. So it’s probably better to use the “one-dummy” specification to argue for the effect being greater for financed projects.

vob2 commented 4 years ago

I am closing this. The takeaway is that the one-dummy specification is more useful for us.

QuickPay-Operational-Performance / Data-and-code

One dummy vs two dummies #51