chaisemartinPackages / did_multiplegt_dyn

|| Stata | R || Estimation of event-study Difference-in-Difference (DID) estimators in designs with multiple groups and periods, and with a potentially non-binary treatment that may increase or decrease multiple times.

Triple diff-in-diff with individual-level data? #82

Closed: PeterGencer closed this issue 11 hours ago

PeterGencer commented 3 months ago

Dear Team,

I have a question about triple diff-in-diff with did_multiplegt_dyn.

The FAQ states about this:

_❓ Can I perform triple difference-in-differences with the command?

Yes. Suppose for instance your third difference is across men and women in the same (g,t) cell. Then, for each (g,t) cell, you just need to compute the difference between the average outcome of men and women in cell (g,t). Then, you simply run the command with this new outcome._

Now, if I get it right, the command is designed with group-level panel data in mind (hence the g notation), but it may also be used with more disaggregated data, e.g. individual-level data. In my application, I use individual-level data and want to use an individual-level characteristic (a person's job status) for the triple difference. So my g is an individual person (and t is a month).

Now, a person (in my sample) cannot be employed and unemployed in the same month, so I cannot compute the difference between an unemployed and employed person in the same (g,t) cell... Right?

I think I might be getting something wrong here, but I can't seem to figure it out myself. Does the estimation of a triple-difference require group-level data that has been aggregated up from more disaggregated data? Would you have an idea how to best handle my case?

Thanks in advance and kind regards,

Peter

romgoti commented 1 month ago

Dear Peter,

There are two ways to implement the triple diff.

  1. As mentioned in the FAQ, one way is to take as your outcome variable the difference in outcome between your two employment statuses at the group level. The command then estimates the treatment effect on the difference between the two statuses. Your question is how to compute this difference and how to define the (g) cell. This depends on which control variables you want to (and can) include. If you include no control variables, each group can correspond to one treatment path (D): units that receive the same treatment at the same time belong to the same group, and you can compute the difference between employed and unemployed units in this cell (aggregated and weighted properly). Each cell must therefore contain both employed and unemployed units; otherwise the difference cannot be computed. Similarly, if you want to keep control variables, you can define groups by their treatment path (D) and covariates (if discrete).

  2. Alternatively, you can use the by() option to estimate triple diffs. If you add the option by(emp_status), you will recover the treatment effect for each employment group. The difference in treatment effects between the two employment statuses is your triple diff.
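
To make the first approach concrete, here is a minimal Python sketch of the data preparation step only, using just the standard library. It is an illustration under stated assumptions, not the package's own code: the field names (`person`, `t`, `d`, `employed`, `y`), the helper names, and the toy data are all hypothetical, and the within-cell averages are unweighted, so any survey or cell-size weighting would still need to be added. Groups are defined by treatment path with no covariates, and cells that lack either status are dropped.

```python
from collections import defaultdict

def treatment_paths(rows):
    """Map each person to their treatment path: a tuple of (t, d) pairs sorted by t."""
    by_person = defaultdict(dict)
    for r in rows:
        by_person[r["person"]][r["t"]] = r["d"]
    return {p: tuple(sorted(tv.items())) for p, tv in by_person.items()}

def cell_level_difference(rows):
    """Collapse individual rows to (treatment path, t) cells.

    The new outcome is mean(y | employed) - mean(y | unemployed) within the cell.
    Cells missing one of the two statuses are skipped, since no difference exists.
    """
    paths = treatment_paths(rows)
    group_id = {path: i for i, path in enumerate(sorted(set(paths.values())))}

    totals = defaultdict(lambda: [0.0, 0])  # (path, t, employed) -> [sum of y, count]
    treatment = {}                          # (path, t) -> d, shared by the whole cell
    for r in rows:
        path = paths[r["person"]]
        acc = totals[(path, r["t"], r["employed"])]
        acc[0] += r["y"]
        acc[1] += 1
        treatment[(path, r["t"])] = r["d"]

    out = []
    for path, t in sorted(treatment):
        emp = totals.get((path, t, True))
        unemp = totals.get((path, t, False))
        if emp is None or unemp is None:
            continue  # cell does not contain both statuses
        out.append({
            "group": group_id[path],
            "t": t,
            "d": treatment[(path, t)],
            "y_diff": emp[0] / emp[1] - unemp[0] / unemp[1],
            "n_employed": emp[1],
            "n_unemployed": unemp[1],
        })
    return out

# Hypothetical toy data: persons 1-2 switch into treatment at t=2,
# persons 3-4 are never treated, person 5 is always treated (employed only).
rows = [
    {"person": 1, "t": 1, "d": 0, "employed": True,  "y": 10.0},
    {"person": 1, "t": 2, "d": 1, "employed": True,  "y": 14.0},
    {"person": 2, "t": 1, "d": 0, "employed": False, "y": 6.0},
    {"person": 2, "t": 2, "d": 1, "employed": False, "y": 7.0},
    {"person": 3, "t": 1, "d": 0, "employed": True,  "y": 9.0},
    {"person": 3, "t": 2, "d": 0, "employed": True,  "y": 10.0},
    {"person": 4, "t": 1, "d": 0, "employed": False, "y": 5.0},
    {"person": 4, "t": 2, "d": 0, "employed": False, "y": 6.0},
    {"person": 5, "t": 1, "d": 1, "employed": True,  "y": 8.0},
    {"person": 5, "t": 2, "d": 1, "employed": True,  "y": 8.0},
]

result = cell_level_difference(rows)
for row in result:
    print(row)
```

The resulting cell-level rows (`group`, `t`, `d`, `y_diff`) are what you would then pass to the command as the group, time, treatment, and outcome variables. Note that the always-treated path in the toy data produces no rows, because its cells contain only employed units.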

Hope this helps, Romain