bquistorff / synth_runner

A tool to run a pool of synthetic controls, conduct inference, and produce visualizations.

Non-treated unit passed to drop_units_prog #24

Closed khlee42 closed 6 years ago

khlee42 commented 6 years ago

I had an issue with the drop_units_prog option and was wondering if you may have a solution for it.

The data has multiple treated units, and each of them should draw on a different donor pool. So I tried to use drop_units_prog to select a different donor pool for each treated unit. According to the Stata synth_runner documentation, the argument passed to drop_units_prog is the treated unit set by either trunit() or d(). But when I ran the code with trace on, I noticed that the argument passed was not the treated unit I specified but a seemingly random unit in the donor pool. I ran this exact setting with different data, including synth_smoking, and confirmed that the same issue occurs.

Could you look into this to see what might have caused the problem?

Thank you!

Neal

bquistorff commented 6 years ago

Since the synthetic control (SC) estimation is also run on each donor unit to get the p-values, the drop program is called for both treated and donor units. If you still think there's an issue, can you post your trace output?
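A minimal sketch of a drop program written with this in mind, assuming the panel variable is named `state` as in the smoking data. The program name `my_drop_units` and the unit IDs are illustrative only. Because the program receives whichever unit is currently being treated as "treated" (real or placebo), it branches on that argument and leaves the pool untouched for any other unit:

```stata
capture program drop my_drop_units
program my_drop_units
    // tunit is the unit currently treated as "treated";
    // during placebo runs this will be a donor unit
    args tunit
    // log each call so the sequence of units is visible without trace
    display as text "drop program called for unit `tunit'"
    // illustrative unit-specific exclusions (hypothetical IDs)
    if `tunit' == 39 quietly drop if inlist(state, 21, 38)
    if `tunit' == 3  quietly drop if state == 21
end
```

The program is then passed by name through drop_units_prog(my_drop_units). Seeing donor unit IDs in the log is expected under this design.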

khlee42 commented 6 years ago

Thanks for your comment! I looked through the code and found that, as you explained, the same donor pool is used, with different treatment dates, to run the placebo tests for each treated unit. What I wanted was for the placebo tests to consider only those control units that were used to form the synthetic control. So I slightly modified the code to do this, and I think it's working fine.

In doing so, I assumed that in multiple-treatment cases, using a different donor pool for each treated unit is fine. Is that theoretically sound? Or should the donor pool be the same for every treated unit?

bquistorff commented 6 years ago

The donor pool can differ by treated unit, depending on the identification assumptions one is making (which units can serve as comparisons may differ, and there may be "contaminations" that one has to worry about).

khlee42 commented 6 years ago

Thanks. Was there a specific reason why you designed it this way, so that the drop program is applied to the placebo units as well?

bquistorff commented 6 years ago

Yeah, if the units are geographic there may be local spill-overs that you want to keep from contaminating your estimate. So you could have the program drop units near the "treated" unit.
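As a rough sketch of that idea, assuming the data contain a hypothetical numeric variable `region` that groups nearby units and a panel variable named `state` (neither name comes from the package itself), a drop program could remove every other unit in the same region as the unit currently being treated as "treated":

```stata
capture program drop my_drop_neighbors
program my_drop_neighbors
    // tunit is the unit currently treated as "treated" (real or placebo)
    args tunit
    // look up the region of that unit (region is a hypothetical variable)
    quietly summarize region if state == `tunit', meanonly
    // do nothing if the unit is not found in the data
    if r(N) == 0 exit
    local treg = r(mean)
    // drop neighbors in the same region, but keep the unit itself
    quietly drop if region == `treg' & state != `tunit'
end
```

Because the rule depends only on the unit passed in, it applies the same spill-over exclusion in the placebo runs as in the run for the actual treated unit.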