JMSLab / xtevent

Stata package -xtevent-
MIT License

Check behavior of the `cohort` and `control_cohort` options and allow automatic generation #179

Closed: jorpppp closed this issue 3 months ago

jorpppp commented 4 months ago

In https://github.com/JMSLab/xtevent/commit/c4a4b8f4694e97bfada885b6794faeb322fa38ec I checked whether xtevent performs any kind of consistency check between the cohort variables and the policy variable. Currently it does not. I also wrote down some ideas for changing the syntax.

jorpppp commented 4 months ago

I added consistency checks for the cases in which the user provides the cohort and control_cohort variables. I also added a force option to skip those consistency checks if the user wants to. This is useful when we want heterogeneous treatment effects by a grouping variable that is not necessarily a treatment cohort, or when we want to group cohorts together.

I also added automatic generation of the cohort variable in the staggered adoption case.
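The checks and the automatic generation described above can be illustrated with base Stata commands. This is a minimal sketch, not xtevent's actual implementation: it assumes a balanced toy panel with unit id i, time t, and a binary policy z (the same names used in example31.dta below), and cohort_user and cohort_auto are hypothetical variable names. The assertions are the kind of checks the force option would skip.

```stata
* Hypothetical sketch, not xtevent's internal code.
* Toy panel: cohort_user is the cohort variable a user might supply.
clear
input i t z cohort_user
1 1 0 2
1 2 1 2
1 3 1 2
2 1 0 3
2 2 0 3
2 3 1 3
3 1 0 .
3 2 0 .
3 3 0 .
end

* Staggered adoption: within a unit, z never switches from 1 back to 0
bysort i (t): assert !(z == 0 & z[_n-1] == 1)

* Automatic cohort: first period with z == 1 (missing if never treated)
bysort i: egen cohort_auto = min(cond(z == 1, t, .))

* Check 1: the user-supplied cohort is constant within each unit
bysort i (cohort_user): assert cohort_user[1] == cohort_user[_N]

* Check 2: the user-supplied cohort agrees with the policy variable
assert cohort_user == cohort_auto
```

Note that the absorbing-policy check only compares adjacent periods, so a sequence such as 1, missing, 0 would not be flagged by this sketch.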

I still need to add:

jorpppp commented 3 months ago

In https://github.com/JMSLab/xtevent/commit/17134e35caf61a007493a8f6eaad2a7c5dd1ae76 I added the option of creating the control_cohort variable automatically: units with a missing value of the cohort variable are assigned to the control cohort. The option to supply a control_cohort indicator is still available.
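The rule "control cohort = missing cohort" can be sketched in base Stata as follows. This is an illustration under stated assumptions, not the code in the commit: it assumes a toy panel with unit id i, time t, and binary policy z, and cohort_auto and control_auto are hypothetical names.

```stata
* Hypothetical sketch, not xtevent's internal code.
clear
input i t z
1 1 0
1 2 1
2 1 0
2 2 0
end

* Cohort: first period with z == 1 (missing for never-treated units)
bysort i: egen cohort_auto = min(cond(z == 1, t, .))

* Control indicator: 1 for units with a missing cohort, 0 otherwise
gen byte control_auto = missing(cohort_auto)
```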

If the control_cohort variable is created automatically, there are options to save it and replace it. In the end, I did not add options to name these cohort variables, because we do not yet have options to name imputed policy variables and it felt asymmetric. We can enable naming of created variables in another issue.

All of these options for automatic generation work only under staggered adoption. One pending issue is the automatic generation of the cohort and control_cohort variables when the user wants to use the last-treated cohort as the control cohort. By default, this cohort has a non-missing cohort value, so the automatic generation would not assign it to the control cohort. This can, however, be done by manually entering the variables for the treated and control cohorts.
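The manual route for using the last-treated cohort as the control can be sketched in base Stata. This is a minimal sketch, assuming a toy panel with unit id i, time t, and binary policy z and no never-treated units; control_last and cohort_manual are hypothetical names, and setting the cohort to missing for control units mirrors the automatic rule above rather than any confirmed xtevent requirement.

```stata
* Hypothetical sketch: last-treated cohort used as the control cohort.
clear
input i t z
1 1 0
1 2 1
1 3 1
2 1 0
2 2 0
2 3 1
end

* Cohort: first period with z == 1
bysort i: egen cohort_auto = min(cond(z == 1, t, .))

* Find the last-treated cohort (summarize ignores missing values)
summarize cohort_auto, meanonly
local last = r(max)

* Control indicator: units in the last-treated cohort
gen byte control_last = (cohort_auto == `last')

* Cohort for the remaining treated units; missing for the control cohort
gen cohort_manual = cohort_auto if control_last == 0
```

In Sun and Abraham's setup, using the last-treated cohort as the control usually also means restricting the sample to periods before that cohort is treated; this sketch does not apply that restriction.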

jorpppp commented 3 months ago

Per call: We won't deal with automatic generation of the cohort and control_cohort variables when the user wants to use the last treated cohort as the control. We may add an example of this somewhere in the repository or the paper.

We will check whether our implementation of Sun and Abraham allows the last-treated cohorts to contribute to the estimates even if they are not observed over the entire estimation window.

jorpppp commented 3 months ago

@SimonFreyaldenhoven The following example shows that late-treated cohorts with fewer post-treatment periods than the right end of the estimation window are still included in the estimation:

clear all
use "example31.dta", clear
* First period in which each unit is treated (missing if never treated)
gen timet=t if z==1
bysort i: egen time_of_treat=min(timet)
gen never_treat=time_of_treat==.
xtevent y eta , policyvar(z) window(5) vce(cluster i) impute(nuchange) savek(ev)
tab ev_evtime if time_of_treat == 19

This returns the event times for the cohort treated at time 19. This cohort has only two post-treatment periods.


Now we estimate with Sun and Abraham's estimator and check the estimation sample:

xtevent y eta , policyvar(z) window(5) vce(cluster i) impute(nuchange) sunabraham
gen sample = e(sample)
tab sample if time_of_treat == 19, m


This shows that cohort 19 is included in the estimation sample, despite having only two post-treatment periods.

I am going to go ahead and start a PR for this issue. @SimonFreyaldenhoven if you have more concerns about this we can follow up in the PR.

jorpppp commented 3 months ago

The front page examples will need to be modified to reflect the new Sun and Abraham syntax, but we should wait until a new release to do this. I started #184 for this.

jorpppp commented 3 months ago

Summary: In this thread we updated the cohort and control_cohort options to enable consistency checks with the policy variable and automatic generation.

Thread continues in #185