Closed jorpppp closed 3 months ago
Note that repeated saving of imputed variables with different suffixes is not working:
clear all
webuse nlswork
by idcode (year): gen time=_n
xtset idcode time
by idcode (time): gen union2 = sum(union)
replace union2 = 1 if union2 > 1
xtevent ln_w age c.age#c.age ttl_exp c.ttl_exp#c.ttl_exp tenure ///
, pol(union2) w(3) cluster(idcode) impute(nuchange, saveimp)
ren union2_imputed union2_imputed1
gen s = e(sample)
xtevent ln_w age c.age#c.age ttl_exp c.ttl_exp#c.ttl_exp tenure if s ///
, pol(union2) w(3) cluster(idcode) impute(nuchange, saveimp)
returns an error message.
This is because we are confirming vars with unab
, so I'm going to fix that first.
@Constantino-Carreto-Romero note the issue above https://github.com/JMSLab/xtevent/issues/203#issuecomment-2111160274 and the fix in https://github.com/JMSLab/xtevent/commit/548b1a6119e73ed0977ecb99928c0d8854816d2b. I think this is also an issue for #177 that we just closed, because it means that when xtevent
drops variables to replace them, it may be dropping more variables that it should.
Can you please open a new issue to check all confirms to make sure this doesn't happen? Note that for the fix in #177 you may need to move the check for replacing to a later place in the code, when the code already knows the new kvars and can confirm them individually.
After https://github.com/JMSLab/xtevent/commit/548b1a6119e73ed0977ecb99928c0d8854816d2b the above code works: https://github.com/JMSLab/xtevent/issues/203#issuecomment-2111160274
and it shows that the imputed variable is different across runs, so the issue has to do with the imputation.
Actually the issue may be deeper since it shows up without any imputation:
clear all
webuse nlswork
by idcode (year): gen time=_n
xtset idcode time
by idcode (time): gen union2 = sum(union)
replace union2 = 1 if union2 > 1
xtevent ln_w age c.age#c.age ttl_exp c.ttl_exp#c.ttl_exp tenure ///
, pol(union2) w(3) cluster(idcode)
est sto m1
xtevent ln_w age c.age#c.age ttl_exp c.ttl_exp#c.ttl_exp tenure if s ///
, pol(union2) w(3) cluster(idcode)
est sto m2
est tab m1 m2
This is not an imputation issue, and I don't think this is a bug. This code checks the estimation samples for the two runs:
clear all
webuse nlswork
by idcode (year): gen time=_n
xtset idcode time
by idcode (time): gen union2 = sum(union)
replace union2 = 1 if union2 > 1
xtevent ln_w age c.age#c.age ttl_exp c.ttl_exp#c.ttl_exp tenure ///
, pol(union2) w(3) cluster(idcode) savek(a)
est sto m1
* ren union2_imputed union2_imputed1
gen s1 = e(sample)
xtevent ln_w age c.age#c.age ttl_exp c.ttl_exp#c.ttl_exp tenure if s1 ///
, pol(union2) w(3) cluster(idcode)
est sto m2
gen s2 = e(sample)
* ren union2_imputed union2_imputed2
* assert union2_imputed1 == union2_imputed2
est tab m1 m2
tab s1 s2
tab idcode if s1 != s2
This shows that the samples differ between runs and it lists some ids for where this happens. These ids are the same that are excluded in the second run because of ambiguous event times.
After that if we check the sample for missing values for one of the ids:
li idcode time union2 tenure age ttl_exp s1 s2 if idcode == 13
we get
Note that tenure is missing precisely when the policy variable, union2, changes from zero to one. So in the first run, this observation is excluded because of missing tenure. Now, the second run that is restricted to the estimation sample of the first run. Since that observation is not in the sample for the second run, the observations for id=13 have ambiguous event times so the entire unit is excluded. The same happens for the other ids in the sample.
The same happens with the other ids with the sample.
I'll now check the imputation case.
It's a good idea to return the list of excluded units due to ambiguous event times so they can be checked. I added this in https://github.com/JMSLab/xtevent/commit/3a84461121c8f2723d551ab3138cd9205779b3ea and https://github.com/JMSLab/xtevent/commit/7afc04c2c63ff17cfdf53950e0e2f3ac68708815
After doing this I checked if the excluded units due to ambiguous event time when using impute(nuchange) (a larger set than without imputation) had the same issue of missing tenure when the policy activates, and they do. So the issue is the same overall.
In summary I think this is not a bug.
@Constantino-Carreto-Romero could you read the posts on this issue and let me know if you agree? If so we can start a PR to add these changes.
It's a good idea to return the list of excluded units due to ambiguous event times so they can be checked. I added this in 3a84461 and 7afc04c
After doing this I checked if the excluded units due to ambiguous event time when using impute(nuchange) (a larger set than without imputation) had the same issue of missing tenure when the policy activates, and they do. So the issue is the same overall.
In summary I think this is not a bug.
@Constantino-Carreto-Romero could you read the posts on this issue and let me know if you agree? If so we can start a PR to add these changes.
@jorpppp
I replicated the examples throughout the issue and I agree it should not be considered as a bug. In the first run, to create the event-time dummies, the program focuses on a larger sample, but then estimates restricting to non-missing observations in the varlist
. If the user wants to reestimate using e(sample) he/she might not be able to run the same estimation because the program won’t be able to define event time for some units that lost key observations in the previous run. Maybe we could add a second sample function, one that focuses on the sample used when creating the event-time dummies.
@Constantino-Carreto-Romero Yes, this makes me think that we just need to set the sample to the one used when creating the event-time dummies. That way the results will replicate using e(sample), and at the end of the day, that is the sample that xtevent
is using as a whole, even if _eventols
or _eventiv
use smaller samples. I'll implement.
Summary: In this issue we addressed an issue with xtevent
giving different results when ran twice, using the e(sample) from the first run in the second run.
Thread continues in #205
Consider this example
The second run over the estimation sample from the first run leads to different results. This probably has to do with the impute option.
I am going to label this as a bug for now because it looks like it.