JMSLab / xtevent

Stata package -xtevent-
MIT License

Estimation sample with unbalanced panel #151

Closed (ny1526 closed this issue 1 year ago)

ny1526 commented 1 year ago

Hi,

Thank you for creating this package. It has been very helpful. I hope it is ok for me to post a question here:

I am finding it difficult to understand how xtevent determines which observations to include in the estimation sample. I have an unbalanced panel and my policy variable is a binary indicator. As an example, I am running xtevent with a 3-period window. I use all the default options.

After running the regression, I check the estimation sample with e(sample). I find that the sample only includes observations between period -8 and 8, even though I have observations going back as far as -12 and as far ahead as +13.

I also find that for some individuals, even if I observe them in all periods between -8 and 8 without gaps, not all observations are included in the sample. For example, I have an individual where only the observations for periods -8 to -1 are included in the estimation sample, even though I have observations for that same individual for periods 0 and later.

Can you help me understand what is going on here with the estimation sample? Thank you.

jorpppp commented 1 year ago

Thank you for posting, @ny1526. @Constantino-Carreto-Romero and I are taking a look at this. In the meantime, have you seen the detailed description of the imputation options? It may answer some of your questions, particularly about why some observations are excluded.

Constantino-Carreto-Romero commented 1 year ago

Dear @ny1526, thanks for using -xtevent-. Here are some suggestions that may help:

> As an example, I am running xtevent with a 3-period window. I use all the default options.

Please make sure you are using the -impute- option. See here for a detailed example of the -impute- option and how to use it. Using the -impute- option is important when the estimation window is narrow.

> I also find that for some individuals, even if I observe them in all periods between -8 and 8 without gaps, not all observations are included in the sample. For example, I have an individual where only the observations for periods -8 to -1 are included in the estimation sample, even though I have observations for that same individual for periods 0 and later.

You may be using a package version with the problem pointed out in issue 126. Please make sure you are using the latest version of -xtevent-, which no longer has this problem. To install the latest version, you can follow the instructions on the home page.

ny1526 commented 1 year ago

Thank you both, @jorpppp and @Constantino-Carreto-Romero for pointing me to the impute option. Just to clarify: I do not have any missing values in my policy variable, unless it is considered "missing" if there are some periods where I have observations for some individuals but not others (and so I do not have values for the policy or outcome variable in those periods.) Should the impute option be used in that scenario? And if so, can you help me figure out which impute option I should use? (My policy is staggered adoption.)

I am using the latest version of xtevent.

Thank you again for your help.

Constantino-Carreto-Romero commented 1 year ago

Thank you @ny1526. Even if you don’t have missing values in your policyvar, xtevent has to create leads and lags of the policyvar. This may introduce missing values in these leads and lags at the beginning and the end of the time window you are using, unless the window is short enough. xtevent's default behavior is to assume nothing about those missing values, so that the user makes a conscious choice about them. If your policyvar follows staggered adoption, then you should use impute(stag). With this option, xtevent will check whether your policyvar indeed follows staggered adoption, and if so, it will impute those missing values in the leads and lags of the event-time dummies. The impute(stag) option assumes that in periods before your window’s left endpoint, the policyvar has the same value as the first observed value in your policyvar, and that in periods after your window’s right endpoint, the policyvar has the same value as the last observed value in your policyvar. Please see the help file for a detailed explanation of the -impute(stag)- option and its differences from the other imputation rules. This should increase the number of observations included in your regression.
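To make the mechanism above concrete, here is a minimal Python sketch. It is not xtevent's code: the function name `usable_periods`, the symmetric rule "all leads and lags up to the window must be available", and the sample data are assumptions made for illustration only. It shows how leads and lags run off the edges of a unit's observed span, and how a staggered-adoption style imputation (carry the first observed value backward and the last observed value forward) recovers those periods.

```python
# Simplified illustration only: this is not xtevent's code, and the symmetric
# window rule below is an assumption made for the example.

def usable_periods(policy, window, impute_stag=False):
    """Return the periods whose leads and lags up to `window` are all available.

    `policy` maps each observed period to the policy value for one unit.
    """
    periods = sorted(policy)
    first, last = periods[0], periods[-1]

    def value(p):
        if p in policy:
            return policy[p]
        if impute_stag and p < first:
            return policy[first]  # assume the first observed value held earlier
        if impute_stag and p > last:
            return policy[last]   # assume the last observed value holds later
        return None               # unobserved and not imputed

    return [t for t in periods
            if all(value(t + k) is not None for k in range(-window, window + 1))]


# Hypothetical unit observed in periods 1..10 that adopts the policy in period 5.
policy = {t: int(t >= 5) for t in range(1, 11)}
print(usable_periods(policy, window=3))                    # [4, 5, 6, 7]
print(usable_periods(policy, window=3, impute_stag=True))  # all ten periods
```

Without imputation, only the periods far enough from both ends of the observed span survive; with the imputation rule switched on, every observed period is usable. The exact set of leads and lags xtevent needs depends on its specification, so treat the counts here as indicative only.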

ny1526 commented 1 year ago

Thank you for the explanation, @Constantino-Carreto-Romero , and apologies for the delayed response.

I added the "impute(stag)" option as you suggested and this seems to partially fix my issue. However, I still have some cases where I do not understand why observations are dropped.

For example, when I run the regression with the 3-period window, I have an individual with observations for periods -2 through +3 in my data, but the regression only includes the observations from periods -2, -1, 0, and +3, skipping +1 and +2. There are several examples of this where observations for individuals for some periods appear to be "skipped." There is no apparent reason to me why they would be dropped (e.g., the observations are not missing values for the covariates in the model). Do you know why this might be?

Thank you again for your help.

Constantino-Carreto-Romero commented 1 year ago

Dear @ny1526, besides missing values in the covariates for those periods, there might also be missing values in the dependent variable. Another possibility is gaps in your time variable. If there are gaps, xtevent will translate the missing periods into missing values in the event-time dummies. If none of these apply to your data, would you please provide us with a code snippet and a small sample of your data, or a screenshot of your dataset, so we can take a look?
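One way to check the gap explanation is to list, for a single individual, which periods are absent from the middle of the observed span and which observed periods sit within the window of such a gap. The Python sketch below is a hypothetical diagnostic, not part of xtevent: the function name `periods_hit_by_gaps` and the symmetric-window rule are assumptions for illustration.

```python
# Hypothetical diagnostic, not part of xtevent: find interior gaps in one
# unit's time variable and the observed periods within `window` of a gap.

def periods_hit_by_gaps(observed, window):
    """Return (gaps, hit): missing interior periods and affected observed periods."""
    observed = set(observed)
    span = range(min(observed), max(observed) + 1)
    gaps = [p for p in span if p not in observed]
    hit = sorted(t for t in observed
                 if any(abs(t - g) <= window for g in gaps))
    return gaps, hit


# Example unit observed in periods 1..10 except period 6.
gaps, hit = periods_hit_by_gaps([1, 2, 3, 4, 5, 7, 8, 9, 10], window=3)
print(gaps)  # [6]
print(hit)   # [3, 4, 5, 7, 8, 9]
```

Under this simplified rule, a single missing period can affect observations up to a full window away on either side, which is one way apparently complete observations could end up "skipped". Imputation at the ends of the span would not help here, since the gap lies between the first and last observed periods.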

jorpppp commented 1 year ago

Hi @ny1526, just following up in case you're still trying to figure this out.

ny1526 commented 1 year ago

Hi @jorpppp , apologies for not responding earlier. I think I figured out what the issue was with my data. Thank you all for your help!