bcallaway11 / did

Difference in Differences with Multiple Periods, website: https://bcallaway11.github.io/did

Effect of splitting panel IDs, making the panel more unbalanced? #179

Open PaoloZacchia opened 11 months ago

PaoloZacchia commented 11 months ago

Hello,

thanks everyone (and the authors in particular) for the great package, it really advances research. I have an issue which is perhaps very simple to answer, but I apologize in advance for the verbose description (I want to make sure it's clear).

I have a fairly (though not exceedingly) unbalanced panel of firms on which I run the estimation with the option allow_unbalanced_panel = TRUE. The estimation works fine. Having realized that some of my firm IDs actually refer to different corporate entities at different points in time, I redefined my original id_firm into a new jd_firm with finer-grained identifiers. What I want is basically to keep out of the estimation those original IDs that turn into a different corporate entity around the time treatment occurs, because conceptually they would belong neither to my treatment group nor to my control group.

However, I am under the impression that as long as I keep the option allow_unbalanced_panel = TRUE, this ID splitting is inconsequential: in each year, firms are still assigned to either the treatment or the control group, both before and after treatment, depending on the cohort they belong to. And in fact, the estimates are only mildly affected (if at all; I need to check with my coauthor). Is this the case? Or is this independent of the allow_unbalanced_panel option?

What I want is basically to "get rid" of the firms that change nature too close in time to their treatment date. Naturally, I could manually exclude them from the estimation dataset, but this is error-prone. I thought that the package might allow for a more elegant solution, possibly an implicit one that I don't yet fully understand.

Thanks for reading this! :-)

bcallaway11 commented 9 months ago

Hi Paolo, sorry for the delay in getting back to you -- I had read this a while back, but I wasn't sure what to say, and then it slipped my mind. I don't think that I fully understand what you are going for, but my (tentative) recommendation would be to go ahead and clean the data outside of our code. There could perhaps be some kind of workaround using our code, but I think it would be safer for you to just do it yourself on the outside.

Brant
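
For readers who land on this thread, here is a minimal base-R sketch of the "clean the data outside the package" route, written so that the exclusion follows an explicit rule rather than a hand-made list. Everything in it is an assumption made for illustration: the toy data, the column names `year`, `g` (first treated year, 0 for never treated) and `y`, the `window` of one year, and the rule itself (drop an original `id_firm` if it maps to more than one `jd_firm` within `window` years of its treatment date). It is not something the package does for you.

```r
# Toy firm panel; all values and column names are hypothetical.
panel <- data.frame(
  id_firm = rep(1:4, each = 4),
  jd_firm = c("1a", "1a", "1b", "1b",   # entity changes in 2003, the treatment year
              "2a", "2a", "2a", "2a",   # treated, never changes entity
              "3a", "3a", "3a", "3a",   # never treated
              "4a", "4b", "4b", "4b"),  # entity changes in 2002, treated in 2004
  year    = rep(2001:2004, times = 4),
  g       = rep(c(2003, 2003, 0, 2004), each = 4),  # first treated year; 0 = never treated
  y       = seq_len(16),
  stringsAsFactors = FALSE
)

# Assumed rule: how close to treatment an entity change must be to disqualify a firm.
window <- 1

# Rows of treated firms that fall within `window` years of their treatment date.
near_treat <- panel$g > 0 & abs(panel$year - panel$g) <= window

# Number of distinct fine-grained entities per original ID inside that window.
n_entities <- tapply(panel$jd_firm[near_treat], panel$id_firm[near_treat],
                     function(x) length(unique(x)))

# Original IDs that switch entity close to treatment: excluded entirely.
drop_ids <- names(n_entities)[n_entities > 1]
clean <- panel[!(as.character(panel$id_firm) %in% drop_ids), ]

# Numeric version of the fine-grained ID, in case the estimation
# routine requires a numeric id column.
clean$jd_num <- as.integer(factor(clean$jd_firm))

# The cleaned data could then be passed to the estimator, for example:
# did::att_gt(yname = "y", tname = "year", idname = "jd_num", gname = "g",
#             data = clean, allow_unbalanced_panel = TRUE)
```

In this toy example only firm 1 is dropped. Firm 4 also changes entity, but two years before its treatment date, so it stays in and contributes two separate `jd_firm` units. Whether dropping the whole original ID is the right choice, or only the observations near the change, is a research-design decision that the sketch does not settle.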