JuliaDiffinDiffs / DiffinDiffs.jl

A suite of Julia packages for difference-in-differences
MIT License

Collaborate? #2

Open nilshg opened 2 years ago

nilshg commented 2 years ago

Hey, good to see someone else working on modern causal inference in Julia!

I'm the author of SynthControl and TreatmentPanels, two packages in a similar space.

With TreatmentPanels I'm trying to build a foundational "data prep" package which takes in a table and a treatment assignment and then constructs an object with a type which tells you whether the panel is balanced/unbalanced, single/multi-unit treatment, and whether the treatment is absorbing or switches on and off. It then provides functions to extract e.g. pre- and post-treatment outcomes, treatment periods and IDs of treated units etc.
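To give a rough idea of what encoding these properties in types can look like, here is a minimal sketch. All names in it (`Balance`, `TreatmentDuration`, `balance`, `duration`) are hypothetical and chosen for illustration only; they are not the TreatmentPanels API. It assumes unit IDs and periods come as plain vectors and the treatment assignment as a units × periods `Bool` matrix.

```julia
# Hypothetical trait types for illustration; NOT the TreatmentPanels.jl API.
abstract type Balance end
struct Balanced <: Balance end
struct Unbalanced <: Balance end

abstract type TreatmentDuration end
struct Absorbing <: TreatmentDuration end   # once treated, always treated
struct Switching <: TreatmentDuration end   # treatment can turn off again

# A panel is balanced if every unit is observed exactly once in every period.
function balance(ids::AbstractVector, periods::AbstractVector)
    npairs = length(Set(zip(ids, periods)))
    nfull = length(unique(ids)) * length(unique(periods))
    return (npairs == nfull && npairs == length(ids)) ? Balanced() : Unbalanced()
end

# W[i, t] is true if unit i is treated in period t (assumed layout).
# Treatment is absorbing if no row ever goes from true back to false.
function duration(W::AbstractMatrix{Bool})
    return all(issorted, eachrow(W)) ? Absorbing() : Switching()
end
```

Functions that extract pre- or post-treatment outcomes could then dispatch on these types instead of re-checking the data each time.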

In SynthControl I'm trying to pull together a bunch of recent methods in this space, starting from the simplest "just use all pretreatment outcomes" case, through the classical Abadie/Diamond/Hainmueller method, to things like Synthetic Diff-in-Diff and Matrix Completion.

Finally, I've also started implementing Sant'Anna/Zhao's DRDID, although that's not public yet (I need to check the licensing on that).

Maybe have a look at my stuff and see if any of it is useful or if you'd like to collaborate on anything!

junyuan-chen commented 2 years ago

Hi, thanks for reaching out! Your work looks interesting.

Regarding handling the panel structure, I actually took a different approach, similar to how GroupedArrays.jl works (it used to be a component of FixedEffects.jl). The key advantage of this approach is performance: there is no need to repeatedly search through the data columns (findfirst inside for loops) for the positions of distinct combinations of treatment assignment, calendar time, etc. Instead, we first label each column using a Dict, similar to how PooledArrays.jl works, so that each unique value within a column is assigned a positive integer. With these integer vectors, it is possible to do vector multiplications such that the results give us the "labels" for the distinct combinations across columns. From those labels, the row indices of each distinct combination can be obtained using a standard Dict-based grouping algorithm.

The algorithm I briefly outlined above is, I believe, among the best-performing approaches to multi-level grouping (when heavier machinery like multithreading is not needed), and it is also relatively simple to implement. You might want to take a look at how I define findcell here if you are interested.
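The steps outlined above can be sketched roughly as follows. This is a simplified illustration and not the actual findcell implementation: the helper names `label_column` and `group_rows` are hypothetical, and the combination step is written as a mixed-radix encoding (each column's integer labels multiplied by a stride and summed), which is one way to get a single integer label per distinct combination.

```julia
# Hypothetical helper: assign each distinct value a positive integer in order
# of first appearance (similar in spirit to PooledArrays.jl).
function label_column(col::AbstractVector)
    codes = Dict{eltype(col),Int}()
    refs = Vector{Int}(undef, length(col))
    for (i, v) in enumerate(col)
        # length(codes) + 1 is evaluated before a new key is inserted
        refs[i] = get!(codes, v, length(codes) + 1)
    end
    return refs, length(codes)
end

# Hypothetical helper: combine the labelled columns into one integer label per
# row, then collect the row indices belonging to each distinct combination.
function group_rows(cols::AbstractVector...)
    combined = zeros(Int, length(first(cols)))
    stride = 1
    for col in cols
        refs, nlevels = label_column(col)
        combined .+= (refs .- 1) .* stride   # unique per combination of labels
        stride *= nlevels                    # may overflow with very many levels
    end
    groups = Dict{Int,Vector{Int}}()
    for (i, c) in enumerate(combined)
        push!(get!(() -> Int[], groups, c), i)
    end
    return groups
end

# Example: treatment cohort and calendar time for five rows.
cohort = [2010, 2010, 2012, 2012, 2010]
time = [1, 2, 1, 2, 1]
groups = group_rows(cohort, time)
```

Each column is scanned once to build its labels and the combined labels are scanned once to build the groups, so no repeated searching through the data is needed.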