alexCardazzi / lpdid

MIT License
13 stars 5 forks source link

Rewrite #7

Open kgcsport opened 4 months ago

kgcsport commented 4 months ago

Hi @alexCardazzi and @zachporreca,

Thank you for putting together this package. I really appreciate having base code to work with instead of coding the paper from scratch.

I rewrote the lpdid function in my fork using data.table and apply statements instead of for loops with subsetting. Write now it is called lpdid_dt, in order to let me do side-by-side comparisons easily while testing code.

On my larger dataset (~130K units with minimum 17 periods each) the original version took 38 hours, but my version returns results in 7 minutes. On smaller datasets like the simulated data in the vignette, the performance gain is much smaller (e.g. 6 minutes versus 1.5 minutes). There's some memory tradeoffs -- I duplicate the data to have a version I can modify in place while preserving the original data, but it worked well and replicated the base case of a single absorbing treatment.

It still needs some polishing and testing (e.g. the nonabsorbing lags don't work yet), but I figured I'd give you a heads up before I pull request in the near future.

Also, @alexCardazzi, I think we briefly met at Easterns this year. I'm at Bates with Austin Smith.

Kyle

alexCardazzi commented 4 months ago

Hi @kgcsport,

Thanks for the message. I remember you from EEA, for sure!

Your speed tests look really promising! I will mention -- we are currently working with the authors of the STATA package to harmonize the two implementations. So, I think these DT improvements will be especially helpful once we can get the two packages synchronized. Let's keep in touch about it!