Daniel-Pailanir / sdid

Synthetic Difference in Differences for Stata
GNU General Public License v3.0
Very slow in bootstrap replications with datasets of more than 100,000 obs. #68

Status: Closed (Machine855 closed this issue 3 months ago)

Machine855 commented 3 months ago

Could you please change the Stata code so that it can run in parallel? For example, an option along these lines: "parallel(#): partial out variables in # separate Stata processes, speeding up execution depending on data size and computer characteristics. Requires the parallel package."

The code runs very slowly, even though I am using a laptop with 128 GB of physical memory and 16 logical processors. I have a dataset of 270,000 observations. If I run your code with 4 covariates and request 1,000 bootstrap replications (a standard number of replications), then it would take about 40 days to complete the estimation.

damiancclarke commented 3 months ago

This is a nice idea, but unfortunately probably not something we have capacity for at the moment... We're very happy to review pull requests if you want to try it of course!

As an alternative, perhaps it would make sense to use parallel bs (the bootstrap command in the parallel package): you do your own clustered resample and, in each resample, just estimate sdid with the no-inference option. That would generate many bootstrapped ATT estimates from sdid, and their standard deviation would give you the standard error you need?
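
To make the resampling logic concrete, here is a minimal sketch, written in Python rather than Stata purely for illustration. It is not the sdid implementation: did_att is a stand-in estimator (a plain 2x2 difference in differences) playing the role of one sdid call with no inference, and all names (did_att, cluster_bootstrap_se, make_toy_panel, the "unit"/"treated"/"post"/"y" keys) are hypothetical. The idea is to draw whole units with replacement, give each drawn copy a fresh unit id, compute one ATT per resample, and take the standard deviation of those ATTs as the standard error.

```python
import random
import statistics


def did_att(rows):
    """Stand-in for one point estimate (plain 2x2 DiD), NOT the sdid estimator."""
    def cell_mean(treated, post):
        vals = [r["y"] for r in rows
                if r["treated"] == treated and r["post"] == post]
        if not vals:
            # Happens when a resample contains no treated or no control units.
            raise ValueError("resample has an empty treated/control cell")
        return sum(vals) / len(vals)

    return (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))


def cluster_bootstrap_se(rows, estimate_att, reps=200, seed=0, max_tries=None):
    """Resample whole units with replacement; return (SE, list of ATTs)."""
    rng = random.Random(seed)
    by_unit = {}
    for r in rows:
        by_unit.setdefault(r["unit"], []).append(r)
    units = sorted(by_unit)
    max_tries = max_tries or 10 * reps

    atts = []
    for _ in range(max_tries):
        if len(atts) == reps:
            break
        drawn = [rng.choice(units) for _ in units]
        # A unit drawn twice gets two distinct ids, so the panel stays well defined.
        sample = [{**r, "unit": new_id}
                  for new_id, u in enumerate(drawn)
                  for r in by_unit[u]]
        try:
            atts.append(estimate_att(sample))
        except ValueError:
            continue  # unusable resample: draw again
    if len(atts) < 2:
        raise RuntimeError("too few successful replications")
    return statistics.stdev(atts), atts


def make_toy_panel(n_units=20, n_treated=5, n_periods=4, effect=2.0, seed=1):
    """Simulated balanced panel (made-up data) with a known treatment effect."""
    rng = random.Random(seed)
    rows = []
    for u in range(n_units):
        unit_effect = rng.gauss(0, 1)
        for t in range(n_periods):
            treated = int(u < n_treated)
            post = int(t >= n_periods // 2)
            y = unit_effect + t + effect * treated * post + rng.gauss(0, 0.1)
            rows.append({"unit": u, "t": t, "treated": treated, "post": post, "y": y})
    return rows
```

Calling cluster_bootstrap_se(make_toy_panel(), did_att, reps=100) returns the bootstrap standard error together with the individual ATT draws. Because each replication depends only on its own resample, the loop can be split across separate processes (each with its own seed) and the ATT lists pooled afterwards, which is the role parallel bs would play on the Stata side.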