OHDSI / CohortGenerator

An R package for instantiating cohorts using data in the CDM.
https://ohdsi.github.io/CohortGenerator/

Multiple threads question #150

Open: alondhe opened this issue 1 month ago

alondhe commented 1 month ago

Hello, I'm curious about the comment here on running multi-threaded cohort generation: https://github.com/OHDSI/CohortGenerator/blob/main/R/CohortConstruction.R#L126-L131

Understandably, a dependency tree would be needed to handle subsets correctly. But I'm wondering whether there are other challenges in implementing it. I think it would be a huge efficiency gain if we could parallelize.
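One way to build the dependency tree mentioned above is a topological sort (Kahn's algorithm) that groups cohorts into waves: every cohort in a wave depends only on cohorts in earlier waves, so the cohorts within one wave could in principle be generated in parallel. This is a minimal sketch in Python for illustration only; `generation_waves` and its input (a dictionary mapping each cohort id to the ids it depends on, e.g. a subset cohort mapped to its parent) are hypothetical and not part of CohortGenerator's R API.

```python
from collections import defaultdict


def generation_waves(dependencies):
    """Group cohort ids into waves that can be generated in parallel.

    dependencies: hypothetical dict mapping each cohort id to the list of
    cohort ids it depends on (e.g. a subset cohort -> its parent cohort).
    Returns a list of waves (sorted lists); every cohort lands in a later
    wave than all of its dependencies. Raises ValueError on a cycle or on
    a dependency that is not itself a key of `dependencies`.
    """
    indegree = {cohort: 0 for cohort in dependencies}
    dependents = defaultdict(list)
    for cohort, deps in dependencies.items():
        for dep in deps:
            if dep not in indegree:
                raise ValueError(f"cohort {cohort} depends on unknown cohort {dep}")
            indegree[cohort] += 1
            dependents[dep].append(cohort)

    # Wave 1: cohorts with no dependencies (e.g. non-subset cohorts).
    ready = sorted(c for c, n in indegree.items() if n == 0)
    waves, placed = [], 0
    while ready:
        waves.append(ready)
        placed += len(ready)
        next_ready = []
        for cohort in ready:
            # Each finished cohort releases one dependency of its dependents.
            for child in dependents[cohort]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_ready.append(child)
        ready = sorted(next_ready)

    if placed != len(indegree):
        raise ValueError("dependency cycle detected")
    return waves
```

For example, with parent cohorts 1 and 2, subsets 101 (of 1) and 102 (of 1 and 2), and a subset 201 of 101, `generation_waves({1: [], 2: [], 101: [1], 102: [1, 2], 201: [101]})` returns `[[1, 2], [101, 102], [201]]`.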

anthonysena commented 1 month ago

Hey @alondhe, my intent with generateCohortSet was to support some form of parallelization for cohort generation, which is why you see the reference to ParallelLogger in that function. I'm now unsure whether building this into the package is the right approach, or whether it would be better to let the calling process handle the parallelization. On top of that, the subsetting functionality makes parallelization more complex, since every cohort a subset depends on must be generated before the subset itself.
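If the parallelization lives in the calling process rather than in the package, the caller could schedule cohorts itself and start each one as soon as everything it depends on has finished. This is a minimal sketch under that assumption, written in Python with the standard library's `concurrent.futures`; `run_with_dependencies` and the `generate` callable are hypothetical placeholders for whatever actually generates a single cohort, not CohortGenerator functions. `max_workers` caps how many generations run at once, which is one lever for limiting load on the database.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_with_dependencies(dependencies, generate, max_workers=2):
    """Call generate(cohort_id) for every cohort, starting each one only
    after all of its dependencies have finished successfully.

    dependencies: hypothetical dict mapping cohort id -> ids it depends on.
    generate: hypothetical callable that generates one cohort.
    max_workers: upper bound on concurrent generations (and so on
    concurrent database work, assuming one query stream per generation).
    Returns the set of generated cohort ids.
    """
    remaining = {c: set(deps) for c, deps in dependencies.items()}
    done = set()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        running = {}  # future -> cohort id
        while remaining or running:
            # Submit every cohort whose dependencies are all complete.
            for cohort in sorted(c for c, deps in remaining.items() if deps <= done):
                running[pool.submit(generate, cohort)] = cohort
                del remaining[cohort]
            if not running:
                # Nothing can start and nothing is in flight.
                raise ValueError("unsatisfiable or cyclic dependencies")
            finished, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for fut in finished:
                cohort = running.pop(fut)
                fut.result()  # re-raise any error from generate()
                done.add(cohort)
    return done
```

Threads are a reasonable choice here only because the real work would happen in the database, so each worker mostly waits on I/O. If a generation fails, `fut.result()` re-raises the error and no new cohorts are scheduled, although cohorts already submitted to the pool still run to completion before the error propagates.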

If there is interest in exploring ways to parallelize this, we can discuss it here.

alondhe commented 1 month ago

I am interested for sure. So at this point, all cohort generation via CohortGenerator is serial (1 thread)? Are there any projects in OHDSI you know of that have handled parallel calls to CohortGenerator?

anthonysena commented 1 month ago

> I am interested for sure. So at this point, all cohort generation via CohortGenerator is serial (1 thread)?

Correct.

> Are there any projects in OHDSI you know of that have handled parallel calls to CohortGenerator?

Not that I am aware of. This is also tricky because how much parallelism is appropriate depends on how heavily your CDM's RDBMS is already being used: you don't want to overburden the database and slow down all of the work.