OHDSI / CohortDiagnostics

An R package for performing various cohort diagnostics.
https://ohdsi.github.io/CohortDiagnostics

Very slow when running CohortDiagnostics 3.1.0 especially when Creating internal concept counts table #1018

Closed cebarboza closed 1 year ago

cebarboza commented 1 year ago

CohortDiagnostics v3.1.0 still runs very slowly at the "Creating internal concept counts table" step on the IPCI and CPRD databases. We are aware that there may be room to optimize the following SQL:

https://github.com/OHDSI/CohortDiagnostics/blob/master/inst/sql/sql_server/CreateConceptCountTable.sql

as explained here:

https://github.com/OHDSI/CohortDiagnostics/issues/517

We are currently trying to tune PostgreSQL as suggested there, but is there a way to bypass or optimize this step? This matters especially because we are running several studies related to the Darwin project.

Thanks!

azimov commented 1 year ago

@cbarbozaerasmus @gowthamrao I can see why this query would be slow on PostgreSQL instances. The suggested approach of computing counts only for the concepts actually used in concept sets would probably be preferable, and I'd accept a patch that does this. However, I don't have the bandwidth to work on this right now.
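A minimal base-R sketch of that idea, assuming it means restricting the counts to concepts referenced by the study's concept sets rather than counting every concept in the CDM. The function `countUsedConcepts()`, the column names `conceptId`/`personId`, and the toy data are all hypothetical; this is not the package's `CreateConceptCountTable.sql`. A real patch would presumably express the same restriction in SQL (for example as a join against the concept set concepts) so that the database aggregates only the relevant concepts.

```r
# Hypothetical sketch (base R only): record and distinct-person counts per
# concept, restricted to concepts that appear in the study's concept sets.
# Column names are illustrative, not the CDM or CohortDiagnostics schema.
countUsedConcepts <- function(events, conceptSetConceptIds) {
  # Keep only events whose concept is referenced by some concept set.
  used <- events[events$conceptId %in% conceptSetConceptIds, , drop = FALSE]
  if (nrow(used) == 0) {
    # aggregate() errors on zero rows, so return an empty result instead.
    return(data.frame(conceptId = integer(0), conceptCount = integer(0),
                      conceptSubjects = integer(0)))
  }
  # Number of records per concept.
  recordCounts <- aggregate(personId ~ conceptId, data = used, FUN = length)
  names(recordCounts)[2] <- "conceptCount"
  # Number of distinct persons per concept.
  subjectCounts <- aggregate(personId ~ conceptId, data = used,
                             FUN = function(x) length(unique(x)))
  names(subjectCounts)[2] <- "conceptSubjects"
  merge(recordCounts, subjectCounts, by = "conceptId")
}

# Toy data: concept 4329847 is not in any concept set, so it is skipped.
events <- data.frame(
  conceptId = c(201826, 201826, 201826, 4329847, 320128),
  personId  = c(1, 1, 2, 3, 4)
)
countUsedConcepts(events, conceptSetConceptIds = c(201826, 320128))
```

The design point is simply that the filter runs before the aggregation, so the cost of the counting step scales with the concepts the study uses rather than with the whole vocabulary.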

If you want to bypass this step, you can turn off the concept set diagnostics. Setting the following arguments in the call that runs the diagnostics (`executeDiagnostics()` in CohortDiagnostics 3.x) would achieve this:

```r
# Turn off the concept set diagnostics
runIncludedSourceConcepts = FALSE,
runOrphanConcepts = FALSE,
runBreakdownIndexEvents = FALSE
```

If you're running the package in incremental mode, it would be possible to turn these on later, after you update any database configuration.

I would expect, though, that this might mean you miss something when phenotyping.
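The workaround can be sketched as two passes: a first incremental run with the concept set diagnostics off, and a later run with them switched back on once the database has been tuned. This assumes `executeDiagnostics()` accepts the three flags quoted in the comment plus an `incremental` argument, as in CohortDiagnostics 3.x. The helper `conceptSetDiagnosticFlags()` and the list `commonArgs` are hypothetical, and the package calls are left as comments because they need a live CDM connection.

```r
# Hypothetical helper: builds the three concept set diagnostic flags so the
# same switch can be turned off for a first run and on for a later one.
conceptSetDiagnosticFlags <- function(enabled) {
  stopifnot(is.logical(enabled), length(enabled) == 1)
  list(
    runIncludedSourceConcepts = enabled,
    runOrphanConcepts = enabled,
    runBreakdownIndexEvents = enabled
  )
}

# Sketch of the two passes (not executed here; needs a CDM connection).
# ASSUMPTION: `commonArgs` is a hypothetical list holding the usual
# arguments (cohortDefinitionSet, connectionDetails, cdmDatabaseSchema,
# exportFolder, databaseId, ...), kept identical across both passes so
# the incremental bookkeeping can skip tasks that already completed.
#
# firstPass <- c(commonArgs, list(incremental = TRUE),
#                conceptSetDiagnosticFlags(FALSE))
# do.call(CohortDiagnostics::executeDiagnostics, firstPass)
#
# ... after tuning the database ...
# secondPass <- c(commonArgs, list(incremental = TRUE),
#                 conceptSetDiagnosticFlags(TRUE))
# do.call(CohortDiagnostics::executeDiagnostics, secondPass)

str(conceptSetDiagnosticFlags(FALSE))
```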

azimov commented 1 year ago

Closed due to inactivity.