OHDSI / ETL-Synthea

A package supporting the conversion from Synthea CSV to OMOP CDM
https://ohdsi.github.io/ETL-Synthea/

Performance for event tables #147

Closed mpreusse closed 11 months ago

mpreusse commented 2 years ago

I haven't used ETL-Synthea in a while. I tried loading data for 10 patients into a new OMOP CDM database in Postgres. All indexes and constraints have been created. The vocabularies were loaded directly into the OMOP database, not through the ETL-Synthea R script.

I ran the following part of the documented script:

# Connection ...

ETLSyntheaBuilder::CreateSyntheaTables(connectionDetails = cd, syntheaSchema = syntheaSchema, syntheaVersion = syntheaVersion)

# create and load Synthea native
ETLSyntheaBuilder::LoadSyntheaTables(connectionDetails = cd, syntheaSchema = syntheaSchema, syntheaFileLoc = syntheaFileLoc)

# Synthea ETL
ETLSyntheaBuilder::LoadEventTables(connectionDetails = cd, cdmSchema = cdmSchema, syntheaSchema = syntheaSchema, cdmVersion = cdmVersion, syntheaVersion = syntheaVersion)

The first steps of the ETL are very fast, but loading the event tables takes a long time. I looked at insert_condition_occurrence.sql, and it joins against the tables source_to_source_vocab_map and source_to_standard_vocab_map, both of which are very large.

Do I need additional indexes? What load performance are others seeing?

burrowse commented 11 months ago

Closing. Related to #158. We are exploring adding additional index support, and a resolution for this issue is pending.