Closed (scivm closed 3 years ago)
@scivm Hi, thanks for bringing this to our attention.
We've let ETL-Synthea languish for a bit in 2020, but we're aiming to keep the number of issues at zero in the future. :)
This PR includes (among other things) a new function that pulls the latest index and constraint DDL scripts from OHDSI CDM.
These scripts (located in the "output" directory) can be run in a separate SQL session or by simply passing them to DatabaseConnector::executeSql().
NOTE: Using the latest main, I always get 0 records in my drug_era table, and the step finishes in less than a second. I had to use pull request https://github.com/OHDSI/ETL-Synthea/pull/48 to get the drug_era table populated. I took that branch, removed the OMOP 6.0-specific code, and ended up with a working migration once I also applied pull request https://github.com/OHDSI/ETL-Synthea/pull/53 to get the OMOP 5.3.1 death table. I then added some code to migrate the location data in the Synthea patient table to the OMOP location table. The code is at https://github.com/scivm/ETL-Synthea.
While testing with this patched setup, I found that with 100K patients generated by Synthea, the conversion in insert_drug_era.sql takes 1.85 days. 10K patients took less than a minute and 1K patients took about 15 seconds. I don't think throwing more computing power at it will help, since Postgres uses only a single core to run the migration and I have already tried giving it 8 GB of RAM.
Setup: Postgres 10 on a Windows 10 laptop with an i7 (2 CPUs, 8 cores in total), 32 GB of RAM, and Ubuntu for Windows.
Used 8 GB of RAM for shared_buffers
Put indexes on vocabulary tables after loading them:
Analyzed all tables before starting
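
As a rough sanity check on the timings reported above, here is a small Python sketch that estimates how the run time of insert_drug_era.sql scales with patient count. It is only arithmetic on the three numbers I measured. The 60-second figure for 10K patients is an assumption standing in for "less than a minute", so the first exponent is an upper bound and the second a lower bound. The function name scaling_exponent is just something I made up for this illustration.

```python
import math

# Timings reported above, in seconds. "Less than a minute" for 10K patients
# is ASSUMED to be 60 s (an upper bound), so the results are bounds only.
timings = {
    1_000: 15.0,
    10_000: 60.0,
    100_000: 1.85 * 24 * 3600,  # 1.85 days
}


def scaling_exponent(n1, t1, n2, t2):
    """Return k such that t2 / t1 == (n2 / n1) ** k."""
    return math.log(t2 / t1) / math.log(n2 / n1)


sizes = sorted(timings)
pairs = list(zip(sizes, sizes[1:]))
exponents = [scaling_exponent(a, timings[a], b, timings[b]) for a, b in pairs]

for (a, b), k in zip(pairs, exponents):
    print(f"{a:>7} -> {b:>7} patients: time grows roughly like n^{k:.2f}")
```

With these inputs the exponent is about 0.6 between 1K and 10K patients, but about 3.4 between 10K and 100K. So the run time is not growing in proportion to the data, and something seems to change between 10K and 100K patients. This does not say what changes, only that the jump is far steeper than linear.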