OHDSI / ETL-CMS

Workproducts to ETL CMS datasets into OMOP Common Data Model
Apache License 2.0

Synpuf Load for SQL Server #29

Closed cukarthik closed 6 years ago

cukarthik commented 7 years ago

Here's a script to load the synpuf data into SQL Server if you'd like to add it to the project.

TRUNCATE TABLE [care_site];
BULK INSERT [care_site] FROM 'C:\synpuf_1\care_site_1.csv'
WITH ( FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '0x0a', ERRORFILE = 'C:\synpuf_1\care_site_1.bad', TABLOCK );

TRUNCATE TABLE [condition_occurrence];
BULK INSERT [condition_occurrence] FROM 'C:\synpuf_1\condition_occurrence_1.csv'
WITH ( FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '0x0a', ERRORFILE = 'C:\synpuf_1\condition_occurrence_1.bad', TABLOCK );

TRUNCATE TABLE [death];
BULK INSERT [death] FROM 'C:\synpuf_1\death_1.csv'
WITH ( FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '0x0a', ERRORFILE = 'C:\synpuf_1\death_1.bad', TABLOCK );

TRUNCATE TABLE [device_exposure];
BULK INSERT [device_exposure] FROM 'C:\synpuf_1\device_exposure_1.csv'
WITH ( FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '0x0a', ERRORFILE = 'C:\synpuf_1\device_exposure_1.bad', TABLOCK );

TRUNCATE TABLE [drug_exposure];
BULK INSERT [drug_exposure] FROM 'C:\synpuf_1\drug_exposure_1.csv'
WITH ( FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '0x0a', ERRORFILE = 'C:\synpuf_1\drug_exposure_1.bad', TABLOCK );

TRUNCATE TABLE [location];
BULK INSERT [location] FROM 'C:\synpuf_1\location_1.csv'
WITH ( FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '0x0a', ERRORFILE = 'C:\synpuf_1\location_1.bad', TABLOCK );

TRUNCATE TABLE [measurement];
BULK INSERT [measurement] FROM 'C:\synpuf_1\measurement_occurrence_1.csv'
WITH ( FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '0x0a', ERRORFILE = 'C:\synpuf_1\measurement_occurrence_1.bad', TABLOCK );

TRUNCATE TABLE [observation];
BULK INSERT [observation] FROM 'C:\synpuf_1\observation_1.csv'
WITH ( FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '0x0a', ERRORFILE = 'C:\synpuf_1\observation_1.bad', TABLOCK );

TRUNCATE TABLE [observation_period];
BULK INSERT [observation_period] FROM 'C:\synpuf_1\observation_period_1.csv'
WITH ( FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '0x0a', ERRORFILE = 'C:\synpuf_1\observation_period_1.bad', TABLOCK );

TRUNCATE TABLE [payer_plan_period];
BULK INSERT [payer_plan_period] FROM 'C:\synpuf_1\payer_plan_period_1.csv'
WITH ( FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '0x0a', ERRORFILE = 'C:\synpuf_1\payer_plan_period_1.bad', TABLOCK );

TRUNCATE TABLE [person];
BULK INSERT [person] FROM 'C:\synpuf_1\person_1.csv'
WITH ( FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '0x0a', ERRORFILE = 'C:\synpuf_1\person_1.bad', TABLOCK );

TRUNCATE TABLE [procedure_occurrence];
BULK INSERT [procedure_occurrence] FROM 'C:\synpuf_1\procedure_occurrence_1.csv'
WITH ( FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '0x0a', ERRORFILE = 'C:\synpuf_1\procedure_occurrence_1.bad', TABLOCK );

TRUNCATE TABLE [provider];
BULK INSERT [provider] FROM 'C:\synpuf_1\provider_1.csv'
WITH ( FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '0x0a', ERRORFILE = 'C:\synpuf_1\provider_1.bad', TABLOCK );

TRUNCATE TABLE [specimen];
BULK INSERT [specimen] FROM 'C:\synpuf_1\specimen_1.csv'
WITH ( FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '0x0a', ERRORFILE = 'C:\synpuf_1\specimen_1.bad', TABLOCK );

TRUNCATE TABLE [visit_occurrence];
BULK INSERT [visit_occurrence] FROM 'C:\synpuf_1\visit_occurrence_1.csv'
WITH ( FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '0x0a', ERRORFILE = 'C:\synpuf_1\visit_occurrence_1.bad', TABLOCK );
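If you need to adapt the script above (different data directory, schema, or table subset), the repetitive statements can be generated rather than hand-edited. A minimal Python sketch, with the table list and C:\synpuf_1 path taken from the script above; the helper name is hypothetical:

```python
# Hypothetical helper: generates the TRUNCATE/BULK INSERT pair for each
# CDM table loaded by the script above, so edits only happen in one place.

TABLES = [
    "care_site", "condition_occurrence", "death", "device_exposure",
    "drug_exposure", "location", "measurement", "observation",
    "observation_period", "payer_plan_period", "person",
    "procedure_occurrence", "provider", "specimen", "visit_occurrence",
]

def bulk_insert_sql(table, csv_stem=None, data_dir=r"C:\synpuf_1"):
    """Return the T-SQL to truncate `table` and bulk-load its CSV."""
    stem = csv_stem or table
    return (
        f"TRUNCATE TABLE [{table}];\n"
        f"BULK INSERT [{table}]\n"
        f"FROM '{data_dir}\\{stem}_1.csv'\n"
        "WITH (\n"
        "    FIRSTROW = 2,\n"
        "    FIELDTERMINATOR = ',',\n"
        "    ROWTERMINATOR = '0x0a',\n"
        f"    ERRORFILE = '{data_dir}\\{stem}_1.bad',\n"
        "    TABLOCK\n"
        ");"
    )

if __name__ == "__main__":
    for t in TABLES:
        # measurement loads from measurement_occurrence_1.csv in the script above
        stem = "measurement_occurrence" if t == "measurement" else None
        print(bulk_insert_sql(t, stem))
        print()
```

Redirecting the output to a .sql file and running it in SSMS or sqlcmd should reproduce the script above.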

ChristopheLambert commented 7 years ago

Thanks for providing this. Right now the code and documentation we provide are PostgreSQL-centric. We didn't have SQL Server or Oracle available to test on those systems.

In order to properly support SQL Server and perhaps Oracle, we would need revised documentation that breaks the docs and SQL code out into database-specific tracks. Please feel free to do this and initiate a pull request.

Note, we swiped the SQL code we include from other repositories, making small modifications so that everything works with OMOP V5.0.0. In the long run it would probably be better not to duplicate across repositories the code to create the database, load the data and vocabulary files, create indices, and create constraints. We compromised on this so that most of what you need to get up and running with the synpuf data is in one place, but it is probably going to bite us in the long run.

I was dismayed to find that the change from 5.0.0 to 5.0.1 of the OMOP CDM not only changed some fields but also deleted and added tables -- to me that is not a bug-fix release, as it breaks compatibility. In any event, going forward we need a way to maintain compatibility not only with OMOP CDM versions but also with database systems (and their versions). For instance, unlike version 9.2, which we used, PostgreSQL 9.5 provides support for bulk-loading gzipped .csv files, removing the inconvenience of having to uncompress these huge files.

cukarthik commented 7 years ago

Thanks @ChristopheLambert. That makes sense. I figured I'd share whatever I've done so far since we are using SQL Server. Btw, do you have the ACHILLES JSON files for this data set?

ChristopheLambert commented 7 years ago

Hi @cukarthik

We do have the ACHILLES JSON files -- are they something helpful to post, or would they be site configuration specific?

cukarthik commented 7 years ago

Hi @ChristopheLambert , It would be helpful for us. We are setting up the synpuf 1 percent database for some students. If you have the JSON files, could we download them somewhere? Thanks.

ChristopheLambert commented 7 years ago

We are re-creating them to ensure they match the uploaded data. Are you wanting the JSON files for the full dataset, just the first 1/20th subset, or both?

cukarthik commented 7 years ago

Just the first 1/20th. Eventually we might want the full set.

Thanks


ChristopheLambert commented 7 years ago

I uploaded a zip file containing the JSON files for AchillesWeb for the first 1/20th of the data here: ftp://ftp.ohdsi.org/synpuf/AchillesExport_1.zip

Please let me know if these work for you.

rsurdikar commented 6 years ago

Hi @cukarthik, It would be helpful if you could share the "create_CDMv5_drug_era_non_stockpile.sql" and "create_CDMv5_condition_era.sql" SQL scripts for SQL Server. Thanks.

cukarthik commented 6 years ago

Hi @rsurdikar, Unfortunately, we do not have those scripts since we did not include them in our database.

rsurdikar commented 6 years ago

Thanks for the response.