OHDSI / ETL-CMS

Workproducts to ETL CMS datasets into OMOP Common Data Model
Apache License 2.0
94 stars 52 forks source link

drug_era, observation_period, cost column mismatch #56

Closed leenadR closed 5 years ago

leenadR commented 5 years ago

Many thanks for providing the 100k download vie Google drive. However, I experienced some problems when importing the data.

drug_era

Importing drug_era fails: ERROR: invalid input syntax for integer: "2008-02-27"

Looking at the data, the columns don't match with the table definition in CDM (I checked also versions before and after 5.2): The third column should be the concept_id, not a date.

gunzip -c drug_era.gz | head 4107,528323,2008-02-27,2008-02-27,1,\000,215049 2060,722031,2008-03-11,2008-03-21,1,\000,105476 4938,710062,2008-02-14,2008-03-15,1,\000,258700

CREATE TABLE drug_era ( drug_era_id INTEGER NOT NULL , person_id INTEGER NOT NULL , drug_concept_id INTEGER NOT NULL , drug_era_start_date DATE NOT NULL , drug_era_end_date DATE NOT NULL , drug_exposure_count INTEGER NULL , gap_days INTEGER NULL ) ;

observation_period

Similarly, the data for observation_period still has the observation_period_start_datetime and observation_period_stop_datetime columns, which were deprecated before 5.2.

cost

Last, the cost data does not fit the column definition by the CDM. The import fails, because "Procedure" is not an INTEGER.

gunzip -c cost.gz | head 189455742,Procedure,0,44818668,\000,\000,10,10,0,\000,0,0,0,\000,\000,\000,\000,\000,\000,0,\000,543966571 189455742,Procedure,0,44818668,\000,\000,20,20,0,\000,0,0,0,\000,\000,\000,\000,\000,\000,0,\000,543966572 189455743,Procedure,0,44818668,\000,\000,10,10,0,\000,0,0,0,\000,\000,\000,\000,\000,\000,0,\000,543966573

CREATE TABLE cost ( cost_id INTEGER NOT NULL , cost_event_id INTEGER NOT NULL , cost_domain_id VARCHAR(20) NOT NULL , cost_type_concept_id INTEGER NOT NULL , currency_concept_id INTEGER NULL , total_charge NUMERIC NULL , total_cost NUMERIC NULL , total_paid NUMERIC NULL , paid_by_payer NUMERIC NULL , paid_by_patient NUMERIC NULL , paid_patient_copay NUMERIC NULL , paid_patient_coinsurance NUMERIC NULL , paid_patient_deductible NUMERIC NULL , paid_by_primary NUMERIC NULL , paid_ingredient_cost NUMERIC NULL , paid_dispensing_fee NUMERIC NULL , payer_plan_period_id INTEGER NULL , amount_allowed NUMERIC NULL , revenue_code_concept_id INTEGER NULL , reveue_code_source_value VARCHAR(50) NULL , drg_concept_id INTEGER NULL, drg_source_value VARCHAR(3) NULL ) ;

gowthamrao commented 5 years ago

Is cost in the Dev branch format? This CDM definition is approved, but waiting 2018 ohdsi symposium for release as v6.0

https://github.com/OHDSI/CommonDataModel/tree/Dev/COST

leenadR commented 5 years ago

No, I only use the release versions (tagged 5.2, 5.0, ...)

See line 598: https://github.com/OHDSI/CommonDataModel/blob/v5.2.0/PostgreSQL/OMOP%20CDM%20ddl%20-%20PostgreSQL.sql

Even in the dev-branch, the columns would not fit ... CREATE TABLE cost ( cost_id INTEGER NOT NULL , person_id INTEGER NOT NULL, cost_event_id INTEGER NOT NULL , cost_domain_id VARCHAR(20) NOT NULL ,

ericaVoss commented 5 years ago

@leenadR where did you download this? Just want to make sure I know which copy you are working with. I'm assuming it is the new one.

I think we have seen this problem before, the COST_ID is the last column in this export of SYNPUF. :(

image

Tagging @leeevans

leeevans commented 5 years ago

@leenadR this is an issue with the file format of the Synpuf 100k export file. It's been corrected in a replacement Synpuf 5% sample file that is now available for download.

You can use the 'CMS SynPUF 5% Sample' link at the bottom of the below web page to download the new file:

https://www.ohdsi.org/data-standardization/

leenadR commented 5 years ago

Many thanks, the 5% sample works just fine. => issue can be closed