OHDSI / ETL-CMS

Workproducts to ETL CMS datasets into OMOP Common Data Model
Apache License 2.0

mapping for CONDITION_TYPE_CONCEPT_ID not complete #33

Open opme opened 7 years ago

opme commented 7 years ago

A CMS Inpatient_Claims data record has an ADMTNG_ICD9_DGNS_CD field as well as ICD9_DGNS_CD_1 – ICD9_DGNS_CD_10 and ICD9_PRCDR_CD_1 – ICD9_PRCDR_CD_6.

The vocabulary has a concept for the primary diagnosis, 38000199 | Inpatient header - primary | 1 | Condition Occurrence Type, but the ETL does not use it. All inpatient records are hardcoded to a CONDITION_TYPE_CONCEPT_ID of 38000200 (Inpatient header - 1st position), and outpatient records are similarly hardcoded to 38000230. The same happens in the PROCEDURE_OCCURRENCE OMOP table.
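Instead of hardcoding 38000200, the type concept could be chosen from the code's position. The sketch below is hypothetical and is not the ETL's code: it assumes the concept names for other positions follow the pattern seen in this issue ("Inpatient header - primary", "Inpatient header - 1st position", ...), and that the caller builds a name-to-concept_id dictionary (`name_to_id`, a made-up name) from the vocabulary's CONCEPT table.

```python
from typing import Dict, Optional


def ordinal(n: int) -> str:
    """English ordinal for a position number: 1 -> '1st', 12 -> '12th', 23 -> '23rd'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def condition_type_for_position(
    setting: str, position: Optional[int], name_to_id: Dict[str, int]
) -> int:
    """Pick a header-level condition type concept from the code's position.

    position=None stands for the primary (admitting) diagnosis.
    ASSUMPTION: concept names follow the pattern seen in this issue,
    e.g. 'Inpatient header - primary', 'Inpatient header - 1st position'.
    name_to_id is a hypothetical name -> concept_id dict built from the
    vocabulary's CONCEPT table. Names not found fall back to 0.
    """
    if position is None:
        name = f"{setting} header - primary"
    else:
        name = f"{setting} header - {ordinal(position)} position"
    return name_to_id.get(name, 0)
```

For example, with `name_to_id = {"Inpatient header - primary": 38000199, "Inpatient header - 1st position": 38000200}`, position `None` yields 38000199 and position 1 yields 38000200. Looking concepts up by name avoids guessing concept_ids for the 2nd through 10th positions.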

I believe this means information is lost in translation: there are 17 possible code positions in the Medicare data, but they are mapped into only 4 type concepts.
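The 17 and the 4 can be checked directly. This short sketch lists the inpatient source columns named in this issue (1 admitting diagnosis + 10 diagnoses + 6 procedures) and the type concepts that appear in the SynPuf counts posted in this issue.

```python
# Inpatient_Claims columns that can carry an ICD-9 code, as listed in this issue.
ADMITTING = ["ADMTNG_ICD9_DGNS_CD"]
DIAGNOSES = [f"ICD9_DGNS_CD_{i}" for i in range(1, 11)]   # positions 1-10
PROCEDURES = [f"ICD9_PRCDR_CD_{i}" for i in range(1, 7)]  # positions 1-6

source_columns = ADMITTING + DIAGNOSES + PROCEDURES
print(len(source_columns))  # 17

# Type concepts the current ETL emits, taken from the SynPuf counts in this issue.
emitted_type_concepts = {38000200, 38000230, 38000251, 38000269}
print(len(emitted_type_concepts))  # 4
```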

In my case, I am trying to replicate a study called Surgeon Scorecard on OMOP-formatted data. That study used the Medicare CMS ADMTNG_ICD9_DGNS_CD field to find particular surgeries and calculate a surgeon complication rate.

Here are the counts of the type concept_ids in the converted SynPuf data:

CONDITION_TYPE_CONCEPT_ID,COUNT,DESCRIPTION
38000230,280864910,Outpatient header - 1st position
38000200,8317475,Inpatient header - 1st position

PROCEDURE_TYPE_CONCEPT_ID,COUNT,DESCRIPTION
38000269,275176949,Outpatient header - 1st position
38000251,3592580,Inpatient header - 1st position
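To check whether a fix changes these counts, the same tally can be recomputed from an exported table. This is a minimal sketch that assumes the OMOP table has been exported to CSV with a header row; the function name and the default column name are illustrative, and the column should match the export's actual header (which may be upper case).

```python
import csv
import io
from collections import Counter
from typing import Iterable


def tally_type_concepts(
    lines: Iterable[str], column: str = "condition_type_concept_id"
) -> Counter:
    """Count rows per type concept_id in a CSV export of an OMOP table.

    ASSUMPTION: the export has a header row that contains `column`.
    """
    reader = csv.DictReader(lines)
    return Counter(int(row[column]) for row in reader)


# Tiny in-memory example standing in for a real export file.
sample = io.StringIO(
    "condition_occurrence_id,condition_type_concept_id\n"
    "1,38000200\n"
    "2,38000230\n"
    "3,38000230\n"
)
print(tally_type_concepts(sample))
```

With a real file, pass an open file object, e.g. `open(path, newline="")`. After a correct fix, more than two distinct type concepts should appear per table.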

ChristopheLambert commented 7 years ago

Indeed, all of the positions are hardcoded to the 1st one. I seem to remember there was a justification at the time for not mapping them correctly, as you describe, but it escapes me now.

In addition, the HCPCS codes in positions HCPCS_CD_1 through HCPCS_CD_45 of the inpatient and outpatient source data also need mappings to 38000183 (inpatient detail 1st position) and 38000267 (outpatient detail 1st position). One weird issue is that the source data has 45 columns of inpatient detail, but the vocabulary only has inpatient detail codes for the 1st through 20th positions. In contrast, there are concept codes for outpatient detail that span the 1st through 45th positions.

Looking at the DE_1 source data just now, there seem to be no HCPCS codes in any of the 45 inpatient columns. If that holds for the rest of the data, this may be a non-issue, and the fix is straightforward.
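The gap between 45 inpatient HCPCS columns and 20 inpatient detail concepts can be handled by returning "no concept" past the vocabulary's range. The sketch below is hypothetical: it assumes detail concept names follow the pattern "Inpatient detail - 1st position", uses the 20 and 45 limits stated above, and leaves it to the caller to emit 0 (or skip the code) when the result is `None`.

```python
from typing import Optional

# Highest detail position covered by the vocabulary, as described in this thread.
MAX_DETAIL_POSITION = {"Inpatient": 20, "Outpatient": 45}


def ordinal(n: int) -> str:
    """English ordinal: 1 -> '1st', 12 -> '12th', 45 -> '45th'."""
    if 10 <= n % 100 <= 20:
        return f"{n}th"
    return f"{n}" + {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")


def detail_type_name(setting: str, column: str) -> Optional[str]:
    """Map a column such as 'HCPCS_CD_7' to an assumed detail type concept name.

    Returns None when the vocabulary has no concept for that position,
    e.g. inpatient HCPCS_CD_21 through HCPCS_CD_45.
    """
    position = int(column.rsplit("_", 1)[1])  # 'HCPCS_CD_7' -> 7
    if position > MAX_DETAIL_POSITION[setting]:
        return None
    return f"{setting} detail - {ordinal(position)} position"
```

The returned name would then be resolved to a concept_id through the vocabulary, as with the header-level positions.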

Are you running the ETL from scratch and able to test a fix to see if it addresses all your concerns before we go through the labor of uploading all the ETL'd data again?

ChristopheLambert commented 7 years ago

I also noticed that the ETL skips inpatient ADMTG_DGNS_CD fields but does not skip outpatient ADMTG_DGNS_CD fields; the latter are assigned a 1st-position type concept. I've checked, and ADMTG_DGNS_CD is always an ICD9 code, but depending on how that code maps in the vocabulary it could be designated as a condition, a procedure, or possibly even an observation. We therefore need an appropriate type concept for condition_type_concept_id, procedure_type_concept_id, and perhaps observation_type_concept_id.

For inpatient we have:

38000183    Inpatient detail - primary  Condition
38000199    Inpatient header - primary  Condition
38000248    Inpatient detail - primary position Procedure
38000250    Inpatient header - primary position Procedure

But for outpatient we only have these choices -- procedures but not conditions:

38000266    Outpatient detail - primary position    Procedure   
38000268    Outpatient header - primary position    Procedure

What do you recommend? I could just put 0, but don't know if that helps with your use case.
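One way to encode the choices listed above is a lookup keyed by care setting and the domain the admitting code maps to, falling back to 0 where no concept exists (outpatient conditions, and observations in either setting). This is a sketch using only the concept_ids from this thread; choosing the header-level rather than detail-level concepts for the admitting diagnosis is an assumption, and the names are illustrative.

```python
# Type concepts for the admitting diagnosis code, keyed by (setting, domain).
# Only concept_ids listed in this thread are used; header-level concepts are
# chosen on the ASSUMPTION that the admitting code is a claim-header field.
ADMITTING_TYPE_CONCEPT = {
    ("Inpatient", "Condition"): 38000199,   # Inpatient header - primary
    ("Inpatient", "Procedure"): 38000250,   # Inpatient header - primary position
    ("Outpatient", "Procedure"): 38000268,  # Outpatient header - primary position
}


def admitting_type_concept(setting: str, domain: str) -> int:
    """Return the type concept for an admitting diagnosis code, or 0 if none exists."""
    return ADMITTING_TYPE_CONCEPT.get((setting, domain), 0)
```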

opme commented 7 years ago

> Are you running the ETL from scratch and able to test a fix to see if it addresses all your concerns before we go through the labor of uploading all the ETL'd data again?

I wasn't, but I can set that up today. I have already downloaded the SynPuf data to my server, and I would be willing to test any fix you provide.

> What do you recommend? I could just put 0, but don't know if that helps with your use case.

I don't understand the difference between "Inpatient detail" and "Inpatient header"; either would be fine. Maybe you could put 0 for the ones that don't currently have a vocabulary entry until those can be created.

Here is a possible mapping: the ADMTNG_ICD9_DGNS_CD field on inpatient claims -> 38000199 (Inpatient header - primary) for conditions and 38000250 (Inpatient header - primary position) for procedures. Something similar could be done for outpatient.
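Applied to one inpatient claim, the proposed mapping might look like the sketch below. It is hypothetical and much simpler than `process_inpatient_records`: it assumes the admitting code maps to the Condition domain (so it gets 38000199), treats empty strings as missing values, and takes a made-up `position_type` dictionary from diagnosis position to type concept, with unknown positions falling back to 0.

```python
from typing import Dict, List, Tuple


def inpatient_condition_codes(
    claim: Dict[str, str], position_type: Dict[int, int]
) -> List[Tuple[str, int]]:
    """Return (icd9_code, condition_type_concept_id) pairs for one inpatient claim.

    claim: one Inpatient_Claims row as column -> value ('' when empty).
    position_type: hypothetical map from diagnosis position (1-10) to a
    type concept; positions missing from it fall back to 0.
    ASSUMPTION: the admitting code maps to the Condition domain.
    """
    pairs: List[Tuple[str, int]] = []
    admitting = claim.get("ADMTNG_ICD9_DGNS_CD", "")
    if admitting:
        pairs.append((admitting, 38000199))  # Inpatient header - primary (proposed)
    for i in range(1, 11):
        code = claim.get(f"ICD9_DGNS_CD_{i}", "")
        if code:
            pairs.append((code, position_type.get(i, 0)))
    return pairs
```

With `position_type = {1: 38000200}`, the admitting code is kept with the primary type and the 1st-position diagnosis keeps today's 38000200, so existing counts would not shift for that position.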

I am checking the process_inpatient_records function in CMS_SynPuf_ETL_CDM_v5.py to try to figure out what is really possible.

> One weird issue is that the source data has 45 columns of inpatient detail, but there are only codes in the vocabulary for inpatient detail that span the 1st-20th position

I checked and could not find any usage of the HCPCS codes in the SynPuf files either. The ICD9_DGNS_CD_1 – ICD9_DGNS_CD_10 and ICD9_PRCDR_CD_1 – ICD9_PRCDR_CD_6 fields are used.

For my particular use case, though, I only need the primary code, so if the complete mapping is a big job, mapping the inpatient primary is good enough, and it is important information to preserve in the ETL.

opme commented 7 years ago

A pull request has been submitted for discussion.