OHDSI / ETL-CMS

Workproducts to ETL CMS datasets into OMOP Common Data Model
Apache License 2.0

defect 33, add mappings for concept_id #37

Closed opme closed 2 years ago

opme commented 7 years ago

I made an attempt at fixing the concept_id issue and created this pull request for discussion.

I took into account that the primary diagnosis code and the list of 10 diagnosis codes could each be a condition, procedure or occurrence. I have only done mappings for condition and procedure to codes in the vocabulary defined in constants.py. I set HCPCS codes to 0 since I am not sure how to handle these.

It is possible for procedures in the 10 diagnosis codes to get the same concept_id that comes from ICD9_PRCDR_CD_1 through ICD9_PRCDR_CD_6.

Some mappings are not available and I indicated that in the constants.py where applicable.

opme commented 7 years ago

I did notice that both records of the outpatient sample data have 2 HCPCS entries. These have been mapped to 0 in my code. Not sure what should be done with these.

ChristopheLambert commented 7 years ago

Thanks for taking a stab at this and opening your changes up for discussion.

I had a look at your code, and I'm concerned that you are using the fact that an ICD9 code was mapped to either a procedure or a condition to say that it came from either a procedure or a condition column. The purpose of this is provenance -- to say where the data came from. So if a code comes from one of the ICD9_DGNS_CD columns, it may not necessarily map to a condition, but you should still use the {OUTPAT,INPAT}_CONDITIONPOSITION{X} designation. Likewise, if it came from one of the ICD9_PRCDR_CD columns, it may not necessarily map to a procedure, but the provenance should show {OUTPAT,INPAT}_PROCEDUREPOSITION{X}.
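To illustrate the point, here is a minimal sketch of deriving the provenance label from the source column alone, never from the domain the ICD9 code happens to map to. The function name `provenance_label` and the exact label format are hypothetical; the real designations would be whatever constants.py defines.

```python
import re

# Column families named in this thread. The label format built below is an
# assumption for illustration, not the actual names used in constants.py.
_COLUMN_KINDS = [
    (re.compile(r"^ICD9_DGNS_CD_(\d+)$"), "CONDITION"),
    (re.compile(r"^ICD9_PRCDR_CD_(\d+)$"), "PROCEDURE"),
]


def provenance_label(claim_type, column_name):
    """Return a provenance label based only on where the code came from.

    claim_type is "INPAT" or "OUTPAT"; column_name is the source column.
    The domain the code maps to in the vocabulary is deliberately ignored.
    """
    if claim_type not in ("INPAT", "OUTPAT"):
        raise ValueError(f"unknown claim type: {claim_type!r}")
    for pattern, kind in _COLUMN_KINDS:
        match = pattern.match(column_name)
        if match:
            position = int(match.group(1))
            return f"{claim_type}_{kind}_POSITION_{position}"
    raise ValueError(f"not a diagnosis or procedure column: {column_name!r}")
```

With this shape, a code read from ICD9_DGNS_CD_3 on an inpatient claim is labelled as a condition-column entry in position 3 even if the vocabulary maps it to a procedure.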

Also, all or nearly all of the 45 outpatient HCPCS columns are populated in the source data, so I think we should use the outpatient detail entries (1st through 45th) instead of zero. The inpatient HCPCS columns are there in the source data but are all empty (I checked), so it will be a non-issue -- though the code should probably verify that the inpatient HCPCS data is empty, for robustness.
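The robustness check could look roughly like the sketch below. It assumes each inpatient record is available as a dict keyed by column name and that the HCPCS columns share the prefix `HCPCS_CD_`; both the helper names and that prefix are assumptions, not taken from the existing code.

```python
def nonempty_hcpcs_columns(record, prefix="HCPCS_CD_"):
    """Return the sorted names of HCPCS columns holding a non-blank value.

    record is a dict of column name -> raw value. The column prefix is an
    assumed naming convention and can be overridden.
    """
    return sorted(
        name
        for name, value in record.items()
        if name.startswith(prefix)
        and value is not None
        and str(value).strip()
    )


def check_inpatient_hcpcs_empty(record):
    """Raise if an inpatient record unexpectedly carries HCPCS data."""
    populated = nonempty_hcpcs_columns(record)
    if populated:
        raise ValueError(
            "inpatient record has populated HCPCS columns: "
            + ", ".join(populated)
        )
```

Failing loudly rather than silently mapping to 0 means a future data release with populated inpatient HCPCS columns would be noticed instead of dropped.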

Thanks, Christophe

ChristopheLambert commented 7 years ago

Ironically, I had made similar fixes myself, but got stuck on the problem of what code to use for the admitting diagnosis. I thought I read somewhere there was a push to name everything either primary or secondary -- who cares if something came from the 2nd or 9th column -- we just care that it is not primary. From the point of view of the OMOP CDM being used for analysis, it seems this level of detail will get in the way. What are your thoughts on this?

opme commented 7 years ago

Yes, it makes sense to me to do it this way. It makes it simpler also.

ChristopheLambert commented 7 years ago

Your original use case seemed to be calling for the detailed provenance -- was only primary and secondary necessary for you? Do you want to revise the code? I don't think we want to commit it as is, per my earlier comments. Regardless of how we go, there is at minimum the need to fix the bug that the inpatient admitting diagnoses are skipped.

opme commented 7 years ago

No, I don't want this committed as is. I think we should just fix the bug related to inpatient admitting diagnoses and make the primary admitting diagnosis have the correct type_concept_id, so that it is possible to tell it apart from the other ones.
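A rough sketch of that simpler scheme is below. It assumes the admitting diagnosis lives in a column named `ADMTNG_ICD9_DGNS_CD` and that the first ICD9_DGNS_CD column is the one treated as primary; neither is confirmed in this thread, and `diagnosis_role` is a hypothetical helper. The role would then be translated to a type_concept_id via whatever constants.py provides.

```python
import re

# Assumed name of the admitting diagnosis column (not confirmed in this thread).
ADMITTING_COLUMN = "ADMTNG_ICD9_DGNS_CD"
_DGNS_COLUMN = re.compile(r"^ICD9_DGNS_CD_(\d+)$")


def diagnosis_role(column_name):
    """Classify a diagnosis column as 'admitting', 'primary' or 'secondary'.

    Assumption: position 1 is primary and every later position is secondary,
    regardless of whether it is the 2nd or the 9th column.
    """
    if column_name == ADMITTING_COLUMN:
        return "admitting"
    match = _DGNS_COLUMN.match(column_name)
    if match is None:
        raise ValueError(f"not a diagnosis column: {column_name!r}")
    return "primary" if int(match.group(1)) == 1 else "secondary"
```

Handling the admitting column explicitly is what keeps inpatient admitting diagnoses from being skipped, and giving it its own role keeps it distinguishable from the rest.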

ChristopheLambert commented 2 years ago

Closing, as it was agreed to not commit as is, per discussions above.