OHDSI / OMOPV4_PCORNetV1_ETL

ETL script to transform data from OMOP v4 CDM to PCORNet V1 CDM

discrepancy in #records omop.condition_occurrence -> pcornet.diagnosis #5

Closed: writetoritu closed this issue 9 years ago

writetoritu commented 9 years ago

The query seems to be working correctly: when transforming the omop.condition_occurrence table to the pcornet.diagnosis table, we capture all the visits, i.e. the #visits is the same in both tables.

However, there is a discrepancy in #records: #records (omop.condition_occurrence) > #records (pcornet.diagnosis), for the following reasons:

  1. In OMOP, some conditions are not associated with any visit, i.e. some records in the omop.condition_occurrence table have a NULL value in the visit_occurrence_id field. In PCORnet, ENCOUNTERID is a mandatory field, so every diagnosis must be associated with a visit. As a result, the OMOP->PCORnet transformation loses all condition_occurrence records that are not associated with a visit.
  2. There is a difference in granularity between the two tables: an OMOP condition occurrence has a much finer level of granularity than a PCORnet diagnosis.
    • An OMOP condition = <condition_concept_id, condition_start_date, visit_occurrence_id, associated_provider_id, condition_source_value, condition_type_concept_id>
      • Please note that I included the three fields (associated_provider_id, condition_source_value, condition_type_concept_id) in this definition. At CHOP, we observed that a given <condition_concept_id, condition_start_date, visit_occurrence_id> may be associated with:
        • multiple associated_provider_ids
        • multiple condition_source_values
        • multiple condition_type_concept_ids
    • A PCORnet diagnosis = <ENCOUNTERID, DX>
      • In the pcornet.diagnosis table, there are no fields corresponding to associated_provider_id, condition_source_value, condition_type_concept_id, or even condition_start_date.
    • As a result, all OMOP records that correspond to a given condition linked to a certain visit are merged into a single diagnosis record in PCORnet. Hence, we observed a decrease in the #records in this transformation.
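To make the two causes concrete, here is a minimal sketch that splits the record-count gap into two parts: records dropped because visit_occurrence_id is NULL, and records merged into an existing diagnosis. It rests on several assumptions. The transformation is taken to emit one diagnosis per distinct (visit_occurrence_id, condition_concept_id) pair, and the concept id stands in for DX. The rows are hypothetical toy data loaded into an in-memory SQLite table; this is not the project's actual ETL query.

```python
import sqlite3

# Hypothetical toy rows; columns mirror the OMOP v4 condition_occurrence
# fields discussed in this issue (not the full table definition).
# (id, concept, start_date, visit, provider, source_value, type_concept)
ROWS = [
    (1, 101, "2014-01-05", 10, 7, "src_a", 1),
    (2, 101, "2014-01-05", 10, 8, "src_a", 1),    # same condition, another provider
    (3, 101, "2014-01-05", 10, 7, "src_a2", 2),   # another source value and type
    (4, 202, "2014-01-05", 10, 7, "src_b", 1),
    (5, 202, "2014-02-01", None, 7, "src_b", 1),  # no visit (e.g. problem list)
    (6, 303, "2014-03-10", 11, 9, "src_c", 1),
]


def build_db(rows):
    """Load the toy rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE condition_occurrence (
               condition_occurrence_id INTEGER PRIMARY KEY,
               condition_concept_id INTEGER,
               condition_start_date TEXT,
               visit_occurrence_id INTEGER,
               associated_provider_id INTEGER,
               condition_source_value TEXT,
               condition_type_concept_id INTEGER)"""
    )
    conn.executemany(
        "INSERT INTO condition_occurrence VALUES (?, ?, ?, ?, ?, ?, ?)", rows
    )
    return conn


def scalar(conn, sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]


def record_counts(conn):
    """Split the OMOP -> PCORnet record-count gap into its two causes."""
    total = scalar(conn, "SELECT COUNT(*) FROM condition_occurrence")
    no_visit = scalar(conn, "SELECT COUNT(*) FROM condition_occurrence "
                            "WHERE visit_occurrence_id IS NULL")
    # Assumed shape of the transformation: one diagnosis per distinct
    # (visit, condition) pair, with the concept id standing in for DX.
    diagnoses = scalar(conn, """SELECT COUNT(*) FROM (
        SELECT DISTINCT visit_occurrence_id, condition_concept_id
        FROM condition_occurrence WHERE visit_occurrence_id IS NOT NULL)""")
    # COUNT(DISTINCT ...) ignores NULLs, so this counts real visits only;
    # each of them survives as an ENCOUNTERID.
    visits = scalar(conn, "SELECT COUNT(DISTINCT visit_occurrence_id) "
                          "FROM condition_occurrence")
    return {
        "omop_records": total,
        "dropped_no_visit": no_visit,
        "merged_duplicates": total - no_visit - diagnoses,
        "pcornet_diagnoses": diagnoses,
        "visits": visits,
    }


counts = record_counts(build_db(ROWS))
print(counts)
```

On these toy rows, the six OMOP records split into 1 dropped record, 2 merged records and 3 diagnoses. Both visits still appear, which matches the observation that the #visits is unchanged.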
gracebrownecodes commented 9 years ago

On the reduction in condition records.

  1. We may be able to associate some more conditions with visits. Failing that, we can consider associating condition records with the closest visit in time during the transformation to PCORnet (I wouldn't want to insert assumed data in the OMOP model if it's not required).
  2. I don't think there is anything that can be done about the provider and condition type dimensions of condition that OMOP records but PCORnet does not.
  3. We may be able to improve the mapping of condition source values to condition concepts, and thereby reduce the number of duplicates when the source value is removed for PCORnet. For example, several of the records Ritu showed me had a condition concept of 0, meaning unknown. So this issue may resolve itself as we move to better concept mappings in Vocabulary V4.5.
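Option 1 above amounts to a nearest-in-time lookup. The hypothetical `closest_visit` function below is a minimal sketch of one. It assumes the candidate visits have already been restricted to the same person and compares dates only. It is shown purely to illustrate what was proposed; it does not describe what the ETL does.

```python
from datetime import date
from typing import Optional, Sequence, Tuple

# Hypothetical representation: (visit_occurrence_id, visit_start_date),
# already filtered to the condition's person.
Visit = Tuple[int, date]


def closest_visit(condition_date: date, visits: Sequence[Visit]) -> Optional[int]:
    """Return the id of the visit nearest in time to condition_date, or None."""
    if not visits:
        return None
    # Sort key: absolute gap in days first; on a tie, prefer the earlier visit.
    best = min(visits, key=lambda v: (abs((v[1] - condition_date).days), v[1]))
    return best[0]


VISITS = [(10, date(2014, 1, 5)), (11, date(2014, 3, 10))]
print(closest_visit(date(2014, 2, 1), VISITS))  # 10 (27 days vs. 37 days)
```

The tie-breaking rule and the absence of a maximum allowed gap are arbitrary choices in this sketch. Both are exactly the kind of guess the later comments object to.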
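Point 3 can be illustrated with a hypothetical `to_diagnoses` helper. The sketch assumes that DX is derived from the condition concept and that unmapped source values fall back to concept 0. Under those assumptions, two distinct source values on one encounter collapse into a single diagnosis, while a better mapping keeps them apart. The concept ids 101 and 202 are made up.

```python
# Hypothetical records from one encounter: (visit_occurrence_id, condition_source_value).
RECORDS = [(10, "src_a"), (10, "src_b")]


def to_diagnoses(records, concept_map):
    """Distinct (ENCOUNTERID, DX) pairs; unmapped source values fall back to concept 0."""
    return {(visit, concept_map.get(source, 0))
            for visit, source in records if visit is not None}


old = to_diagnoses(RECORDS, {})                            # nothing mapped: both become 0
new = to_diagnoses(RECORDS, {"src_a": 101, "src_b": 202})  # hypothetical better mapping
print(len(old), len(new))  # 1 2
```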

Any thoughts @burrowse?

gracebrownecodes commented 9 years ago

@mgkahn is against 1:

"I do not think we should be making data up so I do not think we should do 1 for the same reason in your parenthetical."

burrowse commented 9 years ago

Agreed @mgkahn @aaron0browne ... I think we want to put forth the best representation of the data we have, and not interpolate any values or make our best guess in this case. The conditions that are not associated with visits are problem list conditions. Generally speaking, a diagnosis from the problem list should be noted on an encounter at least once, so we shouldn't lose much condition occurrence data by being unable to map these records. I can look into this and provide more specific numbers.
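One way to get those numbers is to count the visit-less records and check how many of them meet a simple test. The test assumed here for "noted on an encounter" is that the same person has the same condition_concept_id on at least one record with a non-NULL visit_occurrence_id. The `problem_list_coverage` function, the toy rows and the in-memory SQLite table are all hypothetical; this is a sketch, not a query from the project.

```python
import sqlite3

# Hypothetical toy rows: (condition_occurrence_id, person_id,
# condition_concept_id, visit_occurrence_id); NULL visit = problem list entry.
ROWS = [
    (1, 1, 202, 10),
    (2, 1, 202, None),  # person 1 also has 202 on visit 10
    (3, 1, 404, None),  # never noted on an encounter
    (4, 2, 202, None),  # only person 1 has 202 on an encounter
]

# For each visit-less record, EXISTS yields 1 if the same person has the same
# concept on some visit, else 0; the outer query totals both figures.
COVERAGE_SQL = """
SELECT COUNT(*), COALESCE(SUM(covered), 0) FROM (
    SELECT EXISTS (
        SELECT 1 FROM condition_occurrence v
        WHERE v.person_id = p.person_id
          AND v.condition_concept_id = p.condition_concept_id
          AND v.visit_occurrence_id IS NOT NULL) AS covered
    FROM condition_occurrence p
    WHERE p.visit_occurrence_id IS NULL)
"""


def problem_list_coverage(rows):
    """Return (visit-less records, how many of them also appear on an encounter)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE condition_occurrence (condition_occurrence_id INTEGER, "
                 "person_id INTEGER, condition_concept_id INTEGER, "
                 "visit_occurrence_id INTEGER)")
    conn.executemany("INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)", rows)
    no_visit, on_encounter = conn.execute(COVERAGE_SQL).fetchone()
    conn.close()
    return no_visit, on_encounter


print(problem_list_coverage(ROWS))  # (3, 1)
```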

Agreed on the differences between the models; I think the required fields indicate the granularity of information each model aims to support when querying. Unless PCORnet expands its model, these values will simply be dropped, but the presence of the condition on the chart remains intact.

writetoritu commented 9 years ago

Such reasons for discrepancy will be documented in a separate code review document. Closing this issue.