PEDSnet / pedsnetcdm_to_pcornetcdm

SQL code to transform data from PEDSnet CDM to PCORnet CDM.

ETL for Diagnosis.DX_SOURCE #15

Closed: writetoritu closed this issue 8 years ago

writetoritu commented 8 years ago

Some notes on this field:

The OMOP-PCORnet subcommittee has adopted the following convention to populate this field:

The absence of an OBSERVATION record and a FACT_RELATIONSHIP record for the respective CONDITION_OCCURRENCE record will be interpreted as “field is not available at the source”.
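A minimal sketch of how an ETL might apply this convention. It assumes the FACT_RELATIONSHIP links have already been flattened into a hypothetical `links` dictionary (condition_occurrence_id to observation_id) and that the linked OBSERVATION carries the PCORnet DX_SOURCE code directly. Encoding "not available at the source" as NI is also an assumption, so it is exposed as a parameter.

```python
from typing import Dict, List


def dx_source_by_condition(
    condition_ids: List[int],
    links: Dict[int, int],
    observation_values: Dict[int, str],
    not_available: str = "NI",
) -> Dict[int, str]:
    """Derive DX_SOURCE for each CONDITION_OCCURRENCE record.

    links: hypothetical flattening of FACT_RELATIONSHIP
        (condition_occurrence_id -> observation_id).
    observation_values: hypothetical observation_id -> DX_SOURCE code.
    not_available: assumed code for "field is not available at the source".
    """
    result: Dict[int, str] = {}
    for cid in condition_ids:
        obs_id = links.get(cid)
        if obs_id is not None and obs_id in observation_values:
            # A linked OBSERVATION record supplies the value.
            result[cid] = observation_values[obs_id]
        else:
            # No OBSERVATION/FACT_RELATIONSHIP pair: per the convention,
            # treat the field as not available at the source.
            result[cid] = not_available
    return result


print(dx_source_by_condition([1, 2, 3], {1: 10, 3: 30}, {10: "DI"}))
```

Condition 3 has a link but no matching observation, so it falls back to the "not available" code just like condition 2.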

@mgkahn @baileych We were wondering whether we should have a similar convention in PEDSnet or just populate this as NI?

baileych commented 8 years ago

I'm happy to discuss the implications further, but for cases like this, where our requirement is essentially to add a new value to every row in condition_occurrence, I've come to favor simply adding an extra column to that table. My reasoning is that fact_relationship is going to get very large if we follow the OMOP-canonical approach of stashing these qualifiers in observation and linking to them.
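A minimal sketch of the extra-column idea using Python's built-in sqlite3 module. The table is a heavily simplified stand-in for condition_occurrence, the `dx_source` column name is hypothetical, and the concept ID values are made up. It shows that adding a nullable column leaves existing rows and queries that name their columns unchanged, which is the kind of non-breaking extension discussed in this thread.

```python
import sqlite3

# Simplified stand-in for condition_occurrence (real tables have many more
# columns); the concept ID values below are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE condition_occurrence ("
    " condition_occurrence_id INTEGER PRIMARY KEY,"
    " condition_concept_id INTEGER,"
    " condition_type_concept_id INTEGER)"
)
conn.executemany(
    "INSERT INTO condition_occurrence VALUES (?, ?, ?)",
    [(1, 100, 1), (2, 200, 2)],
)

# Hypothetical extension: one nullable column instead of an OBSERVATION row
# plus a FACT_RELATIONSHIP row per condition. Existing rows get NULL.
conn.execute("ALTER TABLE condition_occurrence ADD COLUMN dx_source TEXT")
conn.execute(
    "UPDATE condition_occurrence SET dx_source = 'FI'"
    " WHERE condition_occurrence_id = 2"
)

# A pre-existing query that names its columns still returns the same rows.
legacy = conn.execute(
    "SELECT condition_occurrence_id, condition_concept_id"
    " FROM condition_occurrence ORDER BY condition_occurrence_id"
).fetchall()

# The qualifier is read directly, with no join through fact_relationship.
extended = conn.execute(
    "SELECT condition_occurrence_id, dx_source"
    " FROM condition_occurrence ORDER BY condition_occurrence_id"
).fetchall()

print(legacy, extended)
```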

For this specific case, though, I wonder whether we can get what we need straight from condition_type_concept_id, or could do so with small extensions to that attribute?

toanong commented 8 years ago

I agree that we should avoid using fact_relationship whenever we can. The subgroup has always opted not to alter the OMOP tables, but we don't have to follow that principle. We'd be fortunate if condition_type_concept_id already gives us what we need to populate dx_source. If not, the extension @baileych was referring to would require a change to the PEDSnet CDM conventions.

baileych commented 8 years ago

@toanong I think we've probably got what we need in the short term in condition_type_concept_id, since by convention we've agreed to capture only final diagnoses. So I think (@toanong @mgkahn @writetoritu, check me on this) that we can map inpatient discharge diagnoses to DI, outpatient diagnoses to FI, problem list entries to another table, and any other condition_type_concept_id to OT.
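A minimal sketch of this proposed mapping. It keys on the human-readable condition type names quoted later in the thread rather than on real OMOP concept IDs, and treating both inpatient header positions as discharge diagnoses is an assumption; a production ETL would key on concept IDs. Returning `None` marks rows that would be routed to another table instead of DIAGNOSIS.

```python
from typing import Optional

# Hypothetical name-based groupings of condition_type_concept_id values.
INPATIENT_TYPES = {
    "inpatient header (primary position)",
    "inpatient header (2nd position)",
}
OUTPATIENT_TYPES = {
    "outpatient header (1st position)",
    "outpatient header (2nd position)",
}
PROBLEM_LIST_TYPES = {"EHR problem list"}


def dx_source_for(condition_type: str) -> Optional[str]:
    """Map a condition type to a PCORnet DX_SOURCE code.

    None means the row is not loaded into DIAGNOSIS (problem list entries
    go to another table under this proposal).
    """
    if condition_type in INPATIENT_TYPES:
        return "DI"  # inpatient discharge diagnosis
    if condition_type in OUTPATIENT_TYPES:
        return "FI"  # outpatient (final) diagnosis
    if condition_type in PROBLEM_LIST_TYPES:
        return None  # routed to another PCORnet table
    return "OT"  # any other condition type


print(dx_source_for("outpatient header (1st position)"))
```

Under the alternative raised later in the thread (problem list entries kept in DIAGNOSIS), the problem list branch would return "OT" instead of `None`.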

I understand the subgroup's concern about integrity of the OMOP CDM, but I think as long as we stick to non-breaking extensions we have a lot of flexibility.

mgkahn commented 8 years ago

Can somebody explain why the OMOP-PCORnet group felt that using condition_type_concept_id, as Charlie articulates above, was not acceptable, resulting in the highly inefficient use of two additional rows (one in observation and one in fact_relationship) for each diagnosis? If a field exists that has the semantics we want, we should use it. If that field has the right semantics but lacks some of the terminology terms we need, then we should advocate for the missing terms. In this case, I do not see why condition_type_concept_id isn't sufficient to meet the need without any change to the model and without resorting to fact_relationship.
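The size argument is simple arithmetic. A minimal sketch using the two-rows-per-diagnosis figure above; the approach names and the `fact_rows_per_link` parameter (for a site that might store each link in both directions) are hypothetical, and the 10 million condition count is purely illustrative, not a real PEDSnet figure.

```python
def extra_rows_for_qualifier(
    n_conditions: int,
    approach: str,
    fact_rows_per_link: int = 1,
) -> int:
    """Rows added to attach one qualifier (such as DX_SOURCE) to every
    condition_occurrence row under a given storage approach."""
    if approach in ("condition_type", "extra_column"):
        # Reusing condition_type_concept_id or adding a column adds no rows.
        return 0
    if approach == "observation_fact_relationship":
        # One OBSERVATION row per diagnosis plus its FACT_RELATIONSHIP row(s).
        return n_conditions * (1 + fact_rows_per_link)
    raise ValueError(f"unknown approach: {approach}")


# Illustrative count only.
print(extra_rows_for_qualifier(10_000_000, "observation_fact_relationship"))
```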

Long way of saying that I strongly agree with Charlie's condition_type_concept_id mapping. And if we feel we need an additional term/value for this field, we should ask for that to be added.

Charlie -- I didn't understand your statement: "problem list entries to another table". I'm assuming you mean another table in the PCORnet CDM. If so, why are you not advocating putting them into PCORnet's Diagnosis table with DX_SOURCE = OT?

writetoritu commented 8 years ago

I agree with all of you on simply using condition_type_concept_id to populate PCORnet's Diagnosis.DX_SOURCE for the PEDSnet->PCORnet transformation.

mgkahn commented 8 years ago

Got it. Let's "institutionalize" "EHR problem list", "inpatient header (primary position)", "inpatient header (2nd position)", "outpatient header (1st position)", and "outpatient header (2nd position)" as the only valid PEDSnet values out of the 98 valid values in the OMOP terminology. Do you see any need for "primary condition" or "secondary condition" that cannot be fulfilled by the other values? I can't look at the conventions document while writing this comment. If these two carry separate semantics that make sense, then include them as well, but make it clear when we expect to use primary/secondary condition rather than the other values.
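A minimal sketch of a data-quality check for this restricted value set. It compares the human-readable names quoted in this comment rather than real OMOP concept IDs, and the `allow_primary_secondary` switch is a hypothetical way to model the open question about "primary condition" and "secondary condition".

```python
from collections import Counter
from typing import Iterable

# Proposed valid PEDSnet condition types (names as quoted in the thread).
ALLOWED_CONDITION_TYPES = frozenset({
    "EHR problem list",
    "inpatient header (primary position)",
    "inpatient header (2nd position)",
    "outpatient header (1st position)",
    "outpatient header (2nd position)",
})


def disallowed_condition_types(
    types: Iterable[str], allow_primary_secondary: bool = False
) -> Counter:
    """Count condition type values that fall outside the allowed set."""
    allowed = set(ALLOWED_CONDITION_TYPES)
    if allow_primary_secondary:
        allowed |= {"primary condition", "secondary condition"}
    return Counter(t for t in types if t not in allowed)


rows = ["EHR problem list", "primary condition", "primary condition", "other"]
print(disallowed_condition_types(rows))
```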

Regarding the suggested mapping -- I say "go with what you wrote until we have a reason to revisit".

mgkahn commented 8 years ago

I also appreciate that our decision to deviate from the larger group's approach means we will also be on our own for our PEDSnet --> PCORnet mapping code. Sorry to see that happen, but I think we cannot afford such a large blow-up in the size of our database just for this reason.

writetoritu commented 8 years ago

Thanks, all. We will wait for a decision in https://github.com/PEDSnet/Data_Models/issues/157 and then update the mapping table and the ETL code for PCORnet.diagnosis.