callahantiff / OMOP2OBO

OMOP2OBO: A Python Library for mapping OMOP standardized clinical terminologies to Open Biomedical Ontologies
http://tiffanycallahan.com/OMOP2OBO_Dashboard
MIT License

HELP: Error Analysis #47

Closed callahantiff closed 4 years ago

callahantiff commented 4 years ago

@mgkahn - This issue is meant for us to discuss the error analysis that we spoke about today. As a reminder, today I was tasked with figuring out which of the relationship ids we discussed are worth including and how best to categorize them. Details below:

SQL Query

Here is the query that I ended up running:

SELECT
  DISTINCT r.relationship_id,
  c1.concept_id AS SOURCE_CONCEPT_ID,
  c1.concept_name AS SOURCE_CONCEPT_LABEL,
  c2.concept_id AS TARGET_CONCEPT_ID,
  c2.concept_name AS TARGET_CONCEPT_LABEL,
FROM
  sandbox-omop.oct_2020.concept_relationship r
  JOIN sandbox-omop.oct_2020.concept c1 ON c1.concept_id = r.concept_id_1
  JOIN sandbox-omop.oct_2020.concept c2 ON c2.concept_id = r.concept_id_2
WHERE
  r.concept_id_1 IN (SELECT concept_id FROM sandbox-tc.CHCO_DeID_Oct2018.OMOP2OBO_Conditions_Concepts_Merged
                      UNION DISTINCT
                     SELECT ingredient_concept_id FROM sandbox-tc.CHCO_DeID_Oct2018.OMOP2OBO_Medications_Concepts_Merged
                      UNION DISTINCT
                     SELECT concept_id FROM sandbox-tc.CHCO_DeID_Oct2018.OMOP2OBO_Measurements_Concepts_Merged)
  AND r.relationship_id IN ("Concept replaced by", "Maps to", "Concept same_as from", "Concept poss_eq from", "Concept was_a from", "Is a")
  AND (r.valid_start_date > '2018-06-26' AND r.valid_start_date < '2020-10-17')
ORDER BY r.relationship_id;

Relationship IDs

The relationship types that I think we should use are shown grouped by two categories below:

Newly Added Concepts:

Replaced Concepts:

Among the Newly Added Concepts, every relationship other than Maps to is meant to help explain the missed concepts.

callahantiff commented 4 years ago

Think we are good on this, closing for now. Relevant content from this issue has been moved to the Consistency - Error Analysis Experiments Wiki page.

callahantiff commented 4 years ago

Reopening this. @mgkahn, after spending some time this weekend exploring this data with respect to the concepts missing from OMOP2OBO, I wanted to verify with you how I plan to use it.

In general, as described above, we can use these data to help determine if a concept is missing because it was added to the OMOP CDM after we pulled our concepts from CHCO (i.e. Newly Added Concept) or if it replaced an existing concept that is in our mapping set (i.e. Replaced Concept).
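A minimal sketch of that categorization in Python, using only the three relationship-to-scenario pairings that appear in the example rows later in this issue. The names `SCENARIO_BY_RELATIONSHIP` and `scenario_type` are hypothetical and not part of OMOP2OBO, and the other relationship ids from the query are deliberately left unassigned.

```python
from typing import Optional

# Pairings copied from the example rows in this issue; illustrative only.
# The remaining relationship ids in the query are intentionally not assigned.
SCENARIO_BY_RELATIONSHIP = {
    "Concept replaced by": "Replaced Concept",
    "Maps to": "Newly Added Concept",
    "Is a": "Newly Added Concept",
}


def scenario_type(relationship_id: str) -> Optional[str]:
    """Return the scenario label for a relationship id, or None if unassigned."""
    return SCENARIO_BY_RELATIONSHIP.get(relationship_id)


print(scenario_type("Concept replaced by"))
```

Returning `None` for an unassigned relationship keeps unknown cases visible rather than silently placing them in one of the two groups.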

With this in mind, there are 367 condition concepts in the Concept Prevalence data that are not in OMOP2OBO that we can account for using the data from the query shown above. This results in some really interesting outcomes that I want to make sure that I am handling in a way that we both agree with. See example below.

Concept: 36675019

| RELATIONSHIP_ID | SCENARIO_TYPE | SOURCE_CONCEPT_ID | SOURCE_CONCEPT_LABEL | TARGET_CONCEPT_ID | TARGET_CONCEPT_LABEL |
| --- | --- | --- | --- | --- | --- |
| Concept replaced by | Replaced Concept | 4251460 | Late radiation dermatitis | 36675019 | Dermatitis as late effect of radiation |
| Maps to | Newly Added Concept | 4251460 | Late radiation dermatitis | 36675019 | Dermatitis as late effect of radiation |
| Is a | Newly Added Concept | 37110582 | Acute radiodermatitis due to and following rad... | 36675019 | Dermatitis as late effect of radiation |
| Is a | Newly Added Concept | 37110583 | Chronic radiodermatitis due to and following r... | 36675019 | Dermatitis as late effect of radiation |

From these results we can see that there are 3 source concepts that have a connection to this concept:


QUESTIONS
For this scenario, we have three possible source ids that do currently exist in the OMOP2OBO mapping set that we can map this missing concept to. This brings up a few questions I am hoping to get your feedback on:

  1. For scenarios where a missing TARGET_CONCEPT_ID is linked to a single SOURCE_CONCEPT_ID by multiple relationships, which one takes precedence? Or how should I categorize this?
  2. In general, when there are multiple SOURCE_CONCEPT_IDs that map to a single TARGET_CONCEPT_ID, I was planning on tracking this information so we can categorize how these mappings occurred in our results. Do you agree with this? It also presents an interesting dilemma of how we would then recover this missing concept: which of the source codes' ontology mappings would we want to transfer to this code? Perhaps the most specific (i.e. not using the annotations from the SOURCE_CONCEPT_IDs connected via an Is a relation).
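
One way the tracking in question 2 could look, sketched in Python with the four example rows for concept 36675019. The helper names `group_by_target` and `preferred_sources` are hypothetical, and treating Is a as the only "broad" relation is an assumption taken from the suggestion above, not a settled rule.

```python
from collections import defaultdict

# Example rows from this issue: (relationship_id, source_concept_id, target_concept_id).
ROWS = [
    ("Concept replaced by", 4251460, 36675019),
    ("Maps to", 4251460, 36675019),
    ("Is a", 37110582, 36675019),
    ("Is a", 37110583, 36675019),
]


def group_by_target(rows):
    """Map each target id to {source id: sorted list of relationship ids}."""
    grouped = defaultdict(lambda: defaultdict(set))
    for relationship_id, source_id, target_id in rows:
        grouped[target_id][source_id].add(relationship_id)
    return {
        target: {source: sorted(rels) for source, rels in sources.items()}
        for target, sources in grouped.items()
    }


def preferred_sources(sources, broad=("Is a",)):
    """Prefer sources linked by something other than a broad relation.

    Falls back to every source when only broad links exist (an assumption).
    """
    specific = [s for s, rels in sources.items() if any(r not in broad for r in rels)]
    return sorted(specific) if specific else sorted(sources)


grouped = group_by_target(ROWS)
print(grouped[36675019])
print(preferred_sources(grouped[36675019]))
```

Keeping every relationship per source preserves how each link arose, so the precedence decision in question 1 can be made later without rerunning the query.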
callahantiff commented 4 years ago

Looking into these a bit more, I think it's not useful to try and differentiate between a concept being a Replaced Concept or a Newly Added Concept. They both seem to imply that a concept has been updated. Unless I hear otherwise, I am going to treat them this way.
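
As a rough sketch of what treating them the same way could mean in code, both labels can be folded into a single one. The merged label name `Updated Concept` and the helper `collapse` are placeholders chosen for illustration only.

```python
# Both original labels collapse into one; the merged label name is a placeholder.
UPDATED = "Updated Concept"
MERGED = {"Replaced Concept": UPDATED, "Newly Added Concept": UPDATED}


def collapse(scenario_types):
    """Return the distinct labels left after merging the two scenario types."""
    return sorted({MERGED.get(s, s) for s in scenario_types})


print(collapse(["Replaced Concept", "Newly Added Concept"]))
```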

mgkahn commented 4 years ago

Agree. Impact is the same – not present earlier and present now…..

(Sent by email in reply to the comment of Monday, November 9, 2020 at 1:33 PM.)


callahantiff commented 4 years ago

Great! Thank you for confirming! Validation Notebook results coming your way soon 😄