OHDSI / Vocabulary-v5.0

Build process for the OHDSI Standardized Vocabularies. Currently not available as independent release.
The Unlicense
222 stars 75 forks source link

Removal of LOINC Codes from Concept Ancestor Table #301

Closed noahgengel closed 4 years ago

noahgengel commented 4 years ago

I noticed that many of the LOINC codes previously used as ancestor_concept_ids in the concept_ancestor table were no longer in the concept_ancestor table with the latest vocabulary release. Is there any reason for this omission?

Some of the concepts I discovered were no longer in the concept_ancestor table were as follows:

40785793 | Monocytes 40779191 | Myelocytes 40795725 | Hemoglobin 40789527 | Anion gap 40782561 | Promyelocytes 40772748 | Erythrocyte mean corpuscular hemoglobin 40795800 | Cholesterol in LDL 40789120 | Erythrocytes 40795740 | Glucose 40785788 | Neutrophils 40779190 | Metamyelocytes 40791227 | Albumin 40775801 | Creatinine 40785861 | Aspartate aminotransferase 40779195 | Blasts 40782735 | Erythrocyte mean corpuscular hemoglobin concentration 40782562 | Sodium 40789190 | Alkaline phosphatase 40772531 | Lymphocytes 40782579 | Alanine aminotransferase 40795733 | Eosinophils 40789179 | Hematocrit 40779159 | Platelets 40779224 | Bilirubin 40789356 | Erythrocyte distribution width 40795754 | Globulin 40792440 | Bicarbonate 40789180 | Carbon dioxide 40772572 | Triglyceride 40795730 | Urea nitrogen 40782589 | Cholesterol in HDL 40785869 | Chloride 40785850 | Calcium 40789182 | Erythrocyte mean corpuscular volume 40789266 | Reticulocytes 40785796 | Basophils 40786033 | Platelet mean volume 40792413 | Potassium

Alexdavv commented 4 years ago

@noahgengel Right, LOINC parts are not Standard anymore. Instead, we introduced attribute relationships between parts and tests. Some other LOINC hierarchical concepts are Classifications and on the top of the tests.

cukarthik commented 4 years ago

@Alexdavv thanks for the clarification. Do you have a recommendation on how to use the concept relationship table to get all tests under a loinc hierarchy? I'm not too familiar with recursive queries but that seems to be the only way. Am I wrong?

noahgengel commented 4 years ago

@Alexdavv As a follow-up to what @cukarthik said: do you know if all of the information from the concept_ancestor table is now encoded in the concept_relationship table?

Alexdavv commented 4 years ago

Do you have a recommendation on how to use the concept relationship table to get all tests under a loinc hierarchy?

LOINC Hierarchy concepts are Classifications (standard_concept = 'C'), so all-level descendants and ancestors are shown in the concept_ancestor table. So no need for recursion.

If you'd like to use LOINS parts of 6 classes you can look into the concept_relationship table to follow from parts to the tests (and back). To do it, you can use the following relationships_id:

It looks like you don't need any hierarchical relationships here, don't you?

said: do you know if all of the information from the concept_ancestor table is now encoded in the concept_relationship table?

For sure, since concept_ancestor table is always built from the concept_relationship table. Or do you mean the information from the ex-concept_ancestor?

ChaoPang commented 4 years ago

@alexandr-bkv Hi Alex, thanks a lot for taking the time to answer our questions and I really appreciate that!

For sure, since concept_ancestor table is always built from the concept_relationship table. Or do you mean the information from the ex-concept_ancestor?

Yes, we actually meant the ex-concept_ancestor. Some of our analytics queries actually depend on the ancestor/descendant relationships provided by concept_anesctor to get all Lab Test concepts given a LONIC Hierarchy such as 40785793 | Monocytes (the concept_class_id has been changed from LONIC Hierarchy to LONIC Component for this concept in the vocabulary v5.0 30-APR-20). As @noahgengel mentioned in the issue, we noticed these do not exist in concept_ancestor anymore and as you later pointed out these relationships have been moved to concept_relationship, thanks for the clarification on that!

LOINC Hierarchy concepts are Classifications (standard_concept = 'C'), so all-level descendants and ancestors are shown in the concept_ancestor table. So no need for recursion.

It seems that we still need to recursively query concept_relationship in order to get all Lab Test concepts in the current vocabulary such as 40785793 | Monocytes. In the vocabulary v5.0 22-APR-19, the below query gives me 124 lab tests

SELECT 
    * 
FROM concept_ancestor AS ca
JOIN concept AS c
  ON ca.descendant_concept_id = c.concept_id 
WHERE ancestor_concept_id = 40785793 AND ca.ancestor_concept_id != ca.descendant_concept_id 
  AND c.concept_class_id = 'Lab Test'

In the newer version v5.0 30-APR-20, we only get 17 Lab Test concepts for 40785793 using concept_relationship without recursion,

SELECT 
    * 
FROM concept_relationship AS cr
JOIN concept AS c
    ON cr.concept_id_2 = c.concept_id
WHERE concept_id_1 = 40785793
-- 17 Lab Test concepts associated with 'Component of'
-- 18 LOINC Hierarchy and LOINC Component concepts are associated with 'Subsumes'

To get all 124 Lab Tests, we would have to write a recursive CTE to retrieve all of them from concept_relationship, (e.g. 40795715|Macrophages (LOINC Component) is subsumed by 40785793 | Monocytes, and is associated with a list of Lab Test concepts such as 43055293|Macrophages/Leukocytes [Pure number fraction] in Pericardial fluid by Manual count)

WITH base_concepts AS
(
    SELECT 
        *
    FROM concept AS c
    WHERE c.concept_id = 40785793
),

recursive_find_descendants AS
(
    SELECT --the base of the CTE
        cr.concept_id_1 AS ancestor_concept_id,
        c2.concept_id AS descendant_concept_id,
        1 AS levels_of_separation
    FROM base_concepts AS con
    JOIN concept_relationship AS cr
    ON cr.concept_id_1 = con.concept_id 
        AND relationship_id IN ('Subsumes', 'Component of')
    JOIN dbo.concept AS c2 
        ON cr.concept_id_2 = c2.concept_id

    UNION ALL

    SELECT --the recursive part of the CTE
        con.ancestor_concept_id,
        c2.concept_id AS descendant_concept_id,
        con.levels_of_separation + 1 AS levels_of_separation
    FROM recursive_find_descendants AS con
    JOIN concept AS c ON con.descendant_concept_id = c.concept_id
    JOIN concept_relationship AS cr
    ON cr.concept_id_1 = c.concept_id 
        AND cr.relationship_id IN ('Subsumes', 'Component of')
    JOIN dbo.concept AS c2 
        ON cr.concept_id_2 = c2.concept_id
)

-- Recursive CTE works in SQL Server, but not sure about the other SQL dialects. 
SELECT DISTINCT
    c.*
FROM recursive_find_descendants AS r
JOIN concept AS c
    ON r.descendant_concept_id = c.concept_id
WHERE c.concept_class_id = 'Lab Test'
  1. I am wondering if there is a better way of retrieving all Lab Tests for a given LONIC Hierarchy/Component?

  2. Right, LOINC parts are not Standard anymore. Instead, we introduced attribute relationships between parts and tests.

I am actually not sure what LONIC parts meant, did you refer to LONIC Component? If so, all of the concepts listed in the issue by @noahgengel seemed to have been changed from LONIC Hierarchy to LOINC Component, is that correct that all the "LONIC Component" concepts have been moved from concept_ancestor to concept_relationship?

cgreich commented 4 years ago

@ChaoPang:

It seems that we still need to recursively query concept_relationship

I think @Alexdavv meant to point out is that you need to use the CONCEPT_ANCESTOR table in order to not have to do recursive searching. That's the whole point of the table.

And instead of the LOINC parts use the LOINC Hierarchy. For example, instead of 40785793 "Monocytes" use 37040193 "Monocytes | Blood | Hematology and Cell counts" and following the ancestries to all the various monocyte measurements in the blood.

Funny thing is that 40785793 "Monocytes" has the hierarchical parent to 37040193 "Monocytes | Blood | Hematology and Cell counts". We should consider putting those LOINC parts back into the hierarchy.

ChaoPang commented 4 years ago

@cgreich Thanks for the explanation. We have been using CONCEPT_ANCESTOR in the past to retrieve all descendants in the past such as 40785793 "Monocytes".

It seems to me that there has been a major re-structure of the CONCEPT_ANCESTOR (just like @Alexdavv indicated in the previous comments), 40785793 "Monocytes" used to be an ancestor in CONCEPT_ANCESTOR for both LOINC Hierarchy and Lab Tests concepts in the previous version, which could give me all 124 Lab Tests via a simple lookup.

However, 40785793 "Monocytes" is not an ancestor for "Lab Tests" in the latest version of the vocab, and instead, I should use the "component of" relationship to get "Lab Tests" concepts (If I understand correctly) in concept_relationship. I guess the query we want to write is to find all Lab Tests that "has_component" Monocytes and its hierarchical children.

It would be great to put LOINC parts back into the hierarchy and many thanks in advance!