TranslatorSRI / NodeNormalization

Service that produces Translator compliant nodes given a curie
MIT License

Type 2 diabetes conflated with coronary artery disease and other concepts #189

Open amykglen opened 1 year ago

amykglen commented 1 year ago

I noticed that the NodeNormalizer's cluster for Type 2 diabetes seems to include other concepts that are very much not diabetes, like 'coronary artery disease' and 'susceptibility to organophosphate poisoning':

https://nodenormalization-sri.renci.org/1.3/get_normalized_nodes?curie=MONDO:0005148&conflate=true

{
   "MONDO:0005148":{
      "id":{
         "identifier":"MONDO:0005148",
         "label":"type 2 diabetes mellitus"
      },
      "equivalent_identifiers":[
         {
            "identifier":"MONDO:0005148",
            "label":"type 2 diabetes mellitus"
         },
         {
            "identifier":"DOID:9352",
            "label":"type 2 diabetes mellitus"
         },
         {
            "identifier":"OMIM:125853"
         },
         {
            "identifier":"OMIM:147545"
         },
         {
            "identifier":"OMIM:168820"
         },
         {
            "identifier":"EFO:0001360"
         },
         {
            "identifier":"UMLS:C0011860",
            "label":"Diabetes Mellitus, Non-Insulin-Dependent"
         },
         {
            "identifier":"UMLS:C1840169",
            "label":"CORONARY ARTERY DISEASE, SUSCEPTIBILITY TO"
         },
         {
            "identifier":"UMLS:C1852091",
            "label":"INSULIN RESISTANCE, SUSCEPTIBILITY TO"
         },
         {
            "identifier":"UMLS:C2674662",
            "label":"PON1 ENZYME ACTIVITY, VARIATION IN"
         },
         {
            "identifier":"UMLS:C2674663",
            "label":"ORGANOPHOSPHATE POISONING, SUSCEPTIBILITY TO"
         },
         {
            "identifier":"UMLS:C2674665",
            "label":"MICROVASCULAR COMPLICATIONS OF DIABETES, SUSCEPTIBILITY TO, 5 (finding)"
         },
         {
            "identifier":"UMLS:C3149706",
            "label":"CORONARY ARTERY SPASM 2, SUSCEPTIBILITY TO"
         },
         {
            "identifier":"UMLS:C4017238",
            "label":"TYPE 2 DIABETES MELLITUS, PROTECTION AGAINST"
         },
         {
            "identifier":"UMLS:CN244395"
         },
         {
            "identifier":"MESH:D003924",
            "label":"Diabetes Mellitus, Type 2"
         },
         {
            "identifier":"MEDDRA:10012611"
         },
         {
            "identifier":"MEDDRA:10012613"
         },
         {
            "identifier":"MEDDRA:10026947"
         },
         {
            "identifier":"MEDDRA:10029402"
         },
         {
            "identifier":"MEDDRA:10029505"
         },
         {
            "identifier":"MEDDRA:10045242"
         },
         {
            "identifier":"MEDDRA:10067585"
         },
         {
            "identifier":"NCIT:C26747",
            "label":"Type 2 Diabetes Mellitus"
         },
         {
            "identifier":"SNOMEDCT:44054006"
         },
         {
            "identifier":"ICD10:E11"
         },
         {
            "identifier":"KEGG.DISEASE:04930"
         },
         {
            "identifier":"HP:0005978",
            "label":"Type II diabetes mellitus"
         }
      ],
      "type":[
         "biolink:DiseaseOrPhenotypicFeature",
         "biolink:BiologicalEntity",
         "biolink:NamedThing",
         "biolink:Entity",
         "biolink:ThingWithTaxon",
         "biolink:Disease"
      ]
   }
}
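
One rough way to spot questionable members in a response like the one above is to list the labelled identifiers whose label does not mention the expected concept. The sketch below is only a heuristic: the helper names `build_url` and `suspicious_members` are hypothetical, the keyword match is crude (it would not flag "MICROVASCULAR COMPLICATIONS OF DIABETES, SUSCEPTIBILITY TO, 5 (finding)", and it would flag arguably related labels such as insulin resistance), and `SAMPLE` is a trimmed excerpt of the JSON above rather than a live query.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL in this issue.
BASE_URL = "https://nodenormalization-sri.renci.org/1.3/get_normalized_nodes"


def build_url(curie, conflate=True):
    """Build the query URL for one CURIE (hypothetical helper)."""
    query = urlencode({"curie": curie, "conflate": str(conflate).lower()}, safe=":")
    return BASE_URL + "?" + query


def suspicious_members(response, curie, keyword):
    """Return (identifier, label) pairs whose label lacks `keyword`.

    Entries without a label are skipped, since there is nothing to compare.
    """
    entry = response.get(curie) or {}
    return [
        (member["identifier"], member["label"])
        for member in entry.get("equivalent_identifiers", [])
        if "label" in member and keyword.lower() not in member["label"].lower()
    ]


# Trimmed excerpt of the response shown above.
SAMPLE = {
    "MONDO:0005148": {
        "equivalent_identifiers": [
            {"identifier": "MONDO:0005148", "label": "type 2 diabetes mellitus"},
            {"identifier": "OMIM:125853"},
            {"identifier": "UMLS:C1840169",
             "label": "CORONARY ARTERY DISEASE, SUSCEPTIBILITY TO"},
            {"identifier": "UMLS:C2674663",
             "label": "ORGANOPHOSPHATE POISONING, SUSCEPTIBILITY TO"},
            {"identifier": "HP:0005978", "label": "Type II diabetes mellitus"},
        ]
    }
}

flagged = suspicious_members(SAMPLE, "MONDO:0005148", "diabetes")
```

On this excerpt, `flagged` contains the coronary artery disease and organophosphate poisoning entries, which are the two concepts called out at the top of this issue.
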

gaurav commented 9 months ago

Updated URL: https://nodenormalization-sri.renci.org/1.4/get_normalized_nodes?curie=MONDO:0005148&conflate=true

Thanks for reporting this! I think these concepts are being pulled in through the following chain of mappings:

MONDO:0005148 "type 2 diabetes mellitus"
--(exactMatch)--> OMIM:125853 "TYPE 2 DIABETES MELLITUS; T2D"
--(eq)--> UMLS:C1852091 "INSULIN RESISTANCE, SUSCEPTIBILITY TO"
--(eq)--> OMIM:147545 "INSULIN RECEPTOR SUBSTRATE 1; IRS1"
--(eq)--> UMLS:C1840169 "CORONARY ARTERY DISEASE, SUSCEPTIBILITY TO"
--(eq)--> OMIM:168820 "PARAOXONASE 1; PON1"
--(eq)--> UMLS:C2674663 "ORGANOPHOSPHATE POISONING, SUSCEPTIBILITY TO"

So it looks like back-and-forth mappings between OMIM and UMLS are chaining unrelated concepts into the same cluster.
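
To illustrate why a chain like this ends up in one cluster, here is a minimal sketch that assumes clusters are formed by taking the transitive closure of pairwise equivalences. This is not the actual disease processing code: `find`, `build_cliques`, and `PAIRS` are hypothetical names, and the pairs are just the links from the chain above.

```python
def find(parent, x):
    """Return the representative of x's set, with path halving."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def build_cliques(pairs):
    """Group identifiers connected by any path of pairwise equivalences."""
    parent = {}
    for a, b in pairs:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two sets
    cliques = {}
    for node in list(parent):
        cliques.setdefault(find(parent, node), set()).add(node)
    return list(cliques.values())


# The links from the chain described above (exactMatch or eq).
PAIRS = [
    ("MONDO:0005148", "OMIM:125853"),
    ("OMIM:125853", "UMLS:C1852091"),
    ("UMLS:C1852091", "OMIM:147545"),
    ("OMIM:147545", "UMLS:C1840169"),
    ("UMLS:C1840169", "OMIM:168820"),
    ("OMIM:168820", "UMLS:C2674663"),
]

cliques = build_cliques(PAIRS)
```

Under that assumption all seven identifiers land in a single clique, even though no mapping directly links MONDO:0005148 to UMLS:C2674663. A single over-broad OMIM to UMLS link is enough to merge otherwise unrelated concepts.
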

I'll check our disease processing code and see if I can figure out what's going wrong here.