RTXteam / RTX

Software repo for Team Expander Agent (Oregon State U., Institute for Systems Biology, and Penn State U.)
https://arax.ncats.io/
MIT License

Type 2 diabetes conflated with coronary artery disease and more #1837

Closed amykglen closed 1 year ago

amykglen commented 2 years ago

noticed this conflation while working on RTXteam/RTX-KG2#210:

https://arax.ncats.io/?term=DOID:9352

in particular, these nodes from the above page don't seem like they belong in the type 2 diabetes synonym cluster:

| IDENTIFIER | LABEL | ORIGINAL LABEL | CATEGORY |
|---|---|---|---|
| UMLS:C2674663 | Organophosphate poisoning, susceptibility to | Organophosphate poisoning, susceptibility to | biolink:DiseaseOrPhenotypicFeature |
| UMLS:C1840169 | Coronary artery disease, susceptibility to | Coronary artery disease, susceptibility to | biolink:DiseaseOrPhenotypicFeature |
| UMLS:C2674662 | Pon1 enzyme activity, variation in | Pon1 enzyme activity, variation in | biolink:DiseaseOrPhenotypicFeature |
| UMLS:C3149706 | Coronary artery spasm 2, susceptibility to | Coronary artery spasm 2, susceptibility to | biolink:DiseaseOrPhenotypicFeature |

there are other very questionable concepts in this cluster as well (like MICROVASCULAR COMPLICATIONS OF DIABETES, SUSCEPTIBILITY TO, 5 (finding)), but the four in the above table are clearly incorrect.

the conflation doesn't seem to be due to same_as edges, based on a quick look at the KG2pre Neo4j, so I'm not sure where it's coming from. maybe the SRI node normalizer?
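for reference, here's a minimal sketch of the kind of check described above: given an edge list, is one node reachable from another using only same_as edges? this is not the actual query run against KG2pre. the function name, the triple format, and the `EX:` identifiers are all hypothetical, and same_as is assumed to be symmetric.

```python
from collections import defaultdict, deque


def connected_by_same_as(edges, start, target, predicate="same_as"):
    """Return True if `target` is reachable from `start` via `predicate` edges only.

    `edges` is an iterable of (subject, predicate, object) triples.
    The predicate is treated as symmetric (an assumption of this sketch).
    """
    # Build an undirected adjacency map from the matching edges only.
    neighbors = defaultdict(set)
    for subj, pred, obj in edges:
        if pred == predicate:
            neighbors[subj].add(obj)
            neighbors[obj].add(subj)

    # Breadth-first search from the start node.
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in neighbors[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return False


# Hypothetical toy edges (placeholder identifiers, not real KG2pre content).
toy_edges = [
    ("EX:A", "same_as", "EX:B"),
    ("EX:B", "same_as", "EX:C"),
    ("EX:D", "related_to", "EX:C"),  # not a same_as edge, so it is ignored
]

print(connected_by_same_as(toy_edges, "EX:A", "EX:C"))
print(connected_by_same_as(toy_edges, "EX:A", "EX:D"))
```

if a check like this finds no same_as path between the type 2 diabetes node and the suspect nodes, the merge has to be coming from somewhere other than KG2pre's same_as edges.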

amykglen commented 1 year ago

ok, so in the new synonymizer, it's become clear that this conflation (type 2 diabetes with coronary artery disease) is coming from the SRI:

Cluster for MONDO:0005148 has 26 nodes:

id category name in_SRI in_KG2pre is_cluster_rep
DOID:9352 Disease type 2 diabetes mellitus X X
EFO:0001360 Disease obsolete_type II diabetes mellitus X X
HP:0005978 Disease Type II diabetes mellitus X X
ICD10:E11 Disease X
KEGG.DISEASE:04930 Disease X
MEDDRA:10012611 Disease X
MEDDRA:10012613 Disease X
MEDDRA:10026947 Disease X
MEDDRA:10029402 Disease X
MEDDRA:10029505 Disease X
MEDDRA:10045242 Disease X
MEDDRA:10067585 Disease X
MESH:D003924 Disease Diabetes Mellitus, Type 2 X X
MONDO:0005148 Disease type 2 diabetes mellitus X X X
NCIT:C26747 Disease Type 2 Diabetes Mellitus X X
OMIM:125853 Disease Type 2 diabetes mellitus related phenotypic feature X X
SNOMEDCT:44054006 Disease X
UMLS:C0011860 Disease Diabetes Mellitus, Non-Insulin-Dependent X X
UMLS:C1840169 Disease CORONARY ARTERY DISEASE, SUSCEPTIBILITY TO X X
UMLS:C1852091 Disease INSULIN RESISTANCE, SUSCEPTIBILITY TO X X
UMLS:C2674662 Disease PON1 ENZYME ACTIVITY, VARIATION IN X X
UMLS:C2674663 Disease ORGANOPHOSPHATE POISONING, SUSCEPTIBILITY TO X X
UMLS:C2674665 Disease MICROVASCULAR COMPLICATIONS OF DIABETES, SUSCEPTIBILITY TO, 5 (finding) X X
UMLS:C3149706 Disease CORONARY ARTERY SPASM 2, SUSCEPTIBILITY TO X X
UMLS:C4017238 Disease TYPE 2 DIABETES MELLITUS, PROTECTION AGAINST X X
UMLS:CN244395 Disease X

(we don't do any re-clustering of SRI nodes, so the fact that these nodes are 'in_SRI' means the SRI itself assigned them to this cluster, along with the other listed SRI nodes)
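the reasoning above can be sketched in a few lines: if SRI nodes are never re-clustered, any suspect node flagged in_SRI must have been placed in the cluster by the SRI. this is only an illustration, not the synonymizer's code. the row format and function name are assumptions, and the `EX:0001` row is a made-up KG2pre-only node added for contrast.

```python
def sri_assigned_members(cluster_rows, suspects):
    """Return the suspect node ids whose cluster membership came from the SRI.

    Assumes (as stated in the comment above) that SRI nodes are never
    re-clustered, so `in_sri` implies the SRI chose this cluster for the node.
    """
    return sorted(
        row["id"]
        for row in cluster_rows
        if row["in_sri"] and row["id"] in suspects
    )


# A few rows from the MONDO:0005148 cluster, plus one hypothetical row.
cluster_rows = [
    {"id": "MONDO:0005148", "in_sri": True},
    {"id": "UMLS:C1840169", "in_sri": True},
    {"id": "UMLS:C2674662", "in_sri": True},
    {"id": "UMLS:C2674663", "in_sri": True},
    {"id": "UMLS:C3149706", "in_sri": True},
    {"id": "EX:0001", "in_sri": False},  # hypothetical KG2pre-only node
]

# The four clearly incorrect nodes from the original report, plus the
# hypothetical node to show that non-SRI members are filtered out.
suspects = {
    "UMLS:C1840169",
    "UMLS:C2674662",
    "UMLS:C2674663",
    "UMLS:C3149706",
    "EX:0001",
}

for node_id in sri_assigned_members(cluster_rows, suspects):
    print(node_id)
```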

I wrote this up in: https://github.com/TranslatorSRI/NodeNormalization/issues/189

amykglen commented 1 year ago

I'm gonna close this issue since it's in the SRI's hands