NCATSTranslator / Feedback

A repo for tracking gaps in Translator data and finding ways to fill them.

something strange with the UMLS Identifier dimethyl sulfoxide UMLS:DC0012403 #836

Open sstemann opened 3 days ago

sstemann commented 3 days ago

I'm not sure if this is a BTE issue or not, but when I run the MVP1 query "What may treat Bethlem Myopathy", I get dimethyl sulfoxide twice in the UI:

https://ui.test.transltr.io/main/results?l=Bethlem%20Myopathy&i=MONDO:0008029&t=0&r=0&q=a04bff0e-1b72-494c-ae67-6f1fbc7aa66e


Both say BTE

In the ARAX GUI > BTE > Result 40 shows dimethyl sulfoxide with CURIE UMLS:DC0012403, which I cannot find in the UMLS Metathesaurus


If I drop the D and search for UMLS:C0012403, I get dimethyl sulfoxide.

Result 94 is CHEBI:28262, which seems normal.

I'm sending this to BTE first. I'm not sure it's really an NN issue, given that the CURIE looks non-existent.

andrewsu commented 3 days ago

This partly has to do with us ingesting SuppKG, a resource that links dietary supplements to diseases. More details can be found in https://github.com/biothings/pending.api/issues/55 and https://github.com/biothings/biothings_explorer/issues/706, but the TL;DR is that the resource invented these UMLS-like identifiers in cases where it didn't find a suitable existing UMLS identifier. We discussed the issue internally when we brought SuppKG on board and decided to move forward, since these identifiers are a relatively small proportion of SuppKG as a whole, but we are certainly open to revisiting...

There is also something strange with the lack of EPC here, related to https://github.com/NCATSTranslator/Feedback/issues/831. We'll be looking into that one as well...

sstemann commented 21 minutes ago

In this Bethlem Myopathy query, it looks like there are ~44 results with identifiers of the form "UMLS:DC#######". On the plus side, these aren't returned on the first page, but they are getting a sugeno score of .46 (on a 0-1 scale). On the other hand, some of these concepts do have real UMLS identifiers (Hawthorn Plant in UMLS is C1527346, Earthworms are C0086194). It seems odd to return results that do not have valid CURIEs.

subjectNode_name subjectNode_id
hawthorn UMLS:DC1621401
earthworm UMLS:DC1621389
4-hydroxyphenyl UMLS:DC0912024
2-phenyl-benzopyrans UMLS:DC0596577
ch'ih shen UMLS:DC0377336
bac ngu vi tu UMLS:DC0141729
bingpian UMLS:DC0106916
apple polyphenol extract UMLS:DC0071649
novasoy phytoestrogen extract UMLS:DC0071011
2-hydroxy-4-methoxyacetophenone UMLS:DC0069939
heparinoid UMLS:DC0066923
acetate de d-alpha tocopheryle UMLS:DC0042874
& vitamin palmitate UMLS:DC0042839
n-octadecanoic acid UMLS:DC0038229
atomic number 11 UMLS:DC0037473
5-alpha-furost-20-en-12-one-3 beta, 26-diol UMLS:DC0036189
beta-d-ribofuranose UMLS:DC0035549
b-2 UMLS:DC0035527
root UMLS:DC0035509
epoprostanol UMLS:DC0033567
fibersol-2 UMLS:DC0032594
24-beta-ethyl-delta-5-cholesten-3beta-ol UMLS:DC0031866
activator UMLS:DC0031610
flaxseed UMLS:DC0023753
hesperidin methyl chalcone UMLS:DC0019392
root UMLS:DC0017987
bilberry fruit UMLS:DC0016767
5-mthf UMLS:DC0016410
1,200 mg UMLS:DC0016157
additional omega-3 essential fatty acids UMLS:DC0015689
acides gras cetylated UMLS:DC0015684
eugenol UMLS:DC0015153
acide docosahexaenoique UMLS:DC0012968
dimethyl sulfoxide UMLS:DC0012403
fiber UMLS:DC0012173
bill henderson protocol UMLS:DC0012155
beta-sitosterol-beta-d-glycoside UMLS:DC0007158
acide butyrique UMLS:DC0006523
berberina UMLS:DC0005117
acide l-ascorbique, 6-palmitate UMLS:DC0003968
anthocyanin UMLS:DC0003161
alkaloid, nos UMLS:DC0002062
acide gras essentiel UMLS:DC0000545
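
For anyone who wants to flag these in a result set, here is a minimal sketch of a format check. It assumes only that a standard UMLS CUI is the letter "C" followed by seven digits, so anything under the UMLS prefix that does not match (such as the "DC#######" identifiers above) is flagged. The function name and the sample rows are hypothetical and are not part of BTE, SuppKG, or any Translator component. Note that stripping the "D" is not a reliable fix: it works for dimethyl sulfoxide (C0012403), but hawthorn's real CUI is C1527346, not C1621401.

```python
import re

# A standard UMLS CUI: the letter "C" followed by exactly seven digits.
STANDARD_CUI = re.compile(r"C[0-9]{7}")


def is_nonstandard_umls_curie(curie: str) -> bool:
    """Return True only for CURIEs with the UMLS prefix whose local ID
    is not a standard CUI. CURIEs from other prefixes are not flagged."""
    prefix, sep, local_id = curie.partition(":")
    if sep != ":" or prefix != "UMLS":
        return False
    return STANDARD_CUI.fullmatch(local_id) is None


# Hypothetical sample rows (name, CURIE), taken from identifiers in this thread.
rows = [
    ("dimethyl sulfoxide", "UMLS:DC0012403"),
    ("dimethyl sulfoxide", "UMLS:C0012403"),
    ("hawthorn", "UMLS:DC1621401"),
    ("dimethyl sulfoxide", "CHEBI:28262"),
]

# Keep only the rows whose UMLS identifier is not a standard CUI.
flagged = [(name, curie) for name, curie in rows if is_nonstandard_umls_curie(curie)]

for name, curie in flagged:
    print(f"{curie}\t{name}")
```

A check like this only tests the shape of the identifier. It cannot tell whether a well-formed CUI actually exists in the Metathesaurus.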