Open TranslatorIssueCreator opened 1 year ago
PK: 1fb6945c-668b-4750-85f4-2daa53eb4596. The results on CI show 2 results (results 1 and 10) for the same compound.
@gglusman : The two identifiers are http://identifiers.org/unii/E211KPY694 and http://identifiers.org/umls/C0006050
Output of Name Resolver:
{
"CHEMBL.COMPOUND:CHEMBL4297862": [
"BOTULINUM TOXIN TYPE A"
],
"UNII:E211KPY694": [
"BOTULINUM TOXIN TYPE A",
"[OBSOLETE] onabotulinumtoxinA"
],
"MESH:D019274": [
"Botulinum toxin type A",
"Botulinum Toxins, Type A",
"Botulin A"
],
"UMLS:C0006050": [
"botulinum toxin type a",
"Botulinum toxin type A",
"BOTULINUM TOXIN TYPE A",
"botulinum toxin type A",
"Botulinum Toxin Type A",
"toxin botulinum type a",
"Botulinum Toxins, Type A",
"Clostridium Botulinum Toxin Type A",
"Botulinum toxin type A (substance)",
"Botulinum toxin type A-containing product",
"Product containing botulinum toxin type A (medicinal product)",
"BTX-A",
"Xeomin",
"BoNT-A",
"Dysport",
"dysport",
"onaclostox",
"Onaclostox",
"AbobotulinumA",
"Botulinum A Toxin",
"Botulinum Toxin A",
"botulinum toxin a",
"botulinum toxin A",
"botulinum a toxin",
"Botulinum toxin A",
"botulinum A toxin",
"AbobotulinumtoxinA",
"Onabotulinumtoxina",
"Toxin, Botulinum A",
"ABOBOTULINUMTOXINA",
"OnabotulinumtoxinA",
"EvabotulinumtoxinA",
"abobotulinumtoxina",
"Toxin A, Botulinum",
"onabotulinumtoxinA",
"abobotulinumtoxinA",
"ONABOTULINUMTOXINA",
"Prabotulinumtoxin A",
"prabotulinumtoxin A",
"DaxibotulinumtoxinA",
"Toxina botulínica A",
"INCOBOTULINUMTOXINA",
"Onabotulinumtoxin A",
"IncobotulinumtoxinA",
"abobotulinumtoxin A",
"incobotulinumtoxinA",
"abobotulinum toxin A",
"Toxine botulinique A",
"botulinum toxin type",
"Botulinum A neurotoxin",
"Botulinum Neurotoxin A",
"Neurotoxin A, Botulinum",
"Botulinum antitoxin type A",
"Botulinum Neurotoxin Type A",
"botulinum neurotoxin type A",
"Clostridium botulinum A Toxin",
"Clostridium botulinum toxin A",
"AbobotulinumtoxinA (substance)",
"OnabotulinumtoxinA (substance)",
"IncobotulinumtoxinA (substance)",
"onabotulinumtoxinA (medication)",
"abobotulinumtoxina (medication)",
"IncobotulinumtoxinA (medication)",
"OnabotulinumtoxinA-containing product",
"AbobotulinumtoxinA-containing product",
"IncobotulinumtoxinA-containing product",
"Product containing onabotulinumtoxinA (medicinal product)",
"Product containing abobotulinumtoxinA (medicinal product)",
"neuromuscular blockers botulinum toxin incobotulinumtoxina",
"Product containing incobotulinumtoxinA (medicinal product)"
],
"UMLS:C5235585": [
"Botulinum Toxin Type A5"
],
"UMLS:C5235587": [
"Botulinum Toxin Type A7"
],
"UMLS:C5235584": [
"Botulinum Toxin Type A4"
],
"UMLS:C5235583": [
"Botulinum Toxin Type A3"
],
"UMLS:C5235586": [
"Botulinum Toxin Type A6"
],
"UMLS:C5235582": [
"Botulinum Toxin Type A1"
]
}
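To make the overlap in the output above easier to see, here is a minimal sketch that lists which names are shared by more than one identifier once case is ignored. The `shared_names` helper and the `sample` dictionary are illustrative only: `sample` is a hand-copied excerpt of the output above, not a live Name Resolver call.

```python
from collections import defaultdict


def shared_names(names_by_curie):
    """Map each case-folded name to the sorted CURIEs that list it.

    Only names that appear under two or more CURIEs are returned.
    """
    curies_by_name = defaultdict(set)
    for curie, names in names_by_curie.items():
        for name in names:
            curies_by_name[name.casefold()].add(curie)
    return {
        name: sorted(curies)
        for name, curies in curies_by_name.items()
        if len(curies) > 1
    }


# Hand-copied excerpt of the Name Resolver output above (not a live query).
sample = {
    "CHEMBL.COMPOUND:CHEMBL4297862": ["BOTULINUM TOXIN TYPE A"],
    "UNII:E211KPY694": ["BOTULINUM TOXIN TYPE A", "[OBSOLETE] onabotulinumtoxinA"],
    "MESH:D019274": ["Botulinum toxin type A", "Botulinum Toxins, Type A", "Botulin A"],
    "UMLS:C0006050": ["botulinum toxin type a", "Botulinum Toxins, Type A", "BTX-A"],
    "UMLS:C5235582": ["Botulinum Toxin Type A1"],
}

overlaps = shared_names(sample)
for name, curies in sorted(overlaps.items()):
    print(name, "->", ", ".join(curies))
```

A shared name is only a hint that two identifiers might belong in one clique; it does not show that they are the same entity.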
@cbizon : This is not a conflation issue but a normalization one.
From TAQA: there are two cliques for BTA. One has all the usual IDs, and one has just UMLS (it is hard to map UMLS to the rest). Moving this to Fall because it is not an easy fix. The drug conflator could be the issue here.
This should be conflated by the Drug Conflator -- as you can see in https://nodenormalization-dev.apps.renci.org/1.4/get_normalized_nodes?curie=UMLS%3AC0006050&curie=UNII%3AE211KPY694&conflate=true&drug_chemical_conflate=true, UMLS:C0006050 is listed as an alternate ID for UNII:E211KPY694, and it's not clear why that isn't happening. I am investigating.
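For anyone who wants to repeat the check above, here is a rough sketch that rebuilds the query URL and tests whether two CURIEs come back with the same preferred identifier. The base URL and query parameters are taken from the link above. The response shape, where each queried CURIE maps to an object with an `id.identifier` field, is an assumption here, and `build_query_url` and `same_clique` are hypothetical helper names, not part of NodeNorm.

```python
from urllib.parse import urlencode

# Base URL copied from the link in the comment above.
NODENORM_BASE = (
    "https://nodenormalization-dev.apps.renci.org/1.4/get_normalized_nodes"
)


def build_query_url(curies, base=NODENORM_BASE):
    """Build the get_normalized_nodes URL with both conflation flags on."""
    params = [("curie", curie) for curie in curies]
    params += [("conflate", "true"), ("drug_chemical_conflate", "true")]
    return f"{base}?{urlencode(params)}"


def same_clique(response, curie_a, curie_b):
    """Return True if both CURIEs normalize to the same preferred identifier.

    Assumes each entry looks like {"id": {"identifier": ...}, ...} and that
    an unknown CURIE maps to None or is absent.
    """
    entry_a = response.get(curie_a) or {}
    entry_b = response.get(curie_b) or {}
    id_a = (entry_a.get("id") or {}).get("identifier")
    id_b = (entry_b.get("id") or {}).get("identifier")
    return id_a is not None and id_a == id_b


url = build_query_url(["UMLS:C0006050", "UNII:E211KPY694"])
print(url)
```

Fetching `url` with any HTTP client and passing the decoded JSON to `same_clique` gives a quick yes/no answer for a pair of CURIEs, provided the assumed response shape holds.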
@gaurav this is still an issue. Who should this go to?
This is still on me. The problem is that UMLS:C0006050 is a Protein while UNII:E211KPY694 is a ChemicalEntity. These types are handled separately in Babel, so the two identifiers won't be combined as-is. I'm still thinking about how best to combine them, as I don't know of any source of UNII-protein connections (https://github.com/TranslatorSRI/Babel/issues/164).
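As a toy illustration of why the type split keeps these identifiers apart, the sketch below merges identifiers only when they share a type. This is not Babel's actual code: `build_cliques` is a hypothetical helper, the type for the CHEMBL identifier is assumed, and the UNII-to-UMLS link is invented for the example (no real source for it is known, per the comment above).

```python
from collections import defaultdict


def build_cliques(types, equivalences):
    """Toy union-find that only merges identifiers of the same type."""
    parent = {curie: curie for curie in types}

    def find(curie):
        # Follow parent links to the clique root, compressing the path.
        while parent[curie] != curie:
            parent[curie] = parent[parent[curie]]
            curie = parent[curie]
        return curie

    for a, b in equivalences:
        if types[a] == types[b]:  # cross-type links are silently dropped
            parent[find(a)] = find(b)

    cliques = defaultdict(set)
    for curie in types:
        cliques[find(curie)].add(curie)
    return sorted(sorted(members) for members in cliques.values())


types = {
    "UNII:E211KPY694": "ChemicalEntity",
    "CHEMBL.COMPOUND:CHEMBL4297862": "ChemicalEntity",  # assumed type
    "UMLS:C0006050": "Protein",
}
equivalences = [
    ("UNII:E211KPY694", "CHEMBL.COMPOUND:CHEMBL4297862"),
    ("UNII:E211KPY694", "UMLS:C0006050"),  # invented cross-type link
]

cliques = build_cliques(types, equivalences)
print(cliques)
```

Even with an explicit link between the UNII and UMLS identifiers, the type check leaves the UMLS term in a clique of its own, which matches the two-clique situation reported in this issue.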
I'm also annoyed that it is possible to have the same identifier in multiple cliques because of how NodeNorm's databases are designed, but that's out of scope for this issue and possibly for this year (https://github.com/TranslatorSRI/Babel/issues/276).
Without drug conflation, we now have 8 cliques:
So we're definitely doing better, but we still have some UMLS terms we need to combine, which is a pretty high priority for us (https://github.com/TranslatorSRI/Babel/issues/302). I'll try to have this fixed by Guppy.
I'm pushing all protein/chemical combination work into Hammerhead. Plus, adding a manual conflation to proteins turns out to be trickier than adding a manual conflation to chemical entities.
In the Hammerhead release on Test: https://ui.test.transltr.io/results?l=Bethlem%20Myopathy&i=MONDO:0008029&t=0&r=0&q=bc5a5d5e-6b8f-4f73-bf74-a2e1c2af46b7
One result for Botulinum Toxin Type A. Four results for Botulinum.
@gaurav are you expecting any other changes for name resolver for this issue or can we consider it closed?
Type: Bug Report
URL: https://ui.ci.transltr.io/results?l=Bethlem%20Myopathy&i=MONDO:0008029&t=0&q=1fb6945c-668b-4750-85f4-2daa53eb4596
ARS PK: 98ca4253-5d0e-4741-9ace-0e051a37c0c7
Steps to reproduce:
CI environment MVP1 Bethlem disease