NCATSTranslator / Feedback

A repo for tracking gaps in Translator data and finding ways to fill them.

merge Botulinum toxin A nodes #395

Open TranslatorIssueCreator opened 1 year ago

TranslatorIssueCreator commented 1 year ago

Type: Bug Report

URL: https://ui.ci.transltr.io/results?l=Bethlem%20Myopathy&i=MONDO:0008029&t=0&q=1fb6945c-668b-4750-85f4-2daa53eb4596

ARS PK: 98ca4253-5d0e-4741-9ace-0e051a37c0c7

Steps to reproduce:

CI environment MVP1 Bethlem disease

Screenshots:

cbizon commented 1 year ago

https://github.com/TranslatorSRI/Babel/issues/164

sandrine-m commented 1 year ago

PK: 1fb6945c-668b-4750-85f4-2daa53eb4596. Results on CI show 2 results (results 1 and 10) with the same compound.

@gglusman : The two identifiers are http://identifiers.org/unii/E211KPY694 and http://identifiers.org/umls/C0006050

sandrine-m commented 1 year ago

Output of Name resolver

{
  "CHEMBL.COMPOUND:CHEMBL4297862": [
    "BOTULINUM TOXIN TYPE A"
  ],
  "UNII:E211KPY694": [
    "BOTULINUM TOXIN TYPE A",
    "[OBSOLETE] onabotulinumtoxinA"
  ],
  "MESH:D019274": [
    "Botulinum toxin type A",
    "Botulinum Toxins, Type A",
    "Botulin A"
  ],
  "UMLS:C0006050": [
    "botulinum toxin type a",
    "Botulinum toxin type A",
    "BOTULINUM TOXIN TYPE A",
    "botulinum toxin type A",
    "Botulinum Toxin Type A",
    "toxin botulinum type a",
    "Botulinum Toxins, Type A",
    "Clostridium Botulinum Toxin Type A",
    "Botulinum toxin type A (substance)",
    "Botulinum toxin type A-containing product",
    "Product containing botulinum toxin type A (medicinal product)",
    "BTX-A",
    "Xeomin",
    "BoNT-A",
    "Dysport",
    "dysport",
    "onaclostox",
    "Onaclostox",
    "AbobotulinumA",
    "Botulinum A Toxin",
    "Botulinum Toxin A",
    "botulinum toxin a",
    "botulinum toxin A",
    "botulinum a toxin",
    "Botulinum toxin A",
    "botulinum A toxin",
    "AbobotulinumtoxinA",
    "Onabotulinumtoxina",
    "Toxin, Botulinum A",
    "ABOBOTULINUMTOXINA",
    "OnabotulinumtoxinA",
    "EvabotulinumtoxinA",
    "abobotulinumtoxina",
    "Toxin A, Botulinum",
    "onabotulinumtoxinA",
    "abobotulinumtoxinA",
    "ONABOTULINUMTOXINA",
    "Prabotulinumtoxin A",
    "prabotulinumtoxin A",
    "DaxibotulinumtoxinA",
    "Toxina botulínica A",
    "INCOBOTULINUMTOXINA",
    "Onabotulinumtoxin A",
    "IncobotulinumtoxinA",
    "abobotulinumtoxin A",
    "incobotulinumtoxinA",
    "abobotulinum toxin A",
    "Toxine botulinique A",
    "botulinum toxin type",
    "Botulinum A neurotoxin",
    "Botulinum Neurotoxin A",
    "Neurotoxin A, Botulinum",
    "Botulinum antitoxin type A",
    "Botulinum Neurotoxin Type A",
    "botulinum neurotoxin type A",
    "Clostridium botulinum A Toxin",
    "Clostridium botulinum toxin A",
    "AbobotulinumtoxinA (substance)",
    "OnabotulinumtoxinA (substance)",
    "IncobotulinumtoxinA (substance)",
    "onabotulinumtoxinA (medication)",
    "abobotulinumtoxina (medication)",
    "IncobotulinumtoxinA (medication)",
    "OnabotulinumtoxinA-containing product",
    "AbobotulinumtoxinA-containing product",
    "IncobotulinumtoxinA-containing product",
    "Product containing onabotulinumtoxinA (medicinal product)",
    "Product containing abobotulinumtoxinA (medicinal product)",
    "neuromuscular blockers botulinum toxin incobotulinumtoxina",
    "Product containing incobotulinumtoxinA (medicinal product)"
  ],
  "UMLS:C5235585": [
    "Botulinum Toxin Type A5"
  ],
  "UMLS:C5235587": [
    "Botulinum Toxin Type A7"
  ],
  "UMLS:C5235584": [
    "Botulinum Toxin Type A4"
  ],
  "UMLS:C5235583": [
    "Botulinum Toxin Type A3"
  ],
  "UMLS:C5235586": [
    "Botulinum Toxin Type A6"
  ],
  "UMLS:C5235582": [
    "Botulinum Toxin Type A1"
  ]
}
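
One way to read the output above is to check which labels are shared across identifiers once case is ignored. The snippet below is only a minimal sketch of that check: the `synonyms` dictionary is an abbreviated, hand-copied subset of the Name Resolver output, and `shared_labels` is a hypothetical helper written for illustration, not part of Name Resolver or Babel.

```python
from collections import defaultdict

# Abbreviated copy of the Name Resolver output above (only a few labels per ID).
synonyms = {
    "CHEMBL.COMPOUND:CHEMBL4297862": ["BOTULINUM TOXIN TYPE A"],
    "UNII:E211KPY694": ["BOTULINUM TOXIN TYPE A", "[OBSOLETE] onabotulinumtoxinA"],
    "MESH:D019274": ["Botulinum toxin type A", "Botulinum Toxins, Type A", "Botulin A"],
    "UMLS:C0006050": ["botulinum toxin type a", "Botulinum Toxins, Type A", "BTX-A"],
    "UMLS:C5235582": ["Botulinum Toxin Type A1"],
}


def shared_labels(synonyms):
    """Map each case-folded label to the CURIEs carrying it; keep labels on 2+ CURIEs."""
    by_label = defaultdict(set)
    for curie, labels in synonyms.items():
        for label in labels:
            by_label[label.casefold()].add(curie)
    return {label: curies for label, curies in by_label.items() if len(curies) > 1}


overlaps = shared_labels(synonyms)
for label, curies in sorted(overlaps.items()):
    print(label, "->", sorted(curies))
```

In this subset, "botulinum toxin type a" is carried by the CHEMBL, UNII, MESH and UMLS:C0006050 identifiers, while the "Type A1" subtype shares no label with the others.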

sandrine-m commented 1 year ago

@cbizon : This is not a conflation issue but a normalization one.

sierra-moxon commented 1 year ago

From TAQA: there are two cliques for BTA. One has all the usual IDs, the other has just UMLS (it is hard to map UMLS to the rest). Moving this to Fall because it is not an easy fix. The drug conflator could be the issue here.

gaurav commented 1 year ago

This should be conflated by the Drug Conflator. As you can see in https://nodenormalization-dev.apps.renci.org/1.4/get_normalized_nodes?curie=UMLS%3AC0006050&curie=UNII%3AE211KPY694&conflate=true&drug_chemical_conflate=true, UMLS:C0006050 is listed as an alternate ID for UNII:E211KPY694, so it's not clear why the conflation isn't happening. I am investigating.
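
For anyone who wants to repeat the query above with other CURIEs, here is a minimal sketch that rebuilds the same URL from its parts. The base URL and the parameter names (`curie`, `conflate`, `drug_chemical_conflate`) are taken from the link in this comment; `build_query_url` is a hypothetical helper name, and nothing is assumed here about the shape of the response.

```python
from urllib.parse import urlencode

# Base URL copied from the link in the comment above.
BASE = "https://nodenormalization-dev.apps.renci.org/1.4/get_normalized_nodes"


def build_query_url(curies, conflate=True, drug_chemical_conflate=True):
    """Build a get_normalized_nodes URL with one `curie` parameter per identifier."""
    params = [("curie", curie) for curie in curies]
    # The link uses lowercase "true"/"false" for the boolean flags.
    params.append(("conflate", str(conflate).lower()))
    params.append(("drug_chemical_conflate", str(drug_chemical_conflate).lower()))
    return f"{BASE}?{urlencode(params)}"


url = build_query_url(["UMLS:C0006050", "UNII:E211KPY694"])
print(url)
```

The resulting URL can be opened in a browser or fetched with `urllib.request.urlopen`. Passing `drug_chemical_conflate=False` gives the same query without drug conflation.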

sstemann commented 8 months ago

@gaurav this is still an issue - who should this go to?

gaurav commented 5 months ago

This is still on me. The problem is that UMLS:C0006050 is a Protein while UNII:E211KPY694 is a ChemicalEntity. These two types are handled separately in Babel, so they won't be combined as-is. I'm still thinking about how best to combine them, as I don't know any source of UNII-protein connections (https://github.com/TranslatorSRI/Babel/issues/164).
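
To illustrate the type problem described above, here is a toy sketch. It is not Babel's actual code: `build_cliques` is a hypothetical function that simply assumes equivalences are only merged when both identifiers have the same type, which is enough to show why these two IDs end up in separate cliques.

```python
from collections import defaultdict

# Types as reported in this thread.
node_types = {"UMLS:C0006050": "Protein", "UNII:E211KPY694": "ChemicalEntity"}


def build_cliques(node_types, equivalences):
    """Toy per-type merge: an equivalence is applied only when both IDs share a type."""
    parent = {curie: curie for curie in node_types}

    def find(x):
        # Follow parent links to the clique representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in equivalences:
        if node_types[a] == node_types[b]:
            parent[find(a)] = find(b)

    cliques = defaultdict(set)
    for curie in node_types:
        cliques[find(curie)].add(curie)
    return sorted(sorted(members) for members in cliques.values())


# Even with an explicit equivalence, the differing types keep the IDs apart.
cliques = build_cliques(node_types, [("UMLS:C0006050", "UNII:E211KPY694")])
print(cliques)
```

Under this assumption, a cross-type link needs either a separate conflation step or a manual mapping.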

I'm also annoyed that it is possible to have the same identifier in multiple cliques because of how NodeNorm's databases are designed, but that's out of scope for this issue and possibly for this year (https://github.com/TranslatorSRI/Babel/issues/276).

gaurav commented 3 months ago

Without drug conflation, we now have 8 cliques:

  1. UNII:E211KPY694 "botulinum toxin type A" (ChemicalEntity, which now includes CHEMBL.COMPOUND:CHEMBL4297862 and MESH:D019274)
  2. UMLS:C0006050 "botulinum toxin type A" (Protein)
  3. UMLS:C5235582 "Botulinum Toxin Type A1" (Protein)
  4. UMLS:C5235583 "Botulinum Toxin Type A3" (Protein)
  5. UMLS:C5235584 "Botulinum Toxin Type A4" (Protein)
  6. UMLS:C5235585 "Botulinum Toxin Type A5" (Protein)
  7. UMLS:C5235586 "Botulinum Toxin Type A6" (Protein)
  8. UMLS:C5235587 "Botulinum Toxin Type A7" (Protein)

So we're definitely doing better, but we still have some UMLS terms we need to combine, which is a pretty high priority for us (https://github.com/TranslatorSRI/Babel/issues/302). I'll try to have this fixed by Guppy.

gaurav commented 1 month ago

I'm pushing all protein/chemical combination work into Hammerhead. Plus, adding a manual conflation to proteins turns out to be trickier than adding a manual conflation to chemical entities.