NCATSTranslator / minihackathons

Correct mapping of MESH:C469407 in COHD #294

Open vgardner-renci opened 2 years ago

vgardner-renci commented 2 years ago

In COHD, we have a strong link between MS and natalizumab, but unfortunately we're mapping natalizumab to MESH:C469407, which is apparently an obsolete identifier. MESH:C469407 is in SRI Node Norm as biolink:ChemicalEntity, not biolink:SmallMolecule. So COHD will currently return natalizumab if queried against ChemicalEntity, but not SmallMolecule. We wouldn't be able to fix the mappings before the Relay meeting, but may be able to afterward.

CaseyTa commented 2 years ago

This may not be an issue only on the COHD side of things.

Using the SRI Node Resolution Lookup service, we find the following potential matches:

{
  "MESH:D000069442": [
    "Natalizumab"
  ],
  "MESH:C469407": [
    "[OBSOLETE] natalizumab"
  ],
  "UMLS:C5190554": [
    "Long-term current use of natalizumab",
    "Long-term current use of natalizumab (situation)"
  ]
}

It looks like MESH:D000069442 is the appropriate CURIE to use: https://meshb.nlm.nih.gov/record/ui?ui=D000069442, but SRI Node Norm has MESH:C469407 instead of MESH:D000069442, and we rely on Node Norm.

Node norm response:

{
  "MESH:C469407": {
    "id": {
      "identifier": "MESH:C469407",
      "label": "[OBSOLETE] natalizumab"
    },
    "equivalent_identifiers": [
      {
        "identifier": "MESH:C469407",
        "label": "[OBSOLETE] natalizumab"
      }
    ],
    "type": [
      "biolink:ChemicalEntity",
      "biolink:NamedThing",
      "biolink:Entity",
      "biolink:PhysicalEssence",
      "biolink:PhysicalEssenceOrOccurrent",
      "biolink:ChemicalOrDrugOrTreatment",
      "biolink:ChemicalEntityOrGeneOrGeneProduct",
      "biolink:ChemicalEntityOrProteinOrPolypeptide"
    ]
  },
  "MESH:D000069442": null,
  "UMLS:C5190554": {
    "id": {
      "identifier": "UMLS:C5190554",
      "label": "Long-term current use of natalizumab"
    },
    "equivalent_identifiers": [
      {
        "identifier": "UMLS:C5190554",
        "label": "Long-term current use of natalizumab"
      },
      {
        "identifier": "SNOMEDCT:16755631000119109"
      }
    ],
    "type": [
      "biolink:PhenotypicFeature",
      "biolink:DiseaseOrPhenotypicFeature",
      "biolink:BiologicalEntity",
      "biolink:NamedThing",
      "biolink:Entity",
      "biolink:ThingWithTaxon"
    ]
  }
}

And as @cbizon pointed out, natalizumab is not a small molecule, so even if the MESH identifier is corrected in COHD and Node Norm, COHD still wouldn't return natalizumab on a query against biolink:SmallMolecule.

Thoughts, @jh111 and @cbizon?

cbizon commented 2 years ago

Agreed, this looks like an issue for Node Norm (NN).

jh111 commented 2 years ago

Is there a general way to search for obsolete ids?

CaseyTa commented 2 years ago

@jh111 On whether there is a general way to search for obsolete ids: in this case, the label has [OBSOLETE] in the name, so that makes it easy.
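
As a minimal sketch of that label-based check, the following scans a Node Norm style response (shaped like the one posted earlier in this thread) for identifiers whose preferred label starts with [OBSOLETE], and separately lists CURIEs that came back as null. The function name `find_obsolete` and the trimmed `sample` payload are illustrative assumptions, not part of any SRI API. It only works where the source vocabulary marks obsolete terms in the label, so it is not a general solution.

```python
OBSOLETE_MARKER = "[OBSOLETE]"


def find_obsolete(response):
    """Split a Node Norm style response into obsolete and unresolved CURIEs.

    `response` maps each queried CURIE to either None (not found) or a dict
    with an "id" entry holding "identifier" and, usually, "label".
    """
    obsolete, unresolved = [], []
    for curie, entry in response.items():
        if entry is None:
            # Node Norm returned null: the CURIE is not known to the service.
            unresolved.append(curie)
            continue
        label = entry.get("id", {}).get("label") or ""
        if label.startswith(OBSOLETE_MARKER):
            obsolete.append(curie)
    return obsolete, unresolved


# Trimmed copy of the response shown earlier in this thread (assumed shape).
sample = {
    "MESH:C469407": {
        "id": {"identifier": "MESH:C469407", "label": "[OBSOLETE] natalizumab"}
    },
    "MESH:D000069442": None,
    "UMLS:C5190554": {
        "id": {
            "identifier": "UMLS:C5190554",
            "label": "Long-term current use of natalizumab",
        }
    },
}

obsolete, unresolved = find_obsolete(sample)
print(obsolete)    # ['MESH:C469407']
print(unresolved)  # ['MESH:D000069442']
```

Reporting the null entries alongside the obsolete ones is a deliberate choice here, since in this case the current identifier (MESH:D000069442) is exactly the one that fails to resolve.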

But regarding Workflow C, fixing the obsolete mapping won't help if the query is against biolink:SmallMolecule and if natalizumab is not considered a biolink:SmallMolecule. The query may need to be opened up to biolink:ChemicalEntity if you want to capture natalizumab.
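
To illustrate why the query category matters, here is a small sketch that checks the type list Node Norm returned for MESH:C469407 (copied from the response above) against a queried Biolink category. The helper `matches_category` is hypothetical and is not COHD's actual matching logic; it assumes the `type` list already includes the ancestors of the node's category, as it does in that response.

```python
# Type list for MESH:C469407, copied from the Node Norm response in this thread.
NATALIZUMAB_TYPES = [
    "biolink:ChemicalEntity",
    "biolink:NamedThing",
    "biolink:Entity",
    "biolink:PhysicalEssence",
    "biolink:PhysicalEssenceOrOccurrent",
    "biolink:ChemicalOrDrugOrTreatment",
    "biolink:ChemicalEntityOrGeneOrGeneProduct",
    "biolink:ChemicalEntityOrProteinOrPolypeptide",
]


def matches_category(node_types, queried_category):
    """Return True if the node would satisfy a query for `queried_category`.

    Assumes `node_types` lists the node's category plus its ancestors, so a
    plain membership test is enough.
    """
    return queried_category in node_types


print(matches_category(NATALIZUMAB_TYPES, "biolink:SmallMolecule"))   # False
print(matches_category(NATALIZUMAB_TYPES, "biolink:ChemicalEntity"))  # True
```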