TranslatorSRI / Babel

Babel creates cliques of equivalent identifiers across many biomedical vocabularies.

Two cliques for "Opioids" #212

Open cbizon opened 7 months ago

cbizon commented 7 months ago

UMLS:C0242402 and PUBCHEM.COMPOUND:126961754

These are separate cliques. How do we merge them?

Also odd: the PubChem clique includes an InChIKey, but "opioid" is more of a class than a single compound...?

cbizon commented 6 months ago

And there is a third!

"equivalent_identifiers": [
  {
    "identifier": "CHEBI:35482",
    "label": "opioid analgesic"
  },
  {
    "identifier": "MESH:D000701",
    "label": "Analgesics, Opioid"
  },
  {
    "identifier": "UMLS:C0002772",
    "label": "Analgesics, Opioid"
  }
],

The CHEBI term is a Role rather than a chemical entity, which is even more annoying.

cbizon commented 6 months ago

Both UMLS concepts are linked to the same MESH term (D000701) in MRCONSO:

C0002772|ENG|P|L0280158|PF|S0355323|N|A0389632||M0001068|D000701|MSH|MH|D000701|Analgesics, Opioid|0|N|256|
C0242402|ENG|P|L0189434|PF|S0256189|N|A10901018||M0014482|D000701|MSH|PEP|D000701|Opioids|0|N|256|

But we only accept the first (MH) one in line with this comment in umls.py:

On UMLS / MESH: we have been getting all UMLS / MESH relationships. This has led to some clear mistakes and logical impossibilities such as cyclical subclasses. On further review, we can sharpen these relationships by choosing the best match UMLS for each MESH. We will make use of the TTY column (column 12) in MRCONSO. This column can have a lot of values, but every MESH has one of (and only one of): MH, NM, HT, QAB. These will be the ones that we pull, as they correspond to the "main" name or heading of the mesh entry.
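To make the effect of that rule concrete, here is a minimal sketch of the TTY filter applied to the two MRCONSO rows quoted above. It is not the actual code in umls.py: the function name `mesh_links` and the constant `MESH_MAIN_TTYS` are hypothetical, and the field positions (SAB at zero-based index 11, TTY at 12, CODE at 13) are read off the two pipe-delimited rows in this thread.

```python
# Hypothetical illustration only; not the real umls.py implementation.
# TTY values treated as the "main" name or heading of a MESH entry,
# as listed in the umls.py comment.
MESH_MAIN_TTYS = {"MH", "NM", "HT", "QAB"}


def mesh_links(lines, accepted_ttys=MESH_MAIN_TTYS):
    """Return (UMLS CURIE, MESH CURIE, TTY) for MSH rows with an accepted TTY."""
    links = []
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 15:
            continue  # skip rows that are too short to be MRCONSO records
        cui, sab, tty, code = fields[0], fields[11], fields[12], fields[13]
        if sab == "MSH" and tty in accepted_ttys:
            links.append((f"UMLS:{cui}", f"MESH:{code}", tty))
    return links


# The two rows quoted in this issue.
ROWS = [
    "C0002772|ENG|P|L0280158|PF|S0355323|N|A0389632||M0001068|D000701|MSH|MH|D000701|Analgesics, Opioid|0|N|256|",
    "C0242402|ENG|P|L0189434|PF|S0256189|N|A10901018||M0014482|D000701|MSH|PEP|D000701|Opioids|0|N|256|",
]

# With the current rule, only the MH row survives.
accepted = mesh_links(ROWS)

# Adding PEP to the accepted set (purely as an experiment) links both UMLS concepts.
with_pep = mesh_links(ROWS, accepted_ttys=MESH_MAIN_TTYS | {"PEP"})

print(accepted)
print(with_pep)
```

Under this sketch, UMLS:C0242402 ("Opioids") is dropped because its TTY is PEP, which is why it never joins the MESH:D000701 clique. Whether accepting PEP rows is safe in general is a separate question, since the original restriction was introduced to avoid bad UMLS / MESH matches.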