TranslatorSRI / Babel

Babel creates cliques of equivalent identifiers across many biomedical vocabularies.
MIT License

DrugBank ID being dropped between untyped compendium and final ChemicalEntity compendium #332

Closed: gaurav closed this issue 1 week ago

gaurav commented 1 month ago

We have the following entry in babel_outputs/intermediate/chemicals/partials/untyped_compendium:

{'MESH:D014568', 'DrugCentral:5109', 'UMLS:C0042071', 'UNII:83G67E21XI', 'CHEMBL.COMPOUND:CHEMBL1201420', 'RXCUI:11055', 'DRUGBANK:DB00013'}

However, when we look at the final clique in babel_outputs/compendia/ChemicalEntity.txt, it looks like this:

{"type": "biolink:ChemicalEntity", "identifiers": [{"i": "UNII:83G67E21XI", "l": "UROKINASE", "d": [], "t": []}, {"i": "MESH:D014568", "l": "Urokinase-Type Plasminogen Activator", "d": [], "t": []}, {"i": "UMLS:C0042071", "l": "urokinase", "d": [], "t": []}, {"i": "CHEMBL.COMPOUND:CHEMBL1201420", "l": "UROKINASE", "d": [], "t": []}, {"i": "DrugCentral:5109", "l": "urokinase", "d": [], "t": []}, {"i": "RXCUI:11055", "d": [], "t": []}], "taxa": []}

So why was the DrugBank ID dropped? The simplest explanation would be that DRUGBANK isn't a preferred prefix for ChemicalEntity, but it is. So what's going on?
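
For anyone who wants to check which identifiers go missing between the two stages, here is a minimal sketch that diffs the untyped clique against the final clique. Both values are copied from above; the helper name `dropped_identifiers` is made up for illustration and is not part of Babel.

```python
import json

# Untyped clique from partials/untyped_compendium (copied from this issue).
untyped_clique = {
    'MESH:D014568', 'DrugCentral:5109', 'UMLS:C0042071', 'UNII:83G67E21XI',
    'CHEMBL.COMPOUND:CHEMBL1201420', 'RXCUI:11055', 'DRUGBANK:DB00013',
}

# Final clique line from compendia/ChemicalEntity.txt (copied from this issue).
final_line = (
    '{"type": "biolink:ChemicalEntity", "identifiers": ['
    '{"i": "UNII:83G67E21XI", "l": "UROKINASE", "d": [], "t": []}, '
    '{"i": "MESH:D014568", "l": "Urokinase-Type Plasminogen Activator", "d": [], "t": []}, '
    '{"i": "UMLS:C0042071", "l": "urokinase", "d": [], "t": []}, '
    '{"i": "CHEMBL.COMPOUND:CHEMBL1201420", "l": "UROKINASE", "d": [], "t": []}, '
    '{"i": "DrugCentral:5109", "l": "urokinase", "d": [], "t": []}, '
    '{"i": "RXCUI:11055", "d": [], "t": []}'
    '], "taxa": []}'
)


def dropped_identifiers(untyped, compendium_line):
    """Return identifiers in the untyped clique that are absent from the final clique.

    Hypothetical helper for illustration only.
    """
    record = json.loads(compendium_line)
    final_ids = {ident["i"] for ident in record["identifiers"]}
    return set(untyped) - final_ids


dropped = dropped_identifiers(untyped_clique, final_line)
print(dropped)  # {'DRUGBANK:DB00013'}
```

So the DrugBank ID is the only identifier lost; everything else in the untyped clique makes it through.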

gaurav commented 1 month ago

DrugBank identifiers are present in Babel 2024mar14 but not in 2024jul13. Those two releases are very similar, so the cause is either one of the few changes we made between them or some error we're running into when reading the Biolink Model information.
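
One way to check whether a prefix has vanished entirely between two releases is to count identifier prefixes in each compendium and compare them. This is a rough sketch only: it assumes both releases use the JSON Lines format shown above, the sample lines are made up from identifiers in this issue rather than taken from the actual release files, and `count_prefixes` and `missing_prefixes` are hypothetical helpers, not Babel functions.

```python
import json
from collections import Counter


def count_prefixes(lines):
    """Count identifier prefixes across compendium lines (one JSON record per line)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        for ident in record["identifiers"]:
            # The prefix is everything before the first colon in the CURIE.
            prefix = ident["i"].split(":", 1)[0]
            counts[prefix] += 1
    return counts


def missing_prefixes(old_counts, new_counts):
    """Return prefixes that occur in the old counts but not at all in the new ones."""
    return sorted(p for p in old_counts if new_counts.get(p, 0) == 0)


# Made-up sample lines standing in for the two releases (NOT real release data).
old_release_lines = [
    '{"type": "biolink:ChemicalEntity", "identifiers": '
    '[{"i": "UNII:83G67E21XI"}, {"i": "DRUGBANK:DB00013"}]}',
]
new_release_lines = [
    '{"type": "biolink:ChemicalEntity", "identifiers": '
    '[{"i": "UNII:83G67E21XI"}]}',
]

missing = missing_prefixes(
    count_prefixes(old_release_lines),
    count_prefixes(new_release_lines),
)
print(missing)  # ['DRUGBANK']
```

With real files, the same functions could be fed an open file handle for each release's ChemicalEntity.txt, since a file object iterates line by line.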

gaurav commented 1 month ago

This has somehow fixed itself in PR #335. As far as I can tell, all I did there was update the Biolink Model version, but I'm not sure how the previous version would have caused this problem. I'll add this clique to my tests and then close the issue.
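
A regression check for this could look roughly like the sketch below: find the clique containing one identifier and confirm that another identifier is in the same clique. This is not the actual test added to Babel. The helper names are hypothetical, the "broken" line is the final clique from this issue with labels trimmed, and the "fixed" line is an assumed shape of the corrected output, not a copy of it.

```python
import json


def clique_containing(lines, curie):
    """Return the identifier list of the first clique that contains `curie`, or None."""
    for line in lines:
        if not line.strip():
            continue
        ids = [ident["i"] for ident in json.loads(line)["identifiers"]]
        if curie in ids:
            return ids
    return None


def in_same_clique(lines, curie_a, curie_b):
    """True if `curie_a` and `curie_b` appear together in one clique."""
    clique = clique_containing(lines, curie_a)
    return clique is not None and curie_b in clique


# Clique as reported in this issue (labels and other fields trimmed).
broken_lines = [
    '{"type": "biolink:ChemicalEntity", "identifiers": ['
    '{"i": "UNII:83G67E21XI"}, {"i": "MESH:D014568"}, {"i": "UMLS:C0042071"}, '
    '{"i": "CHEMBL.COMPOUND:CHEMBL1201420"}, {"i": "DrugCentral:5109"}, '
    '{"i": "RXCUI:11055"}], "taxa": []}',
]

# ASSUMED shape of the fixed clique; the real output is not shown in this issue.
fixed_lines = [
    '{"type": "biolink:ChemicalEntity", "identifiers": ['
    '{"i": "UNII:83G67E21XI"}, {"i": "MESH:D014568"}, {"i": "UMLS:C0042071"}, '
    '{"i": "CHEMBL.COMPOUND:CHEMBL1201420"}, {"i": "DrugCentral:5109"}, '
    '{"i": "RXCUI:11055"}, {"i": "DRUGBANK:DB00013"}], "taxa": []}',
]

broken_ok = in_same_clique(broken_lines, "UNII:83G67E21XI", "DRUGBANK:DB00013")
fixed_ok = in_same_clique(fixed_lines, "UNII:83G67E21XI", "DRUGBANK:DB00013")
print(broken_ok, fixed_ok)  # False True
```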