TranslatorSRI / Babel

Babel creates cliques of equivalent identifiers across many biomedical vocabularies.
MIT License

Add DrugBank labels #335

Closed: gaurav closed this 1 month ago

gaurav commented 3 months ago

This PR adds DrugBank labels (from DrugBank v5.1.12). It also seems to close #332, but I'm not sure how (it might be an earlier change in PR #279 that really closed it).

Should be merged after PR #279.
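For anyone following along, here is a minimal sketch of what "adding DrugBank labels" could look like. It is not the code in this PR. The column names (`DrugBank ID`, `Common name`), the `DRUGBANK` CURIE prefix and the function name `read_drugbank_labels` are all assumptions made for illustration, and the sample rows use placeholder IDs rather than real accessions.

```python
import csv
import io

# Assumed header names and CURIE prefix; check them against the actual
# vocabulary file before relying on this.
ID_COLUMN = "DrugBank ID"
LABEL_COLUMN = "Common name"
CURIE_PREFIX = "DRUGBANK"


def read_drugbank_labels(csv_file):
    """Return {CURIE: label} for every row with a non-empty ID and name."""
    labels = {}
    for row in csv.DictReader(csv_file):
        drugbank_id = (row.get(ID_COLUMN) or "").strip()
        label = (row.get(LABEL_COLUMN) or "").strip()
        # Skip rows that are missing either the identifier or the name.
        if drugbank_id and label:
            labels[f"{CURIE_PREFIX}:{drugbank_id}"] = label
    return labels


# Placeholder rows for illustration only; the IDs are not real accessions.
SAMPLE = io.StringIO(
    "DrugBank ID,Common name\n"
    "DB_EXAMPLE_1,ibuprofen\n"
    "DB_EXAMPLE_2,captopril\n"
    "DB_EXAMPLE_3,\n"
)

for curie, label in read_drugbank_labels(SAMPLE).items():
    print(f"{curie}\t{label}")
```

Rows without a name are skipped rather than given an empty label, which seemed like the safer default for a label file.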

gaurav commented 2 months ago

> I dunno about this one. Are the entries in DrugBank really Drugs from our POV? They look like active ingredients (small molecules etc) to me.

I think you're right. This PR uses the DrugBank Open Vocabulary file, and most of the names in it are generic names such as ibuprofen, Ibuprofen piconol, captopril, VTP-194204, Etanercept, Erythropoietin, WRR-99, Zofin, Krill Oil, MK-886 and BMS-833923. I figured it made sense to categorize all of these as drugs, as a way of grouping everything from small molecules to protein hormones to organic substances that have some sort of medical benefit. But if our criterion for "Drug" is a specific formulation (e.g. "acetaminophen 5 mg capsule"), then yeah, these would not make sense. I'm not sure we can uniformly say these are all small molecules, but I think most of them are, so I've reverted the type for DrugBank entries from biolink:Drug back to biolink:ChemicalEntity (d38ce21). I've also made a note for us to check for other small molecules/chemical entities that might have accidentally ended up in Drug.txt (#348).

Incidentally, in addition to the DrugBank ID, many of the 16,581 chemicals in the DrugBank download have a UNII, CAS number or InChI Key. I don't think we can use those to categorize DrugBank entries any better, and I'm not sure whether we want to include them as concords, but I wanted to mention it in case it's useful.
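If it helps when deciding about those concords, here is a rough sketch of how the cross-reference coverage could be tallied. The header names (`UNII`, `CAS`, `Standard InChI Key`) and the function name `count_xrefs` are assumptions for illustration, not something taken from this PR, and the sample rows are placeholders.

```python
import csv
import io
from collections import Counter

# Assumed header names for the cross-reference columns.
XREF_COLUMNS = ("UNII", "CAS", "Standard InChI Key")


def count_xrefs(csv_file):
    """Count rows overall, per cross-reference column, and with none at all."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        counts["rows"] += 1
        # A column counts as present only if it holds a non-blank value.
        present = [col for col in XREF_COLUMNS if (row.get(col) or "").strip()]
        counts.update(present)
        if not present:
            counts["no cross-reference"] += 1
    return counts


# Placeholder rows for illustration only; none of these values are real.
SAMPLE = io.StringIO(
    "DrugBank ID,UNII,CAS,Standard InChI Key\n"
    "DB_EXAMPLE_1,UNII_A,CAS_A,KEY_A\n"
    "DB_EXAMPLE_2,,,KEY_B\n"
    "DB_EXAMPLE_3,,,\n"
)

for name, n in count_xrefs(SAMPLE).items():
    print(f"{name}: {n}")
```

Running something like this against the real download would show how many entries each concord could actually cover before we commit to generating them.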