fastdatascience / drug_named_entity_recognition

https://fastdatascience.com/drug-named-entity-recognition-python-library/
MIT License

Calculate molecular weight #5

Open woodthom2 opened 3 months ago

woodthom2 commented 3 months ago

We already have the molecular structure in .mol format, provided by the Drugbank data:

from drug_named_entity_recognition.drugs_finder import find_drugs

drugs = find_drugs("i bought some Bivalirudin".split(" "), is_include_structure=True)

# Exactly one drug should be found, with its .mol structure attached
assert len(drugs) == 1
assert drugs[0][0]['name'] == "Bivalirudin"
assert "0.0000 C" in drugs[0][0]['structure_mol']

Can we convert to SMILES on the fly? Can we calculate molecular weight on the fly?

Ideally can you do this without adding anything more to requirements.txt? There are some chemistry libraries but they can be quite heavy.

abdullahwaqar commented 2 months ago

Hey @woodthom2, wanted to check if it is still open for contributions. If so, I would like to contribute.

woodthom2 commented 2 months ago

Hi @abdullahwaqar yes this is still open! Can you see a way to add molecular weight, or a data source which will give us the molecular weight? Here's an example for one single drug: https://www.opnme.com/molecules/khk-inhibitor-bi-9787

There we have Weight: 489.6 Da, as well as properties such as tmax and Cmax. I don't know whether such a database exists. Drugbank may provide this data, but we need to check the licences.

Also, the link I pasted shows a nice moving 3D image of the drug. The positions of the atoms are now returned by the library in string format (see my paracetamol example here: https://fastdatascience.com/ai-in-pharma/drug-named-entity-recognition-update-2/#molecular-structures ). It would be nice to have a Jupyter notebook (or Colab notebook) or an in-browser example that renders this molecular structure, either as a static image or as a dynamic view of some kind. I definitely do not want to add any graphics libraries as dependencies to the project, but having this as an example would be great.

The molecular structure data that we have could also be a shortcut to getting the molecular weight of a drug.
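As a rough illustration of that shortcut, here is a minimal pure-Python sketch that counts the atoms in a V2000 .mol block and sums their atomic weights, with no extra entries in requirements.txt. The function names and the small weight table are hypothetical, not part of the library. It assumes the block keeps the fixed-column V2000 layout, and it only counts atoms that are explicitly listed: .mol files often leave hydrogens implicit, so the result may underestimate the true molecular weight unless hydrogens are inferred from the bond block.

```python
from collections import Counter

# Approximate standard atomic weights (g/mol) for elements common in drugs.
# Hypothetical, deliberately incomplete table: extend it as needed.
ATOMIC_WEIGHTS = {
    "H": 1.008, "B": 10.81, "C": 12.011, "N": 14.007, "O": 15.999,
    "F": 18.998, "Na": 22.990, "Mg": 24.305, "P": 30.974, "S": 32.06,
    "Cl": 35.45, "K": 39.098, "Ca": 40.078, "Fe": 55.845, "Br": 79.904,
    "I": 126.904,
}


def count_atoms_in_molfile(mol_block):
    """Count element symbols listed in the atom block of a V2000 .mol string."""
    lines = mol_block.splitlines()
    # Locate the counts line via its version tag rather than assuming that
    # the three header lines survived intact.
    counts_index = next(
        (i for i, line in enumerate(lines) if "V2000" in line), None
    )
    if counts_index is None:
        raise ValueError("No V2000 counts line found in mol block")
    # Fixed-width format: the first three characters hold the atom count.
    n_atoms = int(lines[counts_index][0:3])
    atom_lines = lines[counts_index + 1:counts_index + 1 + n_atoms]
    # Columns 32-34 (1-based) of each atom line hold the element symbol.
    return Counter(line[31:34].strip() for line in atom_lines)


def weight_from_counts(counts):
    """Sum atomic weights for the given element counts (explicit atoms only)."""
    unknown = sorted(set(counts) - set(ATOMIC_WEIGHTS))
    if unknown:
        raise KeyError(f"No atomic weight for: {', '.join(unknown)}")
    return sum(ATOMIC_WEIGHTS[symbol] * n for symbol, n in counts.items())
```

In principle the `structure_mol` string from the example at the top of this issue could be passed straight to `count_atoms_in_molfile`, but the output should be checked against a trusted value before relying on it, because of the implicit-hydrogen caveat.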

Thanks!

woodthom2 commented 1 month ago

https://pubchem.ncbi.nlm.nih.gov/docs/downloads#section=Individual-Record-Download <- this is a good source for molecular weight, and it appears to be allowed for our use. I cannot take the molecular weights from Drugbank because that is not allowed under its license.

If we use PubChem we can also take the SMILES value, which will be useful, e.g.

CN1C2=C(C=C(C=C2)C(=O)N(CCC(=O)O)C3=CC=CC=N3)N=C1CNC4=CC=C(C=C4)C(=N)N
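For looking up a single drug, one possible route is PubChem's PUG REST interface instead of the record downloads linked above. The sketch below is an assumption about how that could look, not something the library does today: the helper names are hypothetical, and it uses only the standard library so nothing is added to requirements.txt. It requests `MolecularWeight` and `MolecularFormula`; SMILES properties can be requested the same way, but their exact property names should be checked against the current PUG REST documentation.

```python
import json
import urllib.parse
import urllib.request

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_property_url(drug_name, properties=("MolecularWeight", "MolecularFormula")):
    """Build a PUG REST URL that looks up compound properties by name."""
    quoted_name = urllib.parse.quote(drug_name)
    return (
        f"{PUG_REST}/compound/name/{quoted_name}"
        f"/property/{','.join(properties)}/JSON"
    )


def parse_properties(payload):
    """Extract the first property record from a decoded PUG REST response."""
    record = dict(payload["PropertyTable"]["Properties"][0])
    # The weight may arrive as a string, so normalise it to a float.
    if "MolecularWeight" in record:
        record["MolecularWeight"] = float(record["MolecularWeight"])
    return record


def fetch_properties(drug_name, timeout=10):
    """Fetch properties for one drug name (performs a network request)."""
    url = build_property_url(drug_name)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_properties(json.load(response))
```

A call such as `fetch_properties("Bivalirudin")` would need network access, so for the library itself it probably makes more sense to run something like this once when building the drug dictionary and store the values, instead of querying PubChem at lookup time.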