datamol-io / datamol

Molecular Processing Made Easy.
https://docs.datamol.io
Apache License 2.0
452 stars, 47 forks

Improve chembl drugs dataset #213

Closed: hadim closed this issue 10 months ago

hadim commented 11 months ago

Improve the dataset by providing more columns. We can simply backport the script written for medchem at https://github.com/datamol-io/medchem/blob/9ee009a5495d0146ec46a804fa458944711d9e28/notebooks/Get_ChEMBL_Approved_Drugs.ipynb

hadim commented 11 months ago

In datamol, we provide a dataset of all approved drugs in ChEMBL: dm.data.chembl_drugs(). But that dataset contains only a single column: smiles.

It would be nice to have the same dataset but with some "metadata" columns such as ChEMBL ID, date of approval, etc.
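As a rough sketch of what such an enriched dataset could look like, the snippet below parses a small CSV that keeps the existing smiles column and adds metadata next to it. The column names chembl_id and pref_name, the helper functions, and the two illustrative rows are assumptions made for this example, not the actual output of the medchem script. It uses plain dicts from the standard library to stay dependency-free, whereas dm.data.chembl_drugs() itself returns a DataFrame.

```python
import csv
import io

# Hypothetical layout of an enriched dataset: the existing `smiles` column
# plus metadata columns. Every column name other than `smiles` is an
# assumption made for illustration.
ENRICHED_CSV = """smiles,chembl_id,pref_name
CC(=O)Oc1ccccc1C(=O)O,CHEMBL25,ASPIRIN
Cn1cnc2c1c(=O)n(C)c(=O)n2C,CHEMBL113,CAFFEINE
"""


def load_rows(text):
    """Parse the CSV text into a list of dicts, one per drug."""
    return list(csv.DictReader(io.StringIO(text)))


def smiles_only(rows):
    """Return the single-column view that the current dataset exposes."""
    return [{"smiles": row["smiles"]} for row in rows]


rows = load_rows(ENRICHED_CSV)
legacy = smiles_only(rows)

print(list(rows[0].keys()))    # all columns of the enriched layout
print(list(legacy[0].keys()))  # only the smiles column
```

Keeping smiles as a column in the enriched layout means code that only reads that column would keep working after the change.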

For medchem, I created a small script that curates all approved drugs from ChEMBL and keeps a few metadata columns.

This dataset is used in the "Basic Concepts" tutorial of medchem at https://medchem-docs.datamol.io/stable/tutorials/Basic_Concepts.html


The task:

hadim commented 11 months ago

@stwhitfield let me know if you have any questions!