Closed hadim closed 10 months ago
In datamol we provide a dataset of all approved drugs in ChEMBL: dm.data.chembl_drugs(). But that dataset contains only a single column: smiles.
It would be nice to have the same dataset but with some "metadata" columns such as ChEMBL ID, date of approval, etc.
For medchem, I created a small script that curates all approved drugs from ChEMBL and keeps a few metadata columns.
This dataset is used in the "Basic Concept" tutorial of medchem at https://medchem-docs.datamol.io/stable/tutorials/Basic_Concepts.html
The task:
- Add chembl_approved_drugs.parquet in datamol/data/
- Update dm.data.chembl_drugs() to leverage it + adapt the docstring to explain how it has been generated
- Add the script that generates the dataset in ./notebooks/
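A rough sketch of how dm.data.chembl_drugs() could leverage the new parquet file while still supporting the old single-column output. This is not the actual datamol implementation: the function names, the with_metadata flag, and the metadata column names (chembl_id, first_approval) are hypothetical and only illustrate the idea.

```python
from pathlib import Path

import pandas as pd


def select_columns(data: pd.DataFrame, with_metadata: bool = True) -> pd.DataFrame:
    """Return the full table, or only the `smiles` column (the previous behaviour).

    `with_metadata` is a hypothetical flag, not an existing datamol argument.
    """
    if "smiles" not in data.columns:
        raise ValueError("expected a 'smiles' column in the dataset")
    if with_metadata:
        return data.copy()
    return data[["smiles"]].copy()


def load_approved_drugs(path: Path, with_metadata: bool = True) -> pd.DataFrame:
    """Load the curated parquet file and select the requested columns.

    `path` would point at chembl_approved_drugs.parquet in datamol/data/.
    pd.read_parquet needs a parquet engine (for example pyarrow) installed.
    """
    return select_columns(pd.read_parquet(path), with_metadata=with_metadata)


# Tiny in-memory example; the metadata column names and values are placeholders.
example = pd.DataFrame(
    {
        "smiles": ["CCO", "CC(=O)O"],
        "chembl_id": ["CHEMBL_PLACEHOLDER_1", "CHEMBL_PLACEHOLDER_2"],
        "first_approval": [None, None],
    }
)
smiles_only = select_columns(example, with_metadata=False)
```

Keeping the column selection separate from the file loading makes it easy to check that existing callers who only expect a smiles column keep working.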
@stwhitfield let me know if you have any questions!
By giving more columns. We can simply backport the script made on medchem at https://github.com/datamol-io/medchem/blob/9ee009a5495d0146ec46a804fa458944711d9e28/notebooks/Get_ChEMBL_Approved_Drugs.ipynb