datamol-io / datamol

Molecular Processing Made Easy.
https://docs.datamol.io
Apache License 2.0

Improve the ChEMBL drugs dataset #214

Closed: stwhitfield closed this pull request 10 months ago

stwhitfield commented 10 months ago

Purpose: Give datamol a dataset of all approved drugs in ChEMBL that includes metadata columns such as ChEMBL ID and date of approval.

Changelogs

- Added chembl_approved_drugs to datamol/data/.
- Modified dm.data.chembl_drugs() to load it.
- Adapted the docstring to explain how the dataset was generated.
- Updated the unit tests.
- Added a notebooks folder with the code that generated chembl_approved_drugs.parquet.


Checklist:


codecov[bot] commented 10 months ago

Codecov Report

Merging #214 (994ef96) into main (e3c4a38) will not change coverage. The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main     #214   +/-   ##
=======================================
  Coverage   91.91%   91.91%           
=======================================
  Files          46       46           
  Lines        3835     3835           
=======================================
  Hits         3525     3525           
  Misses        310      310           
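The coverage figure in the diff above is hits divided by tracked lines (3525 / 3835). The sketch below reproduces the reported 91.91% under one assumption: that Codecov is using its default display settings of two decimal places with rounding down. The project's own codecov.yml could override those defaults. `coverage_percent` is a hypothetical helper written for illustration, not part of datamol or Codecov.

```python
from decimal import Decimal, ROUND_DOWN


def coverage_percent(hits: int, lines: int, precision: int = 2) -> Decimal:
    """Return hits/lines as a percentage, truncated (rounded down) to `precision` decimals.

    Assumes Codecov-style defaults (precision 2, round down); hypothetical helper.
    """
    if lines <= 0:
        raise ValueError("lines must be positive")
    # Exact decimal arithmetic avoids binary floating-point surprises.
    exact = Decimal(hits) * 100 / Decimal(lines)
    quantum = Decimal(10) ** -precision  # Decimal('0.01') when precision == 2
    return exact.quantize(quantum, rounding=ROUND_DOWN)


hits, misses = 3525, 310
lines = hits + misses  # every tracked line is either a hit or a miss
assert lines == 3835
print(coverage_percent(hits, lines))  # 91.91
```

The exact ratio is about 91.9166%. With conventional half-up rounding it would display as 91.92%, so the 91.91% shown is consistent with truncation.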
Flag Coverage Δ
unittests 91.91% <100.00%> (ø)

Flags with carried forward coverage won't be shown.

Files Coverage Δ
datamol/data/__init__.py 78.07% <100.00%> (ø)


hadim commented 10 months ago

Thanks @stwhitfield for your first contribution to datamol!