AstraZeneca / chemicalx

A PyTorch and TorchDrug based deep learning library for drug pair scoring. (KDD 2022)
https://chemicalx.readthedocs.io
Apache License 2.0
700 stars, 89 forks

Add additional datasets #61

Closed cthoyt closed 2 years ago

cthoyt commented 2 years ago

Summary

This PR adds the OncoPolyPharmacology dataset, adds a new dataloader that does all processing locally, and updates the pipeline to handle continuous labels (e.g., for drug synergy) as opposed to binary labels.
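To illustrate the distinction between binary and continuous labels, here is a minimal sketch of how a training pipeline might pick its loss. It uses only the standard library. The names `select_loss`, `binary_cross_entropy`, and `mean_squared_error` are hypothetical and are not taken from the chemicalx code in this PR.

```python
import math
from typing import Callable, Sequence

LossFn = Callable[[Sequence[float], Sequence[float]], float]


def binary_cross_entropy(predictions: Sequence[float], labels: Sequence[float]) -> float:
    """Mean binary cross entropy; predictions are probabilities in (0, 1)."""
    eps = 1e-12  # clamp to avoid log(0)
    total = 0.0
    for p, y in zip(predictions, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(labels)


def mean_squared_error(predictions: Sequence[float], labels: Sequence[float]) -> float:
    """Mean squared error, suited to continuous targets such as synergy scores."""
    return sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(labels)


def select_loss(continuous_labels: bool) -> LossFn:
    """Hypothetical helper: regression loss for continuous labels, else classification loss."""
    return mean_squared_error if continuous_labels else binary_cross_entropy


# Continuous labels (e.g., synergy scores) use a regression loss.
loss_fn = select_loss(continuous_labels=True)
regression_loss = loss_fn([1.0, 3.0], [0.0, 1.0])
```

The evaluation metrics would need the same kind of switch, since ranking metrics for binary labels do not apply directly to continuous targets.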

Changes

codecov-commenter commented 2 years ago

Codecov Report

Merging #61 (7acb34c) into main (30d3327) will decrease coverage by 6.23%. The diff coverage is 52.33%.

@@            Coverage Diff             @@
##             main      #61      +/-   ##
==========================================
- Coverage   98.07%   91.83%   -6.24%     
==========================================
  Files          28       29       +1     
  Lines         675      772      +97     
==========================================
+ Hits          662      709      +47     
- Misses         13       63      +50     
| Impacted Files | Coverage Δ |
|---|---|
| chemicalx/data/drugfeatureset.py | 100.00% <ø> (ø) |
| chemicalx/data/datasetloader.py | 77.34% <44.44%> (-17.90%) :arrow_down: |
| chemicalx/data/utils.py | 45.23% <45.23%> (ø) |
| chemicalx/data/contextfeatureset.py | 90.00% <75.00%> (-10.00%) :arrow_down: |
| chemicalx/pipeline.py | 88.57% <84.61%> (-0.14%) :arrow_down: |
| chemicalx/data/__init__.py | 100.00% <100.00%> (ø) |
| tests/unit/test_models.py | 100.00% <100.00%> (ø) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 30d3327...7acb34c.

cthoyt commented 2 years ago

@benedekrozemberczki I could write a mock for the OncoPolyPharmacology dataset, but maybe it makes more sense to apply the loader class to a mock dataset instead.

Edit: done in d8309f4
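
For readers unfamiliar with the approach, testing a loader against a mock dataset can be sketched as follows. This is an illustration under stated assumptions, not the code from d8309f4: the `MockLocalLoader` class, the `load_triples` method, the column names, and the data values are all hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from typing import List

# Hypothetical in-memory mock data: drug pair, context, and a continuous label.
MOCK_TRIPLES_CSV = """drug_1,drug_2,context,label
D1,D2,C1,12.5
D1,D3,C2,-3.0
D2,D3,C1,0.75
"""


@dataclass(frozen=True)
class LabeledTriple:
    drug_1: str
    drug_2: str
    context: str
    label: float


class MockLocalLoader:
    """Hypothetical loader that parses raw text locally instead of downloading it."""

    def __init__(self, raw_text: str) -> None:
        self.raw_text = raw_text

    def load_triples(self) -> List[LabeledTriple]:
        # Parse the CSV text and convert the label column to float.
        reader = csv.DictReader(io.StringIO(self.raw_text))
        return [
            LabeledTriple(row["drug_1"], row["drug_2"], row["context"], float(row["label"]))
            for row in reader
        ]


triples = MockLocalLoader(MOCK_TRIPLES_CSV).load_triples()
```

Because the mock data lives in the test itself, the loader logic can be exercised without network access or the full dataset.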