AstraZeneca / chemicalx

A PyTorch and TorchDrug based deep learning library for drug pair scoring. (KDD 2022)
https://chemicalx.readthedocs.io
Apache License 2.0
702 stars, 89 forks

Clean up DrugBank and TWOSIDES importers #57

Closed: cthoyt closed this 2 years ago

cthoyt commented 2 years ago

Summary

This PR is the first step towards overhauling the dataset loading process to make it easier to bring your own dataset.

Changes

Next Steps

I'm not really sure we should use the datasets that have generated negative samples in practice, since that might lead to overfitting. I guess for ML people, having the datasets just there is good, because then they don't have to think about quality or concerns like these. Convenience and data quality always conflict in my mind.

Ultimately, I'd like an interface that does all of the data preprocessing on the fly and uses locally cached results instead of looking for a web-based version of the dataset. I will look into this after #50 is done and I can check out the code for DrugComb and DrugCombDB.
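To make the "preprocess on the fly, then reuse a local cache" idea concrete, here is a minimal sketch using only the Python standard library. It is not chemicalx's actual loader API: the names `load_cached`, `build`, and `cache_dir`, and the tab-separated cache format, are all assumptions made for illustration.

```python
import csv
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List, Sequence

Row = Sequence[str]


def load_cached(
    name: str,
    build: Callable[[], Iterable[Row]],
    cache_dir: Path,
    force: bool = False,
) -> List[List[str]]:
    """Return the rows of dataset ``name``, preprocessing only on a cache miss.

    ``build`` is a hypothetical callable that performs the real preprocessing
    (download, parse, label) and yields rows of strings.
    """
    path = cache_dir / f"{name}.tsv"
    if force or not path.is_file():
        cache_dir.mkdir(parents=True, exist_ok=True)
        # Write to a temporary file first so that an interrupted run
        # never leaves a partially written cache file behind.
        tmp = path.with_suffix(".tmp")
        with tmp.open("w", newline="") as file:
            csv.writer(file, delimiter="\t").writerows(build())
        tmp.replace(path)
    # Always read back from disk so cached and fresh results look identical.
    with path.open(newline="") as file:
        return [row for row in csv.reader(file, delimiter="\t")]


if __name__ == "__main__":
    def build_toy_triples() -> Iterable[Row]:
        # Hypothetical stand-in for real preprocessing of drug pair triples.
        yield ("drug_1", "drug_2", "context", "label")
        yield ("DB00001", "DB00002", "C1", "1")

    with tempfile.TemporaryDirectory() as directory:
        rows = load_cached("toy", build_toy_triples, Path(directory))
        print(len(rows), "rows loaded")
```

The first call runs the preprocessing and writes the cache; later calls with the same `name` and `cache_dir` read the file directly, and `force=True` rebuilds it. A real implementation would likely also need to version the cache so that changes to the preprocessing code invalidate stale files.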

codecov-commenter commented 2 years ago

Codecov Report

Merging #57 (c040b9d) into main (29e7f5f) will not change coverage. The diff coverage is n/a.


@@           Coverage Diff           @@
##             main      #57   +/-   ##
=======================================
  Coverage   97.60%   97.60%           
=======================================
  Files          28       28           
  Lines         669      669           
=======================================
  Hits          653      653           
  Misses         16       16           
