cthoyt closed 2 years ago
Merging #57 (c040b9d) into main (29e7f5f) will not change coverage. The diff coverage is n/a.
@@           Coverage Diff           @@
##             main      #57   +/-   ##
=======================================
  Coverage   97.60%   97.60%
=======================================
  Files          28       28
  Lines         669      669
=======================================
  Hits          653      653
  Misses         16       16
Summary
This PR is the first step towards overhauling the dataset loading process to make it easier to bring your own dataset.
Changes
pystow
Next Steps
I'm not really sure whether we should use the datasets that have generated negative samples in practice, since that might lead to overfitting. For ML people, having the datasets readily available is convenient because they don't have to think about quality or concerns like these. These two goals always conflict in my mind.
Ultimately, I'd like an interface that does all of the data preprocessing on the fly and uses locally cached results instead of looking for a web-based version of the dataset. I will look into this after #50 is done and I can check out the code for DrugComb and DrugCombDB.
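A rough sketch of the "preprocess on the fly, then reuse the local cache" idea, using only the standard library. Everything here is an assumption for illustration: `load_cached`, the `build` callback, and the four-column row layout (drug, drug, context, label) are hypothetical names and not code from this repository. The cache directory is passed in explicitly; in practice it could come from a tool like pystow.

```python
import csv
from pathlib import Path
from typing import Callable, Iterable, List, Tuple

# Hypothetical row layout: (drug_1, drug_2, context, label)
Row = Tuple[str, str, str, float]


def load_cached(
    name: str,
    build: Callable[[], Iterable[Row]],
    cache_dir: Path,
) -> List[Row]:
    """Return the rows for ``name``, running ``build`` only on a cache miss."""
    path = cache_dir / f"{name}.tsv"
    if not path.is_file():
        # Cache miss: run the (possibly expensive) preprocessing once
        # and store the result as a tab-separated file.
        cache_dir.mkdir(parents=True, exist_ok=True)
        with path.open("w", newline="") as file:
            csv.writer(file, delimiter="\t").writerows(build())
    # Always read back from the cache so both code paths return the same types.
    with path.open(newline="") as file:
        return [
            (drug_1, drug_2, context, float(label))
            for drug_1, drug_2, context, label in csv.reader(file, delimiter="\t")
        ]
```

With this shape, bringing your own dataset would mean supplying a `build` function that yields rows; the second and later calls skip preprocessing entirely and never touch the network.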