AstraZeneca / chemicalx

A PyTorch and TorchDrug based deep learning library for drug pair scoring. (KDD 2022)
https://chemicalx.readthedocs.io
Apache License 2.0

Adding DrugBank DDI and Two Sides. #48

Closed. benedekrozemberczki closed this 2 years ago

benedekrozemberczki commented 2 years ago

Summary

Changes

cthoyt commented 2 years ago

I’d strongly request provenance information on how these new datasets were constructed. Were they automatically downloaded from some external repo? Was any processing done to them?

benedekrozemberczki commented 2 years ago

Yes, @cthoyt I will do that in a moment.

benedekrozemberczki commented 2 years ago

There will be a whole Appendix section about this in the paper.

cthoyt commented 2 years ago

While you’re thinking about it, maybe also consider doing the same for the previous two datasets as well :)

benedekrozemberczki commented 2 years ago

@cthoyt How about a dataset preprocessing section in the documentation?

cthoyt commented 2 years ago

Tbh, the only documentation of data preprocessing that matters to me is code that can exactly reproduce it. Let’s start there and backfill prose-based documentation only in places where the code itself can’t document things better.
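To illustrate what "code that can exactly reproduce it" might look like, here is a minimal sketch, not the cleaning scripts that were later added to the repository. It pins the raw input with a checksum and applies deterministic cleaning steps. The column names, the example URL, and both function names are assumptions made for illustration only.

```python
from __future__ import annotations

import csv
import hashlib
import io

# Hypothetical column names, chosen for illustration only.
COLUMNS = ("drug_1", "drug_2", "context", "label")


def provenance_record(source_url: str, raw_bytes: bytes) -> dict:
    """Describe where the raw file came from and pin its exact contents."""
    return {
        "source_url": source_url,
        "sha256": hashlib.sha256(raw_bytes).hexdigest(),
        "n_bytes": len(raw_bytes),
    }


def clean_triples(raw_text: str) -> list[dict]:
    """Drop incomplete rows and exact duplicates, then sort for stable output."""
    reader = csv.DictReader(io.StringIO(raw_text))
    seen = set()
    cleaned = []
    for row in reader:
        values = tuple((row.get(col) or "").strip() for col in COLUMNS)
        if not all(values):
            continue  # skip rows with a missing field
        if values in seen:
            continue  # skip exact duplicates
        seen.add(values)
        cleaned.append(dict(zip(COLUMNS, values)))
    # Sorting makes the output independent of the input row order.
    return sorted(cleaned, key=lambda r: tuple(r[c] for c in COLUMNS))


if __name__ == "__main__":
    raw = "drug_1,drug_2,context,label\nA,B,c1,1\nA,B,c1,1\nC,,c2,0\n"
    # The URL below is a placeholder, not a real data source.
    print(provenance_record("https://example.org/raw.csv", raw.encode()))
    print(clean_triples(raw))
```

Storing the checksum next to the cleaned output lets a reader verify that rerunning the script on the same raw file gives the same dataset.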

benedekrozemberczki commented 2 years ago

Added the cleaning scripts.

cthoyt commented 2 years ago

It appears you merged the branch with failing tests. This shouldn’t be allowed or even possible. The solution is to add some branch protection rules in the settings for the repository.
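Branch protection can be set in the repository settings UI, as suggested above. As a rough sketch of the same rule expressed through GitHub's REST API ("update branch protection", `PUT /repos/{owner}/{repo}/branches/{branch}/protection`), the snippet below only builds the request URL and body and sends nothing. The branch name `main`, the status check name `tests`, and the helper function are assumptions. GitHub's documentation may prefer the newer `checks` field over `contexts`.

```python
from __future__ import annotations

import json


def branch_protection_request(
    owner: str, repo: str, branch: str, required_checks: list[str]
) -> tuple[str, dict]:
    """Build the URL and JSON body for a branch protection update (not sent)."""
    url = f"https://api.github.com/repos/{owner}/{repo}/branches/{branch}/protection"
    body = {
        # Block merging until the named status checks have passed.
        "required_status_checks": {
            "strict": True,  # the branch must also be up to date with its base
            "contexts": list(required_checks),
        },
        "enforce_admins": True,  # apply the rule to administrators as well
        "required_pull_request_reviews": None,  # no review requirement here
        "restrictions": None,  # no restriction on who may push
    }
    return url, body


if __name__ == "__main__":
    # "main" and "tests" are assumed names; use the repository's actual ones.
    url, body = branch_protection_request("AstraZeneca", "chemicalx", "main", ["tests"])
    print(url)
    print(json.dumps(body, indent=2))
```

Sending this request would need an authenticated `PUT` from an account with admin rights on the repository.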

benedekrozemberczki commented 2 years ago

Does it require changes to GitHub Actions?

[Image: Screenshot 2022-01-17 at 22 19 47]

cthoyt commented 2 years ago

Nope, that looks right!