I think it would be better to have the dataset download and processing happen client-side, then use pystow to store the results in a reliable place. This would also allow the TWOSIDES and DrugBank datasets, which require random negative sampling, to be used with multiple random seeds, e.g., to investigate the robustness of results. Further, it would allow for a more idiomatic dataset loader that's extensible to new datasets.
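To make the proposal concrete, here is a minimal sketch of seeded negative sampling with a per-seed on-disk cache. It is only an illustration under stated assumptions, not the project's actual loader: the function names `sample_negatives` and `load_labeled_pairs`, the CSV layout, and the file naming scheme are all hypothetical, and it samples one negative pair per positive pair. The sketch takes a plain cache directory so it runs with the standard library alone; in practice that directory could come from `pystow.join(...)`.

```python
import csv
import random
from pathlib import Path


def sample_negatives(positives, drugs, seed):
    """Sample one random negative drug pair per positive pair, reproducibly."""
    rng = random.Random(seed)  # local generator, so the seed fully fixes the result
    drugs = sorted(set(drugs))
    known = {frozenset(pair) for pair in positives}
    total_pairs = len(drugs) * (len(drugs) - 1) // 2
    if total_pairs - len(known) < len(known):
        raise ValueError("not enough unlabeled drug pairs to sample from")
    negatives = set()
    while len(negatives) < len(known):
        pair = frozenset(rng.sample(drugs, 2))
        if pair not in known:
            negatives.add(pair)
    return sorted(tuple(sorted(pair)) for pair in negatives)


def load_labeled_pairs(cache_dir, positives, drugs, seed):
    """Return (drug_a, drug_b, label) rows, cached on disk once per seed."""
    # Assumption: cache_dir could be obtained with
    # pystow.join("myproject", "drugbank"), where the module name is hypothetical.
    path = Path(cache_dir) / f"pairs_seed{seed}.csv"
    if not path.exists():
        rows = [(a, b, 1) for a, b in positives]
        rows += [(a, b, 0) for a, b in sample_negatives(positives, drugs, seed)]
        with path.open("w", newline="") as file:
            csv.writer(file).writerows(rows)
    with path.open(newline="") as file:
        return [(a, b, int(label)) for a, b, label in csv.reader(file)]
```

Calling `load_labeled_pairs` with several seeds would give one cached file per seed, so a robustness study can loop over seeds without redoing the processing each time.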
Depends on: