AstraZeneca / chemicalx

A PyTorch and TorchDrug based deep learning library for drug pair scoring. (KDD 2022)
https://chemicalx.readthedocs.io
Apache License 2.0

Provide base class for dataset loaders #59

Closed cthoyt closed 2 years ago

cthoyt commented 2 years ago

Summary

This PR abstracts the essential components of the dataset loader into a base class. This allows for future implementations of eager datasets (i.e., datasets whose parts are all already in memory) and of other lazy local dataset loaders.

Changes

Next steps

The following shows an implementation of an eager dataset, which might be more useful for local datasets.

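from dataclasses import dataclass

# Assumes these names, including the DatasetLoader base class from this PR,
# are importable from chemicalx.data.
from chemicalx.data import ContextFeatureSet, DatasetLoader, DrugFeatureSet, LabeledTriples


@dataclass
class EagerDatasetLoader(DatasetLoader):
    """An eager dataset whose parts are all held in memory."""

    context_feature_set: ContextFeatureSet
    drug_feature_set: DrugFeatureSet
    labeled_triples: LabeledTriples

    def get_labeled_triples(self) -> LabeledTriples:
        """Get the labeled triples held in memory."""
        return self.labeled_triples

    def get_context_features(self) -> ContextFeatureSet:
        """Get the context feature set."""
        return self.context_feature_set

    def get_drug_features(self) -> DrugFeatureSet:
        """Get the drug feature set."""
        return self.drug_feature_set
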
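
To make the eager/lazy distinction concrete, here is a minimal, self-contained sketch of the pattern. It does not import chemicalx: `BaseLoader`, `EagerLoader`, `LazyLoader`, `summary`, and the `FeatureSet`/`Triples` aliases are hypothetical stand-ins invented for illustration, not the library's actual classes. Only the three getter names are taken from the example above.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical stand-ins for ContextFeatureSet, DrugFeatureSet and LabeledTriples.
FeatureSet = Dict[str, List[float]]
Triples = List[Tuple[str, str, str, float]]  # (drug_1, drug_2, context, label)


class BaseLoader(ABC):
    """Hypothetical base class: subclasses only decide where the parts come from."""

    @abstractmethod
    def get_context_features(self) -> FeatureSet:
        """Get the context feature set."""

    @abstractmethod
    def get_drug_features(self) -> FeatureSet:
        """Get the drug feature set."""

    @abstractmethod
    def get_labeled_triples(self) -> Triples:
        """Get the labeled triples."""

    def summary(self) -> Dict[str, int]:
        """Shared logic, written once against the abstract getters."""
        return {
            "contexts": len(self.get_context_features()),
            "drugs": len(self.get_drug_features()),
            "triples": len(self.get_labeled_triples()),
        }


@dataclass
class EagerLoader(BaseLoader):
    """Eager variant: every part is passed in and held in memory."""

    context_features: FeatureSet
    drug_features: FeatureSet
    labeled_triples: Triples

    def get_context_features(self) -> FeatureSet:
        return self.context_features

    def get_drug_features(self) -> FeatureSet:
        return self.drug_features

    def get_labeled_triples(self) -> Triples:
        return self.labeled_triples


@dataclass
class LazyLoader(BaseLoader):
    """Lazy variant: each part is read on first access, then cached."""

    read_context_features: Callable[[], FeatureSet]
    read_drug_features: Callable[[], FeatureSet]
    read_labeled_triples: Callable[[], Triples]
    _cache: Dict[str, Any] = field(default_factory=dict, init=False, repr=False)

    def _cached(self, key: str, reader: Callable[[], Any]) -> Any:
        # Call the reader only if this part has not been loaded yet.
        if key not in self._cache:
            self._cache[key] = reader()
        return self._cache[key]

    def get_context_features(self) -> FeatureSet:
        return self._cached("contexts", self.read_context_features)

    def get_drug_features(self) -> FeatureSet:
        return self._cached("drugs", self.read_drug_features)

    def get_labeled_triples(self) -> Triples:
        return self._cached("triples", self.read_labeled_triples)


# Toy data, made up for the example.
contexts = {"cell_line_a": [0.1, 0.2]}
drugs = {"drug_x": [1.0, 0.0], "drug_y": [0.0, 1.0]}
triples = [("drug_x", "drug_y", "cell_line_a", 1.0)]

eager = EagerLoader(contexts, drugs, triples)
lazy = LazyLoader(lambda: contexts, lambda: drugs, lambda: triples)
assert eager.summary() == lazy.summary()
```

Code written against the base class (here, `summary`) does not need to know which variant it received. In a real lazy loader the readers would typically parse files from a directory or download them.
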
aminemosbah commented 2 years ago

Any chance we could load, train on, and test with our own dataset?

cthoyt commented 2 years ago

@aminemosbah I would suggest looking at https://chemicalx.readthedocs.io/en/latest/api/chemicalx.data.LocalDatasetLoader.html#chemicalx.data.LocalDatasetLoader for loading your own dataset, provided it's already in the right format in a given directory.

aminemosbah commented 2 years ago

Thanks, but I have SMILES strings in a CSV file that I want to run predictions on locally. Any quick snippet?

cthoyt commented 2 years ago

@aminemosbah I have a solution in mind for what you want (which is the obvious realistic use case), but it is blocked by #50 and #58. @benedekrozemberczki would love to get your input on #50 ;)

aminemosbah commented 2 years ago

Need to hack the data loader to make it work for local data.

aminemosbah commented 2 years ago

Here is a possible solution: https://chemicalx.readthedocs.io/en/latest/api/chemicalx.data.LocalDatasetLoader.html#chemicalx.data.LocalDatasetLoader