The ATOM Modeling PipeLine (AMPL) is an open-source, modular, extensible software pipeline for building and sharing models to advance in silico drug discovery.
MIT License
Load pre-calculated features with embedded models #301
This allows AMPL to use pre-calculated features with embedded models and transfer learning. I created 3 classes to accomplish this.
EmbeddingDataset: This overrides get_featurized_data and save_featurized_data. This dataset is meant to be used exclusively with EmbeddingFeaturization. It creates a second, member dataset that loads or calculates the features used as input to the embedding model, and then generates the embedded features from them. The save_featurized_data function does nothing, since it cannot save embedded features. However, it can save features for the member dataset.
FileEmbeddingDataset: This inherits from EmbeddingDataset and FileDataset. It is used when the input features come from a file.
DatastoreEmbeddingDataset: This inherits from EmbeddingDataset and DatastoreDataset and is used when the input features come from the datastore. I have not tested this class because I don't have a good test case for it.
featurize_data in EmbeddingFeaturization no longer needs to rename response columns.
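For reviewers, here is a rough sketch of how the three classes relate. It uses toy stand-ins, not AMPL's real classes: the constructor arguments, the `member_dataset` and `embedding_fn` attributes, and the `toy_embedding` function are all hypothetical and chosen only for illustration. It assumes the embedding mixin is listed first in the base classes so that its get_featurized_data and save_featurized_data overrides take precedence.

```python
# Illustrative sketch only: these are toy stand-ins, not AMPL's implementations.


class FileDataset:
    """Stand-in for a dataset whose input features come from a file."""

    def __init__(self, rows):
        self.rows = rows      # pretend these rows were read from a CSV file
        self.saved = None     # pretend this is the featurized file on disk

    def get_featurized_data(self):
        # Load (or calculate) the input features.
        return [list(row) for row in self.rows]

    def save_featurized_data(self, features):
        self.saved = features


class EmbeddingDataset:
    """Mixin: wraps a member dataset and returns embedded features."""

    def __init__(self, member_dataset, embedding_fn):
        self.member_dataset = member_dataset   # hypothetical attribute name
        self.embedding_fn = embedding_fn       # hypothetical embedding model

    def get_featurized_data(self):
        # The member dataset supplies the inputs to the embedding model.
        inputs = self.member_dataset.get_featurized_data()
        return [self.embedding_fn(x) for x in inputs]

    def save_featurized_data(self, features):
        # Embedded features are never saved, so this is a no-op.
        return None


class FileEmbeddingDataset(EmbeddingDataset, FileDataset):
    """Embedding dataset whose input features come from a file."""

    def __init__(self, rows, embedding_fn):
        FileDataset.__init__(self, rows)
        EmbeddingDataset.__init__(self, FileDataset(rows), embedding_fn)


def toy_embedding(features):
    # Hypothetical embedding: collapse each feature vector to its sum.
    return [sum(features)]


ds = FileEmbeddingDataset([(1.0, 2.0), (3.0, 4.0)], toy_embedding)
embedded = ds.get_featurized_data()

# Saving on the embedding dataset does nothing; the member dataset can
# still save its own input features.
ds.save_featurized_data(embedded)
ds.member_dataset.save_featurized_data(ds.member_dataset.get_featurized_data())
```

A DatastoreEmbeddingDataset would follow the same pattern with a datastore-backed stand-in in place of FileDataset.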