ATOMScience-org / AMPL

The ATOM Modeling PipeLine (AMPL) is an open-source, modular, extensible software pipeline for building and sharing models to advance in silico drug discovery.
MIT License

external_training_data option is broken in predict_from_model_file #353

Open mcloughlin2 opened 1 month ago

mcloughlin2 commented 1 month ago

When using predict_from_model.predict_from_model_file on a model that was trained on an inaccessible dataset file, you're supposed to be able to pass an equivalent dataset in the external_training_data argument, so that applicability domain (AD) indices can be computed. This no longer works: ModelPipeline.predict_full_dataset loads and featurizes the training data using the original parameters saved with the model, not the modified parameters in which dataset_key was overwritten with the path to the external data.
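
To make the failure mode concrete, here is a minimal, self-contained sketch of the pattern described above. It does not use AMPL's code: ToyPipeline, load_training_path, orig_params, and the file paths are all hypothetical stand-ins. Only dataset_key and external_training_data are names taken from the report.

```python
from copy import deepcopy
from types import SimpleNamespace


def load_training_path(params):
    """Stand-in for loading and featurizing training data.

    Returns the path that would be read, so the effect is easy to inspect.
    """
    return params.dataset_key


class ToyPipeline:
    """Hypothetical stand-in for a model pipeline; NOT AMPL's ModelPipeline."""

    def __init__(self, saved_params, external_training_data=None):
        # Parameters exactly as they were stored with the model.
        self.orig_params = saved_params
        # Working copy, with dataset_key redirected to the external file if given.
        self.params = deepcopy(saved_params)
        if external_training_data is not None:
            self.params.dataset_key = external_training_data

    def training_path_buggy(self):
        # Mirrors the reported behaviour: the saved parameters are used,
        # so the override in self.params is silently ignored.
        return load_training_path(self.orig_params)

    def training_path_fixed(self):
        # One possible fix (an assumption, not the project's actual patch):
        # read from the parameters that carry the override.
        return load_training_path(self.params)


# Hypothetical paths, for illustration only.
saved = SimpleNamespace(dataset_key="/inaccessible/train.csv")
pipe = ToyPipeline(saved, external_training_data="/local/equivalent_train.csv")

buggy_path = pipe.training_path_buggy()
fixed_path = pipe.training_path_fixed()
```

In this sketch the buggy method still points at the inaccessible file while the fixed one resolves to the external dataset. Whether the real fix belongs in predict_full_dataset or in how the parameters are passed to it is left open here.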