Oloren-AI / olorenchemengine

OCE is the first infinitely composable library for reproducibly implementing SOTA molecular property prediction/QSAR techniques.
MIT License

How to save error model #88

Closed: hshany closed this issue 1 year ago

hshany commented 1 year ago

When I load a saved model that has an error model, the error model is always rebuilt and refitted. How can I save the model together with its trained error model?

This is how I save and load the model:

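import olorenchemengine as oce

# `model` and `dataset` are assumed to be an already-trained OCE model and its dataset.
model.create_error_model(
    error_model=oce.BootstrapEnsemble(),
    n_ensembles=5,
    bootstrap_size=0.8,
    X_train=dataset.entire_dataset[0],
    y_train=dataset.entire_dataset[1],
)

oce.save(model, 'test_model.oce')

model = oce.load('test_model.oce')
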
davidzqhuang commented 1 year ago

Rebuilding and refitting the error model is near-deterministic: the corresponding residuals and scores are saved with the model, so loading basically just redoes the curve fit.
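
To illustrate why redoing the curve fit from saved data is cheap and repeatable, here is a minimal sketch. It is not OCE's actual implementation: `fit_error_curve`, the polynomial form, and the sample values are all assumptions made up for the example.

```python
import numpy as np

def fit_error_curve(scores, residuals, degree=1):
    """Least-squares polynomial mapping a confidence score to |residual|.

    Hypothetical helper for illustration only; OCE's real curve fit may differ.
    """
    x = np.asarray(scores, dtype=float)
    y = np.abs(np.asarray(residuals, dtype=float))
    return np.polyfit(x, y, degree)

# Made-up values standing in for the scores and residuals stored with a model.
saved_scores = np.array([0.1, 0.2, 0.4, 0.7, 0.9])
saved_residuals = np.array([0.05, -0.12, 0.18, -0.33, 0.41])

# Refitting from the same saved arrays gives the same coefficients each time,
# and no model training is involved.
coeffs_first = fit_error_curve(saved_scores, saved_residuals)
coeffs_again = fit_error_curve(saved_scores, saved_residuals)
```

The fit only touches two small arrays, which is why repeating it on load costs very little as long as the scores themselves do not have to be recomputed.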

Adding this to the TODO:

hshany commented 1 year ago

It's not only about the curve fitting. In the example above, which uses BootstrapEnsemble as the scoring model, the ensemble of models appears to be retrained during rebuilding, which takes a long time and doesn't seem necessary.
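
A minimal sketch of the behavior being asked for, using a toy class rather than OCE's `BootstrapEnsemble`: the fitted ensemble members are written out with the hyperparameters, so restoring does not call `fit()` again. `ToyBootstrapEnsemble`, `to_state`, and `from_state` are hypothetical names invented for this example.

```python
import json
import random
import statistics

class ToyBootstrapEnsemble:
    """Hypothetical stand-in: each 'member' is the mean of one bootstrap sample."""

    def __init__(self, n_ensembles=5, bootstrap_size=0.8, seed=0):
        self.n_ensembles = n_ensembles
        self.bootstrap_size = bootstrap_size
        self.seed = seed
        self.members = []   # fitted state, filled in by fit()
        self.fit_calls = 0  # counts how often the expensive step ran

    def fit(self, y_train):
        rng = random.Random(self.seed)
        k = max(1, int(len(y_train) * self.bootstrap_size))
        self.members = [
            statistics.fmean(rng.choices(y_train, k=k))
            for _ in range(self.n_ensembles)
        ]
        self.fit_calls += 1
        return self

    def to_state(self):
        # Save the fitted members, not just the hyperparameters.
        return {
            "n_ensembles": self.n_ensembles,
            "bootstrap_size": self.bootstrap_size,
            "seed": self.seed,
            "members": self.members,
        }

    @classmethod
    def from_state(cls, state):
        obj = cls(state["n_ensembles"], state["bootstrap_size"], state["seed"])
        obj.members = list(state["members"])  # restored directly, no fit() call
        return obj

ensemble = ToyBootstrapEnsemble().fit([1.0, 2.0, 3.0, 4.0, 5.0])
blob = json.dumps(ensemble.to_state())
restored = ToyBootstrapEnsemble.from_state(json.loads(blob))
```

Here `restored.fit_calls` stays at 0 while `restored.members` matches the original, which is the property I would expect from a saved error model.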

davidzqhuang commented 1 year ago

Thank you for bringing this to our attention. The original code snippet you posted should now work without the time-intensive retraining of the ensemble. Fixed in PR #89 and subsequent patches.

We will make a PyPI release shortly. Until then, you can get the fix by installing directly from GitHub:

pip install --upgrade "olorenchemengine[full] @ git+https://github.com/Oloren-AI/olorenchemengine.git"