alteryx / evalml

EvalML is an AutoML library written in Python.
https://evalml.alteryx.com
BSD 3-Clause "New" or "Revised" License

Performance optimization: memoize/cache fitted components and predictions during automl #466

Open dsherry opened 4 years ago

dsherry commented 4 years ago

A feature evalml could support down the road is the ability to cache the output of each combination of components our pipelines have trained, so that if the same component sequence comes up again during automl, its output is fetched from the cache rather than recomputed.
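A minimal sketch of the idea, with purely hypothetical names (this is not EvalML's API): cache a component's fitted result keyed by its name, parameters, and a fingerprint of the training data, so a repeated component/data combination skips the redundant fit.

```python
import hashlib
import pickle

# Hypothetical cache of fitted components, keyed by (component, params, data).
_fit_cache = {}

def _cache_key(component_name, parameters, X):
    # Fingerprint the inputs; pickle + sha256 is a simple stand-in for a
    # proper, stable data hash.
    raw = pickle.dumps((component_name, sorted(parameters.items()), X))
    return hashlib.sha256(raw).hexdigest()

def fit_cached(component_name, parameters, X, fit_fn):
    """Return a fitted component, reusing a prior fit when inputs match."""
    key = _cache_key(component_name, parameters, X)
    if key not in _fit_cache:
        _fit_cache[key] = fit_fn(X)  # only computed on a cache miss
    return _fit_cache[key]
```

A real implementation would need a hash that is robust to data layout (and likely disk-backed storage), but the lookup structure would be the same.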

Sklearn supports this functionality: see the memory parameter on Pipeline, and also this issue in their repo.
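For reference, scikit-learn's built-in version looks like this: `Pipeline` takes a `memory` argument, and fitted transformers are cached (via joblib) and reused when the same transformer is fit on the same inputs again, e.g. during a grid search. The step names and estimators below are just placeholders.

```python
from tempfile import mkdtemp

from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Directory where joblib stores the cached fitted transformers.
cache_dir = mkdtemp()

pipeline = Pipeline(
    steps=[("scale", StandardScaler()), ("clf", LogisticRegression())],
    memory=cache_dir,
)
```

Note that only the transformer steps are cached; the final estimator is always refit.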

dsherry commented 4 years ago

Per discussion with Max today, this could be a good improvement to focus on soon. It would be particularly important for large datasets, where computations like missing value imputation or one-hot encoding become costly.

dsherry commented 4 years ago

This would be helpful to have in the pipeline score code. We currently memoize predictions in that function so we don't recompute them for each objective we want to score.
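The memoization in scoring amounts to this (an illustrative sketch, not EvalML's actual score code): compute predictions once per pipeline, then evaluate every objective against the cached values.

```python
def score_all(predict_fn, X, y_true, objectives):
    """Score several objectives while calling the model only once."""
    y_pred = predict_fn(X)  # computed once, reused for every objective
    return {name: obj(y_true, y_pred) for name, obj in objectives.items()}
```

Without this, scoring k objectives would trigger k identical prediction passes over the data.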

Also, I was browsing around for options and came across these two:

dsherry commented 4 years ago

This feature would've avoided #607 :)

dsherry commented 4 years ago

I'm considering doing this for classification pipelines' predict and predict_proba #648
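For classification, one way to do this (a hypothetical sketch, not the approach settled on in #648) is to cache `predict_proba` per input batch and derive `predict` from the cached probabilities, so calling both methods costs a single model evaluation.

```python
class CachedClassifier:
    """Wraps a probability function and memoizes its output per batch."""

    def __init__(self, proba_fn):
        self._proba_fn = proba_fn
        self._cache = {}

    def predict_proba(self, X):
        key = id(X)  # simplistic identity-based key, for the sketch only
        if key not in self._cache:
            self._cache[key] = self._proba_fn(X)
        return self._cache[key]

    def predict(self, X):
        # Derive class labels from the cached probabilities: argmax per row.
        probs = self.predict_proba(X)
        return [max(range(len(row)), key=row.__getitem__) for row in probs]
```

A real cache key would have to hash the data contents rather than rely on object identity, and invalidation on refit would also be needed.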