Open dsherry opened 4 years ago
RE discussion with Max today, this could be a good improvement to focus on soon. This would be particularly important for large datasets, where computations like missing value imputation or one-hot encoding could become costly.
This would be helpful to have in the pipeline score code. We currently memoize predictions in that function so we don't recompute them for each objective we're scoring.
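The memoization described above can be sketched roughly like this — a hypothetical `score` helper (not evalml's actual API) that computes predictions once and reuses them across every objective:

```python
# Hypothetical sketch of memoizing predictions inside score(): the pipeline's
# predict() runs at most once, no matter how many objectives are evaluated.
# `pipeline` and `objectives` here are stand-ins, not evalml's real classes.

def score(pipeline, X, y, objectives):
    cached_predictions = None

    def get_predictions():
        # Compute predictions lazily and cache the result for reuse.
        nonlocal cached_predictions
        if cached_predictions is None:
            cached_predictions = pipeline.predict(X)
        return cached_predictions

    # Each objective scores against the same cached predictions.
    return {obj.name: obj.score(y, get_predictions()) for obj in objectives}
```

The benefit grows with the number of objectives: predict is the expensive step, and scoring each objective against cached output is cheap.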
Also, I was browsing around for options and came across these two:
This feature would've avoided #607 :)
I'm considering doing this for classification pipelines' `predict` and `predict_proba`.
#648
A feature evalml could support down the road is the ability to cache the output of each combination of components our pipelines have trained, so that if that component combination is used again during automl, it's fetched from the cache rather than recomputed.
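One way this could look — an illustrative sketch, not evalml's implementation — is a cache keyed by the component's name plus a hash of the input data, so repeating the same component on the same data during automl hits the cache instead of refitting. The function and cache names below are made up for illustration:

```python
# Illustrative sketch: cache fitted-component outputs keyed by
# (component_name, hash of input data). A repeated combination of
# components over the same data reuses the cached result.
import hashlib

import numpy as np

_component_cache = {}

def _cache_key(component_name, X):
    # Hash the raw bytes of the array so identical inputs map to one key.
    data_hash = hashlib.sha256(np.ascontiguousarray(X).tobytes()).hexdigest()
    return (component_name, data_hash)

def fit_transform_cached(component, X):
    key = _cache_key(component.name, X)
    if key not in _component_cache:
        _component_cache[key] = component.fit_transform(X)
    return _component_cache[key]
```

In a real implementation the key would also need to cover the component's hyperparameters, and the cache would likely live on disk (e.g. via joblib) rather than in memory.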
Sklearn supports this functionality: see the `memory` parameter on `Pipeline`, and also this issue in their repo.
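For reference, here's a minimal example of sklearn's `memory` parameter; the transformer names and cache directory are arbitrary. With `memory` set, fitted transformers are cached on disk, so re-fitting an identical pipeline on the same data reuses them:

```python
# Minimal example of sklearn Pipeline caching via the `memory` parameter.
# Fitted transformers (here, StandardScaler) are cached in cache_dir and
# reused when an identical pipeline is fit on the same data again.
import tempfile

from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

cache_dir = tempfile.mkdtemp()  # arbitrary cache location for this example
pipe = Pipeline(
    steps=[("scale", StandardScaler()), ("clf", LogisticRegression())],
    memory=cache_dir,
)
```

Note that sklearn only caches the transformer steps, not the final estimator, which is roughly the component-level granularity discussed above.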