EpistasisLab / tpot

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
http://epistasislab.github.io/tpot/
GNU Lesser General Public License v3.0

Question: Can multiple models be saved/exported from one TPOT optimization? #1317

Closed: kirane61 closed this issue 10 months ago.

kirane61 commented 10 months ago

Hello all, I am new to TPOT. Can we save the top 5 models, or every model that TPOT evaluated, as scikit-learn model pickle files, so that I don't need to fit them again and can use them directly for scoring?

perib commented 10 months ago

In TPOT 1, you can access the top model with `est.fitted_pipeline_`, which is fit to the entire training set.
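A minimal sketch of saving that pipeline with the standard-library `pickle` module, so it can be reloaded and used for scoring without refitting. To keep the example runnable without a full TPOT search, a plain scikit-learn pipeline on the iris data stands in for `est.fitted_pipeline_`. The TPOT calls appear only as comments, and the file name is arbitrary.

```python
import os
import pickle
import tempfile

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# With a real TPOT 1.x run (not executed here) you would write:
#   from tpot import TPOTClassifier
#   est = TPOTClassifier(generations=5, population_size=20, verbosity=2)
#   est.fit(X_train, y_train)
#   best = est.fitted_pipeline_
# Stand-in: an ordinary fitted scikit-learn Pipeline, the same type of object.
best = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
best.fit(X_train, y_train)

# Persist the fitted pipeline to disk.
path = os.path.join(tempfile.mkdtemp(), "best_pipeline.pkl")
with open(path, "wb") as f:
    pickle.dump(best, f)

# Later (e.g. in another process): reload and score without refitting.
with open(path, "rb") as f:
    loaded = pickle.load(f)

original_score = best.score(X_test, y_test)
loaded_score = loaded.score(X_test, y_test)
print(original_score, loaded_score)
```

Pipelines that TPOT builds can include its own operators (for example `StackingEstimator` from `tpot.builtins`), so loading such a pickle will likely require TPOT and a matching scikit-learn version to be installed.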

You can also access the models on the Pareto front (score vs. number of nodes) in `est.pareto_front_fitted_pipelines_`. This is a dictionary whose values are the actual pipelines, fit to the entire training dataset.

From the documentation:

`fitted_pipeline_`: scikit-learn Pipeline object
The best pipeline that TPOT discovered during the pipeline optimization process, fitted on the entire training dataset.

`pareto_front_fitted_pipelines_`: Python dictionary
Dictionary containing all pipelines on the TPOT Pareto front, where the key is the string representation of the pipeline and the value is the corresponding pipeline fitted on the entire training dataset.

The TPOT Pareto front provides a trade-off between pipeline complexity (i.e., the number of steps in the pipeline) and the predictive performance of the pipeline.

Note: `pareto_front_fitted_pipelines_` is only available when `verbosity=3`.
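Because the Pareto-front attribute is already a dictionary of fitted pipelines, one way to keep all of them is to pickle the whole dictionary at once. This is a sketch, not an official TPOT export function: the dictionary below is a hypothetical stand-in with the same shape (pipeline string mapped to fitted Pipeline), and its key strings are illustrative rather than real TPOT output. Dumping each value to its own file would work equally well.

```python
import os
import pickle
import tempfile

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Real TPOT 1.x usage (not executed here); verbosity=3 is required:
#   from tpot import TPOTClassifier
#   est = TPOTClassifier(generations=5, population_size=20, verbosity=3)
#   est.fit(X, y)
#   pareto = est.pareto_front_fitted_pipelines_
# Hypothetical stand-in with the same shape: {pipeline string: fitted Pipeline}.
pareto = {
    "DecisionTreeClassifier(input_matrix)": make_pipeline(
        DecisionTreeClassifier(random_state=0)
    ),
    "LogisticRegression(StandardScaler(input_matrix))": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}
for pipe in pareto.values():
    pipe.fit(X, y)

# Pickle the whole dictionary: every Pareto-front model and its string
# description end up in a single file.
path = os.path.join(tempfile.mkdtemp(), "pareto_front.pkl")
with open(path, "wb") as f:
    pickle.dump(pareto, f)

# Later: reload and use any saved model directly, without refitting.
with open(path, "rb") as f:
    loaded = pickle.load(f)

for name, model in loaded.items():
    print(name, model.score(X, y))
```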

`est.evaluated_individuals_` contains the string representations of all evaluated individuals, but not the actual pipelines. There isn't a built-in function to convert a string back into a pipeline, but issue #516 has instructions for doing the conversion.
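To pick the "top 5" out of everything TPOT evaluated, one option is to rank the entries of `evaluated_individuals_` by their cross-validation score. This sketch assumes each value is a dictionary of statistics with an `"internal_cv_score"` key, which matches TPOT 1.x releases as far as we know; the pipelines and scores below are made up. The result is still a list of strings, so refitting them would need the conversion described in issue #516.

```python
# Hypothetical stand-in for est.evaluated_individuals_ from a TPOT 1.x run.
# Assumption: each value is a dict containing an "internal_cv_score" entry.
evaluated_individuals = {
    "GaussianNB(input_matrix)": {"internal_cv_score": 0.91},
    "LogisticRegression(input_matrix, LogisticRegression__C=1.0)": {"internal_cv_score": 0.95},
    "DecisionTreeClassifier(input_matrix, DecisionTreeClassifier__max_depth=3)": {"internal_cv_score": 0.93},
    "KNeighborsClassifier(input_matrix, KNeighborsClassifier__n_neighbors=5)": {"internal_cv_score": 0.94},
    "BernoulliNB(input_matrix)": {"internal_cv_score": 0.62},
    "LogisticRegression(StandardScaler(input_matrix), LogisticRegression__C=10.0)": {"internal_cv_score": 0.96},
}


def top_k_pipelines(evaluated, k=5):
    """Return the k pipeline strings with the highest internal CV score."""
    ranked = sorted(
        evaluated.items(),
        key=lambda item: item[1]["internal_cv_score"],
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


top5 = top_k_pipelines(evaluated_individuals, k=5)
for name in top5:
    print(evaluated_individuals[name]["internal_cv_score"], name)
```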

We also have an alpha version of TPOT 2, our next version, that seeks to address issues like these. In TPOT 2, we provide a pandas data frame that contains all evaluated individuals. Example here