EpistasisLab / tpot

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
http://epistasislab.github.io/tpot/
GNU Lesser General Public License v3.0

No refit #1164

Open jmrichardson opened 3 years ago

jmrichardson commented 3 years ago

Hi, I have a use case where I need to blend (average) the models' results across the cross-validation folds rather than refit on the entire dataset, because my data is temporal with overlaps. I can control the overlap by creating the splits appropriately. However, if the best pipeline is refitted on the entire dataset, the overlaps will be reintroduced. I looked into whether TPOT does a refit, and these two sources give conflicting answers:

https://github.com/EpistasisLab/tpot/issues/673
https://stackoverflow.com/questions/52008298/when-fitting-with-tpot-cv-is-the-fitted-pipeline-retrained-on-the-whole-datase

Thanks for your help

weixuanfu commented 3 years ago

Randy's answer on Stack Overflow is correct.
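
For anyone with the same constraint, here is a minimal sketch of the per-fold blending described in the question. It does not use TPOT itself. `MeanRegressor` is a toy stand-in for any estimator with scikit-learn-style `fit`/`predict` methods, and `fit_per_fold`, `blend_predict`, and the hand-built `splits` are hypothetical names chosen for illustration. The idea is to fit one fresh copy of the chosen pipeline on each training split and average the predictions, so that no model is ever fitted on the full dataset.

```python
from statistics import fmean


class MeanRegressor:
    """Toy stand-in for a scikit-learn-style estimator (hypothetical)."""

    def fit(self, X, y):
        # "Training" just memorises the mean of the targets it was given.
        self.mean_ = fmean(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def fit_per_fold(make_model, X, y, splits):
    """Fit one fresh model per training split; never fit on all rows at once."""
    models = []
    for train_idx, _test_idx in splits:
        model = make_model()  # new, unfitted model for this fold
        model.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        models.append(model)
    return models


def blend_predict(models, X):
    """Average the predictions of the per-fold models, row by row."""
    per_model = [m.predict(X) for m in models]
    return [fmean(row) for row in zip(*per_model)]


# Assumed example data: splits are built by hand so that the rows between
# each training block and its test row are left out (no overlap).
X = [[0], [1], [2], [3], [4], [5]]
y = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
splits = [([0, 1], [3]), ([0, 1, 2, 3], [5])]

models = fit_per_fold(MeanRegressor, X, y, splits)
blended = blend_predict(models, [[6]])
print(blended)
```

With scikit-learn installed, `make_model` could be something like `lambda: sklearn.base.clone(pipeline)`, which returns an unfitted copy of an estimator. Whether that suits a particular TPOT workflow is an assumption to verify against your own setup.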