EpistasisLab / tpot

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
http://epistasislab.github.io/tpot/
GNU Lesser General Public License v3.0

TPOTClassifier error for large data #1332

Open kiranellur opened 7 months ago

kiranellur commented 7 months ago

I am getting the following error:

RuntimeError: There was an error in the TPOT optimization process. This could be because the data was not formatted properly, or because data for a regression problem was provided to the TPOTClassifier object. Please make sure you passed the data to TPOT correctly.

The current best internal CV score is -inf, even though the optimization progress bar shows 75%.

It works for smaller datasets, but I get the error for datasets with 200000 rows and 20 columns. I am currently using TPOT version 12.0. Is there any specific reason I am getting this?

Can you please help me resolve this error? Thank you.

perib commented 7 months ago

I would recommend trying out TPOT2, the next version of TPOT. You can find it here: https://github.com/EpistasisLab/tpot2. This version is more stable with larger datasets than TPOT1. It also has a memory_limit parameter that you can use to set the maximum amount of RAM a single pipeline can take up.
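A minimal sketch of what that could look like, assuming tpot2 is installed and that its TPOTClassifier accepts n_jobs and memory_limit as keyword arguments. The helper names, the "4GB" value, and the worker count are illustrative assumptions, not recommendations from the TPOT2 docs:

```python
def tpot2_settings(memory_limit="4GB", n_jobs=2):
    """Return keyword arguments for a memory-capped TPOT2 run (illustrative values)."""
    return {
        "n_jobs": n_jobs,              # number of parallel workers
        "memory_limit": memory_limit,  # RAM cap per job, given as a size string
    }


def build_tpot2_classifier(**overrides):
    """Hypothetical helper: build a tpot2.TPOTClassifier from the settings above."""
    # Imported here so the settings can be inspected without tpot2 installed.
    import tpot2

    settings = tpot2_settings()
    settings.update(overrides)
    return tpot2.TPOTClassifier(**settings)
```

You would then call build_tpot2_classifier().fit(X_train, y_train), where X_train and y_train stand in for your own data.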

For TPOT1: Perhaps it is simply running out of RAM and crashing?

Some suggestions: You could try to reduce RAM usage by lowering n_jobs. You could also try editing the configuration dictionary to use smaller, faster models. Another possibility is that fitting the pipeline is taking too long and timing out; you can increase the timeout with the max_eval_time_mins parameter.
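The TPOT1 suggestions above can be sketched roughly as follows. This assumes the built-in "TPOT light" configuration is an acceptable stand-in for a hand-edited configuration dictionary; the helper names and the specific numbers (generations, population size, 15 minutes) are illustrative, not tuned values:

```python
def light_tpot_settings(n_jobs=1, max_eval_time_mins=15):
    """Return keyword arguments for a lower-memory TPOTClassifier run."""
    return {
        "generations": 5,
        "population_size": 20,
        "n_jobs": n_jobs,                          # fewer parallel workers to lower RAM usage
        "config_dict": "TPOT light",               # built-in config with simpler, faster models
        "max_eval_time_mins": max_eval_time_mins,  # per-pipeline time limit (TPOT default is 5)
        "verbosity": 2,                            # show progress and the current best CV score
        "random_state": 42,
    }


def build_classifier(**overrides):
    """Hypothetical helper: build a TPOTClassifier from the settings above."""
    # Imported here so the settings can be inspected without TPOT installed.
    from tpot import TPOTClassifier

    settings = light_tpot_settings()
    settings.update(overrides)
    return TPOTClassifier(**settings)
```

Calling build_classifier().fit(X_train, y_train) would then run the search, where X_train and y_train stand in for your own data. If the run still ends with a -inf score, changing one setting at a time should help show whether memory or the per-pipeline timeout is the cause.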