EpistasisLab / tpot

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
http://epistasislab.github.io/tpot/
GNU Lesser General Public License v3.0
9.76k stars 1.57k forks

any recommendations to "Skipped pipeline #1 due to time out. Continuing to the next pipeline"? #627

Closed unnir closed 7 years ago

unnir commented 7 years ago

Hi guys,

thank you for the tpot, looks really cool and I want to test it, but getting this all the time:

Skipped pipeline #1 due to time out. Continuing to the next pipeline.

my code:

tpot = TPOTClassifier(generations=5, population_size=20, verbosity=3, random_state=1111, max_eval_time_mins=13)
tpot.fit(X_train, y_train)

Do you have any recommendations how to avoid it?

weixuanfu commented 7 years ago

It seems that the training dataset is very large, or the first pipeline was too complex, which caused the timeout warning. Increasing max_eval_time_mins may let the pipeline finish evaluation despite this warning, or you can set verbosity=2 to mute the warning message.

unnir commented 7 years ago

yep, the dataset is huge... So, is there no way to use it for big data?

weixuanfu commented 7 years ago

For huge datasets, like half a million samples with thousands of features, I suggest trying the "TPOT light" configuration first by setting config_dict="TPOT light".

unnir commented 7 years ago

@weixuanfu thank you!

weixuanfu commented 7 years ago

I am closing this issue since there have been no further comments in a while. Please feel free to re-open the issue (or comment further) if you have any more questions.

neel04 commented 3 years ago

@weixuanfu I had the same error and have followed up on your modification, but does a million samples really count as "Big Data"? The dataset in question would only be around 600 MB, and many datasets are bigger than that. Could you please explain why this happens in a bit more detail?

weixuanfu commented 3 years ago

1 million samples is big data for TPOT. Maybe your only option is to use TPOT cuML, and also to increase max_eval_time_mins to 20 or more.