geoHeil closed this issue 8 years ago.
Could you please post the stdout from the fit() call? The best pipeline should be printed if fit() finished normally. Please also let us know which platform this code ran on. More details will help us find the bug causing this issue.
Also, please test the code above again without max_time_mins=10. This parameter overrides the generations parameter and kills the fit() process after 10 minutes. If the process does not find a best pipeline within the time limit, no fitted pipeline will be exported. I think that may be the cause of this issue.
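To make that interaction concrete, here is a toy model (not TPOT's actual implementation) of a generational search with an overall time budget. The names run_search and export are hypothetical, and the evaluation times and scores are made up. It only illustrates how an overall time limit can stop the loop before the generation count is reached, leaving nothing to export.

```python
from typing import Callable, List, Optional, Tuple


def run_search(
    candidates: List[str],
    eval_minutes: Callable[[str], float],
    score: Callable[[str], float],
    generations: int,
    max_time_mins: Optional[float] = None,
) -> Optional[str]:
    """Toy generational search with an optional overall time budget.

    Returns the best pipeline found so far, or None if the budget ran out
    before any pipeline finished evaluating.
    """
    elapsed = 0.0
    best: Optional[Tuple[float, str]] = None
    for _ in range(generations + 1):  # generation 0 plus `generations` more
        for pipeline in candidates:
            cost = eval_minutes(pipeline)
            # The overall time limit takes priority over the generation count.
            if max_time_mins is not None and elapsed + cost > max_time_mins:
                return best[1] if best else None
            elapsed += cost
            s = score(pipeline)
            if best is None or s > best[0]:
                best = (s, pipeline)
    return best[1] if best else None


def export(best: Optional[str]) -> str:
    """Refuse to export when the search produced no fitted pipeline."""
    if best is None:
        raise RuntimeError("No pipeline was fitted before the time limit; nothing to export.")
    return best


scores = {"pipe_a": 0.71, "pipe_b": 0.84}
# Every evaluation takes 2 minutes in this toy setup.
print(export(run_search(list(scores), lambda _: 2.0, scores.get, generations=2)))  # pipe_b
best = run_search(list(scores), lambda _: 2.0, scores.get, generations=2, max_time_mins=1)
print(best)  # None, so export(best) would raise RuntimeError
```

With a 1-minute budget and 2-minute evaluations, the search stops before anything is scored, which mirrors the "no fitted pipeline" symptom described here.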
Update: the code below can be used to reproduce the issue. I think we need to add a friendly warning about this parameter for time-consuming jobs. Sorry for the confusion.
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
from tpot import TPOTClassifier
X, y = make_classification(n_samples=200, n_features=100,
                           n_informative=2, n_redundant=10,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75, test_size=0.25)
tpot = TPOTClassifier(verbosity=2, max_time_mins=1)
tpot.fit(X_train, y_train)
tpot.export('tpot_pipe.py')
print(tpot.score(X_test, y_test))
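The friendly warning suggested above could look roughly like the following sketch. It is not part of TPOT: check_time_budget and minutes_per_pipeline are hypothetical names, and the rule (warn when the overall budget cannot cover even one generation of evaluations) is just one possible heuristic. The numbers in the example call are illustrative.

```python
import warnings
from typing import Optional


def check_time_budget(
    max_time_mins: Optional[float],
    population_size: int,
    minutes_per_pipeline: float,
) -> None:
    """Hypothetical helper: warn if the overall time budget looks too short.

    If max_time_mins cannot cover even one generation of evaluations,
    fit() may stop before any pipeline is fitted, and export() then fails.
    """
    if max_time_mins is None:
        return  # no overall limit; the generation count decides when to stop
    first_generation = population_size * minutes_per_pipeline
    if max_time_mins < first_generation:
        warnings.warn(
            f"max_time_mins={max_time_mins} is shorter than the estimated "
            f"{first_generation:.0f} minutes needed for the first generation; "
            "fit() may end without a fitted pipeline to export.",
            UserWarning,
        )


# Illustrative numbers: 100 pipelines at about 0.5 minutes each vs. a 1-minute budget.
check_time_budget(max_time_mins=1, population_size=100, minutes_per_pipeline=0.5)
```

The per-pipeline estimate would have to come from somewhere, for example an earlier short run, which is why it is left as an explicit argument here.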
When I remove the option, it seems to run longer:
2016-11-08 18:05:18,713 INFO -- MainProcess connectionpool.py:214 -- Starting new HTTP connection (1): update_checker.bryceboe.com
Optimization Progress: 0%| | 7/10100 [10:51<201:40:59, 71.94s/pipeline]
Timeout during evaluation of pipeline #7. Skipping to the next pipeline.
Optimization Progress: 0%| | 14/10100 [14:08<88:17:27, 31.51s/pipeline]
Timeout during evaluation of pipeline #14. Skipping to the next pipeline.
Optimization Progress: 0%| | 16/10100 [17:45<182:17:02, 65.08s/pipeline]
Timeout during evaluation of pipeline #16. Skipping to the next pipeline.
Optimization Progress: 0%| | 19/10100 [25:09<374:49:26, 133.85s/pipeline]
Timeout during evaluation of pipeline #19. Skipping to the next pipeline.
Optimization Progress: 0%| | 22/10100 [25:23<136:50:45, 48.88s/pipeline]
I hope it works. How long does such a simulation usually run? 10100 seems to be a lot.
TPOT's default settings are population_size=100 and generations=100. Counting the initial population (generation 0), the number of pipeline evaluations is 100 * (100 + 1) = 10100. I think the dataset you used must be very large, and many pipelines were skipped because of the time limit for evaluating a single pipeline (max_eval_time_mins=5 by default).
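The count above is simple enough to check directly. This small helper (the name total_evaluations is made up for illustration) follows the same counting as above: the initial population plus one population's worth of pipelines per generation.

```python
def total_evaluations(population_size: int = 100, generations: int = 100) -> int:
    """Pipelines evaluated: generation 0 plus one population per generation."""
    return population_size * (generations + 1)


print(total_evaluations())                # 10100 with the defaults
print(total_evaluations(generations=10))  # 1100
```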
To estimate how fast the search runs on your dataset, I suggest reducing the number of generations to about 10 and increasing max_eval_time_mins by passing generations=10, max_eval_time_mins=10 to TPOTClassifier. I also suggest running this time-consuming process on Linux. A strange bug related to #300 was just found on macOS (MacBook Pro), and we will fix it in the next version of TPOT. The current version of TPOT is stable on Linux (it should also be fine on Windows).
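A rough back-of-the-envelope estimate for the suggested settings, as a sketch: it assumes pipelines are evaluated one at a time, uses the 31.51 s/pipeline rate from one line of the progress log above (the log shows rates anywhere from about 31.5 to 134 s/pipeline), and bounds the worst case by assuming every pipeline runs until the per-pipeline limit. The helper names are hypothetical.

```python
def estimate_hours(n_pipelines: int, seconds_per_pipeline: float) -> float:
    """Rough wall-clock estimate from an observed average evaluation rate."""
    return n_pipelines * seconds_per_pipeline / 3600


def worst_case_hours(n_pipelines: int, max_eval_time_mins: float) -> float:
    """Upper bound if every pipeline runs until the per-pipeline time limit."""
    return n_pipelines * max_eval_time_mins / 60


n = 100 * (10 + 1)  # population_size=100, generations=10 -> 1100 pipelines
print(round(estimate_hours(n, 31.51), 1))  # 9.6 hours at the observed rate
print(round(worst_case_hours(n, 10), 1))   # 183.3 hours with max_eval_time_mins=10
```

The gap between the two numbers is why a short trial run is useful: the observed rate on your own data is what turns the worst-case bound into a realistic estimate.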
Good point regarding Linux: the Python process just crashed on my MacBook :(
Thanks, it works for me.
Pipeline does not seem to be fitted even though fit was called
Context of the issue
Even though fit() was executed, I get an error when I try to obtain the best result.
The error