EpistasisLab / tpot

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
http://epistasislab.github.io/tpot/
GNU Lesser General Public License v3.0

AutoML Benchmark: why is TPOT so bad? #1100

Open Alex-Lekov opened 3 years ago

Alex-Lekov commented 3 years ago

I benchmarked several AutoML libraries, and TPOT showed very poor results, even worse than plain CatBoost with default parameters! https://github.com/Alex-Lekov/AutoML-Benchmark/ I run the benchmark in Docker, so you can easily reproduce it.

Here is the code from the benchmark:

from tpot import TPOTClassifier
from sklearn.metrics import roc_auc_score

# TIME_LIMIT and RANDOM_SEED are defined elsewhere in the benchmark.
automl = TPOTClassifier(max_time_mins=(TIME_LIMIT // 60),
                        scoring='roc_auc',
                        verbosity=1,
                        random_state=RANDOM_SEED)

automl.fit(X_train, y_train)

# The winning pipeline's final estimator may not support predict_proba,
# so fall back to hard class predictions.
try:
    predictions = automl.predict_proba(X_test)
except RuntimeError:
    predictions = automl.predict(X_test)

# TPOT returns predictions in a different shape depending on the final estimator :(
try:
    y_test_predict_proba = predictions[:, 1]
except IndexError:
    y_test_predict_proba = predictions

y_test_predict = automl.predict(X_test)

print('AUC: ', roc_auc_score(y_test, y_test_predict_proba))
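
(As a side note, here is a sketch of a more uniform way to pull out the positive-class scores, assuming TPOT's documented fitted_pipeline_ attribute and a scikit-learn-style classes_ on the exported pipeline; it is not part of the original benchmark code, and it assumes the positive label is 1.)

# Sketch: obtain a 1-D array of positive-class scores for roc_auc_score.
# Assumes `automl` and `X_test` are defined as above.
try:
    proba = automl.predict_proba(X_test)
    # Column index of the positive label (assumed here to be 1).
    pos_col = list(automl.fitted_pipeline_.classes_).index(1)
    y_test_predict_proba = proba[:, pos_col]
except RuntimeError:
    # The winning pipeline's final estimator has no predict_proba;
    # fall back to hard class predictions (a weaker AUC estimate).
    y_test_predict_proba = automl.predict(X_test)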

Is the code correct? (I did not tune any advanced parameters, since an AutoML tool should, in theory, pick everything up by itself; that is the point of AutoML.)

If we specify scoring='roc_auc', is it guaranteed to actually optimize for AUC?

Please tell me what I am doing wrong. Am I using the library incorrectly, or is this a genuine result and the library really performs this poorly?

weixuanfu commented 3 years ago

I had a quick look at your benchmark. The time limit is 1 hour, but some of the datasets have over 40k instances, so TPOT may not get past the initial generation (which consists of randomly generated pipelines) and therefore never optimizes pipelines via genetic programming. I suggest increasing the time limit to 1 day for each large dataset if n_jobs=1, or using parallel training with Dask.
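
A minimal sketch of that suggestion, assuming TPOT's documented n_jobs and use_dask options together with a dask.distributed client; the time budget and worker counts below are illustrative, not values from this thread:

from tpot import TPOTClassifier
from dask.distributed import Client

# Start a local Dask scheduler; worker counts here are illustrative only.
client = Client(n_workers=4, threads_per_worker=1)

automl = TPOTClassifier(
    max_time_mins=24 * 60,   # roughly 1 day instead of 1 hour
    scoring='roc_auc',
    n_jobs=-1,               # evaluate candidate pipelines in parallel
    use_dask=True,           # hand pipeline evaluation to the Dask client
    verbosity=1,
    random_state=RANDOM_SEED,
)
automl.fit(X_train, y_train)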