Cannot reproduce pipeline results with sklearn pipeline

EpistasisLab / tpot

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.

GNU Lesser General Public License v3.0


For my data, I got the best pipeline by running TPOT training using the following parameters:

from tpot import TPOTClassifier

tpot = TPOTClassifier(generations=5,
                      population_size=100,
                      verbosity=2,
                      n_jobs=-1,
                      random_state=1)

The best pipeline was given as:

Best pipeline: MLPClassifier(GaussianNB(Binarizer(input_matrix, threshold=0.0)), alpha=0.001, learning_rate_init=0.001)
TPOTClassifier(generations=5, n_jobs=-1, random_state=1, verbosity=2)

The best CV score I achieved was 0.822

Using the pipeline reported above, I built a stacking ensemble in sklearn as follows:

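from sklearn.ensemble import StackingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier

base_model = GaussianNB()

meta_model = MLPClassifier(random_state=1,
                           learning_rate_init=0.001,
                           alpha=0.001)

ensemble = StackingClassifier(estimators=[('base_model', base_model),
                                          ('meta_model', meta_model)],
                              final_estimator=meta_model,
                              n_jobs=-1)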

The score I get from this is 0.79.

Can you tell me why I am getting different scores when all my parameters are the same?

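from sklearn.pipeline import Pipeline
from tpot.builtins import StackingEstimator

step1 = Binarizer(threshold=0.0)
# StackingEstimator adds the GaussianNB predictions to the features passed on to the next step
base_model = StackingEstimator(estimator=GaussianNB())
meta_model = MLPClassifier(random_state=1, learning_rate_init=0.001, alpha=0.001)
# Pipeline takes a list of steps; the last step is the final estimator
ensemble = Pipeline(steps=[('step1', step1),
                           ('base_model', base_model),
                           ('meta_model', meta_model)])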
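To make the difference from StackingClassifier concrete, here is a minimal sklearn-only sketch of the same idea. PredictionAppender is a hypothetical stand-in for tpot.builtins.StackingEstimator, written on the assumption that the TPOT class appends the wrapped estimator's predictions and class probabilities to the input features. The synthetic data, the step names and the 5-fold accuracy scoring are also assumptions for illustration, so the numbers it prints will not match the scores quoted in this thread.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Binarizer


class PredictionAppender(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in for tpot.builtins.StackingEstimator (assumed behaviour)."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y=None):
        # Fit a fresh copy so the estimator passed in is left untouched
        self.estimator_ = clone(self.estimator).fit(X, y)
        return self

    def transform(self, X):
        X = np.asarray(X)
        pred = self.estimator_.predict(X).reshape(-1, 1)
        proba = self.estimator_.predict_proba(X)
        # Keep the original features and prepend the base model's outputs
        return np.hstack([pred, proba, X])


# Synthetic data, only so that the sketch runs end to end
X, y = make_classification(n_samples=300, n_features=10, random_state=1)

# Read the TPOT string inside out: Binarizer -> GaussianNB (stacked) -> MLPClassifier
pipeline = Pipeline(steps=[
    ("binarizer", Binarizer(threshold=0.0)),
    ("stack_nb", PredictionAppender(GaussianNB())),
    ("mlp", MLPClassifier(alpha=0.001, learning_rate_init=0.001, random_state=1)),
])

# Score with 5-fold cross-validation on accuracy
scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
mean_score = scores.mean()
print(scores, mean_score)
```

Two things differ from the StackingClassifier version in the question. The Binarizer step is included, and the MLP sees the binarized features together with the GaussianNB outputs rather than only the base models' predictions. Either difference could plausibly account for part of the gap between 0.822 and 0.79.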

Cannot reproduce pipeline results with sklearn pipeline #1289