Open DrRaja opened 1 year ago
The manual pipeline is not exactly identical to the TPOT output. It is missing the Binarizer step.
Also, TPOT wraps internal classifiers in a StackingEstimator, which passes its inputs through in addition to its predictions (see https://github.com/EpistasisLab/tpot/blob/master/tpot/builtins/stacking_estimator.py).
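To make the pass-through idea concrete, here is a minimal sketch using only scikit-learn. The class name PassThroughStacker is hypothetical and this is not TPOT's code; it assumes the wrapper simply prepends the wrapped classifier's predicted labels to the unchanged input features.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.naive_bayes import GaussianNB


class PassThroughStacker(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for a stacking wrapper (not TPOT's implementation)."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        # Fit a copy of the wrapped classifier on the incoming features.
        self.estimator_ = clone(self.estimator).fit(X, y)
        return self

    def transform(self, X):
        X = np.asarray(X)
        # Prepend the predicted labels as one extra column; keep the inputs as-is.
        preds = self.estimator_.predict(X).reshape(-1, 1)
        return np.hstack((preds, X))


# Toy data, made up purely for illustration.
rng = np.random.RandomState(0)
X = rng.rand(20, 3)
y = np.array([0, 1] * 10)

stacked = PassThroughStacker(GaussianNB()).fit(X, y).transform(X)
print(X.shape, stacked.shape)  # the output has one more column than the input
```

The next step in a pipeline therefore sees the original features plus the wrapped model's opinion, rather than the predictions alone.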
Going off memory, I believe this is what the TPOT output would be equivalent to:
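from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Binarizer
from tpot.builtins import StackingEstimator

step1 = Binarizer(threshold=0.0)
# StackingEstimator adds GaussianNB's predictions to the features it passes on
base_model = StackingEstimator(estimator=GaussianNB())
meta_model = MLPClassifier(random_state=1,
                           learning_rate_init=0.001,
                           alpha=0.001)
# Pipeline takes `steps`; the last step is the final estimator
# (it has no `estimators`, `final_estimator`, or `n_jobs` arguments)
ensemble = Pipeline(steps=[('step1', step1),
                           ('base_model', base_model),
                           ('meta_model', meta_model)])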
So the flow is: the Binarizer transforms the data -> transformed data -> GaussianNB -> transformed data + predictions -> MLPClassifier
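The flow above can be traced by hand, one stage at a time, with plain scikit-learn and NumPy. This is only a sketch on made-up toy data: it assumes the stacking step adds a single column of predicted labels, whereas the real StackingEstimator may also add class-probability columns, which are left out here.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import Binarizer

# Toy data, invented for illustration only.
rng = np.random.RandomState(0)
X = rng.randn(60, 4)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Stage 1: binarize the features (values > 0 become 1, the rest 0).
X_bin = Binarizer(threshold=0.0).fit_transform(X)

# Stage 2: fit GaussianNB on the binarized data and get its predictions.
nb = GaussianNB().fit(X_bin, y)
nb_preds = nb.predict(X_bin).reshape(-1, 1)

# Stage 3: stack the predictions next to the binarized features.
X_stacked = np.hstack((nb_preds, X_bin))

# Stage 4: the MLP is trained on transformed data + predictions.
mlp = MLPClassifier(random_state=1,
                    learning_rate_init=0.001,
                    alpha=0.001).fit(X_stacked, y)

print(X.shape, X_bin.shape, X_stacked.shape)
```

Leaving out the Binarizer or the stacking wrapper changes what the MLPClassifier is trained on, so a manual pipeline without them is a different model even if the MLP parameters match.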
For my data, I got the best pipeline by running TPOT training using the following parameters:
The best pipeline was given as:
The best CV score I achieved was 0.822
Using the ensemble provided above I trained an ensemble pipeline using sklearn as:
The score I get from this is 0.79
Can you tell me why I am getting different scores when all my parameters are the same?