EpistasisLab / tpot

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
http://epistasislab.github.io/tpot/
GNU Lesser General Public License v3.0

StackingEstimator Interpretation of Exported Pipeline #698

Closed cpereir1 closed 6 years ago

cpereir1 commented 6 years ago

Hi! I ran the TPOT classifier as follows:

pipeline_optimizer = tpot.TPOTClassifier(warm_start=True, periodic_checkpoint_folder="C:\Users...", verbosity=3, max_eval_time_mins=20, config_dict='TPOT light')

The training set has shape (1871, 18).

I obtained the following exported pipeline:

exported_pipeline = make_pipeline(
    make_union(
        StackingEstimator(estimator=make_pipeline(
            SelectPercentile(score_func=f_classif, percentile=42),
            StackingEstimator(estimator=KNeighborsClassifier(n_neighbors=35, p=1, weights="uniform")),
            GaussianNB()
        )),
        FunctionTransformer(copy)
    ),
    DecisionTreeClassifier(criterion="gini", max_depth=5, min_samples_leaf=1, min_samples_split=13)
)

I am unsure about the flow this pipeline describes. My interpretation is:

  1. Percentile features are calculated
  2. kNN classification occurs
  3. Both results are added as features to the dataset
  4. GaussianNB classifier is applied to the complete dataset

Is this correct?

Thanks for your help!

BR

weixuanfu commented 6 years ago

StackingEstimator is a meta-transformer for adding predictions and/or class probabilities as synthetic feature(s). Your interpretation is correct for the part below:

make_pipeline(
    SelectPercentile(score_func=f_classif, percentile=42),
    StackingEstimator(estimator=KNeighborsClassifier(n_neighbors=35, p=1, weights="uniform")),
    GaussianNB()
)

After that, the predictions from this part are added to the input X as synthetic features, and the result is passed to DecisionTreeClassifier.
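To make the flow concrete, here is a rough sketch that rebuilds the same pipeline on synthetic data and checks the feature counts at the stacking step. It is only an illustration: `SketchStacking` is a hypothetical, simplified stand-in for TPOT's `StackingEstimator` that prepends just the predicted label (the real class may also add class probabilities), and the 200-row dataset from `make_classification` is an assumption, not the data from this issue.

```python
from copy import copy

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import FunctionTransformer
from sklearn.tree import DecisionTreeClassifier


class SketchStacking(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for StackingEstimator: adds predictions only."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y=None):
        # Fit a fresh copy of the wrapped estimator on the incoming features.
        self.estimator_ = clone(self.estimator).fit(X, y)
        return self

    def transform(self, X):
        # Prepend the predicted label as one synthetic feature column.
        preds = self.estimator_.predict(X).reshape(-1, 1)
        return np.hstack([preds, X])


# Synthetic data (assumption): 200 samples with 18 features, as in the issue.
X, y = make_classification(n_samples=200, n_features=18, random_state=0)

# Inner part: select features, add kNN predictions, classify with GaussianNB.
inner = make_pipeline(
    SelectPercentile(score_func=f_classif, percentile=42),
    SketchStacking(estimator=KNeighborsClassifier(n_neighbors=35, p=1, weights="uniform")),
    GaussianNB(),
)

# Outer part: the inner pipeline's predictions are stacked onto X, and
# make_union concatenates that with an untouched copy of X.
union = make_union(SketchStacking(estimator=inner), FunctionTransformer(copy))
stacked = union.fit_transform(X, y)
print(stacked.shape)  # 1 prediction column + 18 features + 18 copied features

# The decision tree is trained on the combined feature matrix.
exported = make_pipeline(
    union,
    DecisionTreeClassifier(criterion="gini", max_depth=5, min_samples_leaf=1, min_samples_split=13),
)
exported.fit(X, y)
final_preds = exported.predict(X)
```

In this sketch the tree sees 37 columns: the inner pipeline's prediction, the 18 original features, and the 18 copied features from `FunctionTransformer(copy)`. With the real `StackingEstimator` the count may differ if class probabilities are added as well.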

cpereir1 commented 6 years ago

Thanks weixuanfu for your reply!