Open avdusen opened 6 years ago
For Question 1. The steps are:
For Question 2.
For now, TPOT does not provide this option. But:
One of my dev branches of TPOT, called noCDF_noStacking, has an option named simple_pipeline, which disables both StackingEstimator and CombineDFs if simple_pipeline=True (e.g. TPOTClassifier(simple_pipeline=True)). Note that this dev branch is not fully tested yet. If you want to try TPOT without StackingEstimator and FeatureUnion, you can install this branch in your test environment with the command below:
pip install --upgrade --no-deps --force-reinstall git+https://github.com/weixuanfu/tpot.git@noCDF_noStacking
Please check #152 for more details. We are working on a more advanced pipeline configuration option.
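For readers unsure what is being switched off: a stacking step fits an estimator and then passes its predictions downstream as an extra (synthetic) feature next to the original columns. Below is a minimal pure-Python sketch of that idea, not TPOT's actual implementation. MeanRegressor and ToyStackingStep are hypothetical names made up for illustration, and placing the prediction in the first column is an assumption.

```python
class MeanRegressor:
    """Hypothetical toy estimator: always predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class ToyStackingStep:
    """Hypothetical stand-in for a stacking step (not TPOT's class).

    fit() trains the wrapped estimator; transform() returns the original
    features with the estimator's prediction added as one more column.
    """

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        self.estimator.fit(X, y)
        return self

    def transform(self, X):
        preds = self.estimator.predict(X)
        # Assumption: the synthetic feature goes in front of the original columns.
        return [[p] + list(row) for p, row in zip(preds, X)]


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
y = [10.0, 20.0, 30.0]

step = ToyStackingStep(MeanRegressor()).fit(X, y)
X_stacked = step.transform(X)  # each row now has 3 columns instead of 2
```

With stacking disabled, an intermediate estimator could no longer feed its predictions to the next step in this way, so only the final step of a pipeline would be an estimator.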
Weixuanfu, thank you for your prompt answer.
You may want to add this explanation to the documentation. Also, here is something to add to what I am sure is a large to-do list: use Graphviz to print out a tree structure image of the best pipeline. This would make it easier for the user to understand the data flow in the pipeline.
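To make the Graphviz suggestion concrete, here is a rough sketch of how it could look for a purely linear pipeline. This is not an existing TPOT feature; steps_to_dot is a hypothetical helper, and the step names are simply the ones from the pipeline in this thread. It only builds DOT text, so no Graphviz Python package is needed.

```python
def steps_to_dot(steps):
    """Build Graphviz DOT text for a linear pipeline (hypothetical helper).

    Assumes each step name is unique; repeated names would collapse
    into a single node in the rendered graph.
    """
    nodes = ["input_matrix"] + list(steps)
    lines = ["digraph pipeline {", "    rankdir=LR;"]
    # One edge from each step to the step that consumes its output.
    for src, dst in zip(nodes, nodes[1:]):
        lines.append('    "{}" -> "{}";'.format(src, dst))
    lines.append("}")
    return "\n".join(lines)


steps = [
    "RidgeCV",
    "PolynomialFeatures",
    "LassoLarsCV",
    "XGBRegressor",
    "ExtraTreesRegressor",
]
dot = steps_to_dot(steps)
print(dot)
```

Saving the printed text as pipeline.dot and running dot -Tpng pipeline.dot -o pipeline.png should render the image. A pipeline that uses CombineDFs would branch, so it would need a real tree walk rather than this simple chain.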
I ran a short regression test with a small data set. Here is the TPOT input:
tpot_optimizer = TPOTRegressor(generations=5, population_size=20, scoring='neg_median_absolute_error', cv=5, random_state=42, verbosity=2)
Here is the best pipeline output:
Best pipeline: ExtraTreesRegressor(XGBRegressor(LassoLarsCV(PolynomialFeatures(RidgeCV(input_matrix), degree=2, include_bias=False, interaction_only=False), normalize=True), learning_rate=0.1, max_depth=2, min_child_weight=4, n_estimators=100, nthread=1, subsample=0.5), bootstrap=True, max_features=0.45, min_samples_leaf=6, min_samples_split=15, n_estimators=100)
Here is the relevant part of the exported Python file:
exported_pipeline = make_pipeline(
    StackingEstimator(estimator=RidgeCV()),
    PolynomialFeatures(degree=2, include_bias=False, interaction_only=False),
    StackingEstimator(estimator=LassoLarsCV(normalize=True)),
    StackingEstimator(estimator=XGBRegressor(learning_rate=0.1, max_depth=2, min_child_weight=4, n_estimators=100, nthread=1, subsample=0.5)),
    ExtraTreesRegressor(bootstrap=True, max_features=0.45, min_samples_leaf=6, min_samples_split=15, n_estimators=100)
)
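Comparing the two outputs, the nested "Best pipeline" string reads inside out: the innermost call runs first, and the outermost call is the final estimator. A small sketch that recovers the step order from such a string is below. pipeline_order is a hypothetical helper, not part of TPOT, and it assumes a purely linear pipeline (no CombineDFs branches) where each operator appears as Name( in the string.

```python
import re


def pipeline_order(best_pipeline):
    """Return operator names in execution order (innermost call first).

    Hypothetical helper: assumes a linear pipeline, where every operator's
    first argument is the previous operator's call.
    """
    # Operators appear as `Name(`; keyword arguments such as `degree=2`
    # have no parenthesis, so they are not matched.
    names = re.findall(r"([A-Za-z_]\w*)\(", best_pipeline)
    # The string lists operators outermost first, so reverse it.
    return list(reversed(names))


best = (
    "ExtraTreesRegressor(XGBRegressor(LassoLarsCV(PolynomialFeatures("
    "RidgeCV(input_matrix), degree=2, include_bias=False, "
    "interaction_only=False), normalize=True), learning_rate=0.1, "
    "max_depth=2, min_child_weight=4, n_estimators=100, nthread=1, "
    "subsample=0.5), bootstrap=True, max_features=0.45, "
    "min_samples_leaf=6, min_samples_split=15, n_estimators=100)"
)

order = pipeline_order(best)
for position, name in enumerate(order, start=1):
    print(position, name)
```

The resulting order matches the order of the arguments to make_pipeline in the exported file.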
Question 1: Is the following interpretation of the order of steps used correct?
Question 2: Is it possible to turn off stacking?