Open austinpoulton opened 5 years ago
The FeatureUnion with two identical FunctionTransformers in that pipeline just stacks two copies of the input matrix (related to this function). Since current TPOT randomly generates tree-based pipelines, some pipelines need this FeatureUnion
operator to combine transformed features or raw input features from two branches of the tree. But sometimes, as in the case in this issue, the FeatureUnion
just doubles the feature space of the raw input features.
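A minimal sketch of what that operator does, using np.copy as the identity function (TPOT's exported pipelines use a similar FunctionTransformer around a copy function): a FeatureUnion of two identity transformers stacks the input with itself, so the downstream classifier sees every feature twice.

```python
import numpy as np
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import FunctionTransformer

# 4 samples, 3 features
X = np.arange(12).reshape(4, 3)

# Two identity FunctionTransformers combined with FeatureUnion:
# the outputs are concatenated column-wise, doubling the width.
union = FeatureUnion([
    ("copy1", FunctionTransformer(np.copy)),
    ("copy2", FunctionTransformer(np.copy)),
])

X_doubled = union.fit_transform(X)
print(X.shape, X_doubled.shape)  # (4, 3) (4, 6)
```

For a tree ensemble this duplication is redundant rather than harmful in itself, but it makes the fitted pipeline wider than the raw input, which is the symptom described below.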
My team uses TPOT for model selection as part of an ML pipeline that helps inform risk opinions for online transactions. We have been using the default classifier configuration provided by TPOT and have for the most part been getting good classifier performance. Our most recent model appeared to have consistent performance against the held-out set and a consistent scoring distribution, but performed poorly serving predictions against out-of-sample data. We noticed that this pipeline included a FeatureUnion of two copy FunctionTransformers, so the input to the RF classifier was double the width.
All of our prior classifiers have been some form of tree ensemble with no pre-processing steps.
We have been using TPOT 0.9.0 and the only change prior to our weird model was upgrading Pandas to 0.23.
TPOT configuration used:
The fitted TPOT pipeline in question was:
I don't really understand how such a preprocessing step came about; it seems nugatory, and it produced a very skewed scoring distribution for out-of-sample predictions.