sokol11 opened 4 years ago

Hi. I wonder if it's possible to use TPOT's genetic programming framework to select the best subset of features. I already have an idea of which classification algorithm and parameters work well, and I do not want to optimize the entire ML pipeline. I would just like to create a population of smaller feature sets out of the ~1000 features I have and use crossover/mutation to find the best set, while using my static estimator for model evaluation. Can I do that with TPOT? If so, how might I go about doing it? Thanks!
TPOT has a built-in FeatureSetSelector for helping users select the best feature set based on prior expert knowledge. You can also limit the ML hyperparameter search space via config_dict (for example, by using a single static estimator in the dictionary) and fix the pipeline structure with the template parameter.
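For example, here is a minimal sketch of that setup; the CSV path, subset count, and RandomForestClassifier settings are illustrative assumptions, not defaults:

```python
from tpot import TPOTClassifier

tpot_config = {
    # FeatureSetSelector picks one predefined subset per pipeline; the
    # subsets come from a CSV with Subset, Size, and Features columns.
    'tpot.builtins.FeatureSetSelector': {
        'subset_list': ['feature_subsets.csv'],  # assumed expert-knowledge file
        'sel_subset': list(range(10)),           # assumed: 10 candidate subsets
    },
    # A "static" estimator: a single classifier with fixed hyperparameters.
    'sklearn.ensemble.RandomForestClassifier': {
        'n_estimators': [500],
        'max_features': ['sqrt'],
    },
}

tpot = TPOTClassifier(
    generations=20,
    population_size=50,
    config_dict=tpot_config,
    template='FeatureSetSelector-Classifier',  # fixes the pipeline shape
    verbosity=2,
)
# tpot.fit(X_train, y_train)
```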
Thank you @weixuanfu. Yes, I saw FeatureSetSelector, but as I understand it, it is used to specify a static set of features rather than to optimize the feature-set composition. Am I correct that TPOT does not select features via genetic optimization, i.e., binary-encoding all the features and using genetic programming to find the best feature subset? Thanks.
Your understanding is correct. TPOT cannot select features with a GA except from static feature sets defined by prior expert knowledge. It may be a good enhancement to add to TPOT.
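That said, something close to what you describe can be assembled outside TPOT with a plain GA over binary feature masks. A rough sketch using DEAP with a fixed estimator follows; the toy data, estimator, and GA parameters are all illustrative assumptions:

```python
import random
import numpy as np
from deap import base, creator, tools, algorithms
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Toy data standing in for your real ~1000-feature dataset.
X, y = make_classification(n_samples=200, n_features=50, random_state=0)
n_features = X.shape[1]

# Each individual is a binary mask over the feature columns.
creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", list, fitness=creator.FitnessMax)

toolbox = base.Toolbox()
toolbox.register("attr_bool", random.randint, 0, 1)
toolbox.register("individual", tools.initRepeat, creator.Individual,
                 toolbox.attr_bool, n_features)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)

def eval_subset(individual):
    """CV score of the fixed estimator on the masked feature set."""
    mask = np.asarray(individual, dtype=bool)
    if not mask.any():                     # empty subset gets worst fitness
        return (0.0,)
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    return (cross_val_score(clf, X[:, mask], y, cv=3).mean(),)

toolbox.register("evaluate", eval_subset)
toolbox.register("mate", tools.cxTwoPoint)        # crossover on the mask
toolbox.register("mutate", tools.mutFlipBit, indpb=0.05)  # flip bits
toolbox.register("select", tools.selTournament, tournsize=3)

pop = toolbox.population(n=30)
pop, _ = algorithms.eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.2,
                             ngen=10, verbose=False)
best = tools.selBest(pop, k=1)[0]
print("best CV score:", best.fitness.values[0])
print("selected feature indices:", np.flatnonzero(best).tolist())
```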
Understood. Thanks!
Alternatively, you may use a template like Selector-Classifier to indirectly select features by using GP to optimize the best Selector.
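A minimal sketch of that idea; the static LogisticRegression entry and the two selector entries are illustrative assumptions (with a custom config_dict, the selector operators must be listed explicitly):

```python
from tpot import TPOTClassifier

tpot_config = {
    # Static classifier: one algorithm, pinned hyperparameters.
    'sklearn.linear_model.LogisticRegression': {
        'C': [1.0],
        'penalty': ['l2'],
    },
    # Selector operators GP is allowed to choose between and tune.
    'sklearn.feature_selection.SelectPercentile': {
        'percentile': range(1, 100),
        'score_func': {'sklearn.feature_selection.f_classif': None},
    },
    'sklearn.feature_selection.VarianceThreshold': {
        'threshold': [0.0001, 0.001, 0.01, 0.1],
    },
}

tpot = TPOTClassifier(
    generations=20,
    population_size=50,
    template='Selector-Classifier',  # GP optimizes only the Selector step
    config_dict=tpot_config,
    verbosity=2,
)
# tpot.fit(X_train, y_train)
```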