EpistasisLab / tpot

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
http://epistasislab.github.io/tpot/
GNU Lesser General Public License v3.0

Feature Selection using TPOT. #706

Open IamAVB opened 6 years ago

IamAVB commented 6 years ago

Hi all, I am new to this tool. I have explored it to some extent and have a few questions/suggestions; I am not sure whether this is the right platform to ask them.

I am using Genetic Programming to select the most prominent features, and I want to use TPOT for this task since it is based on GP. I see that GP is used to find the model with the best accuracy score by searching over different classification models and their hyperparameters. Is there a configuration so that TPOT can perform only feature selection using GP? What I mean is: create generations whose population consists of feature sets, crossover and mutate those feature sets as a whole to produce new ones, perform the classification task with each set, and use the classification accuracy as the fitness of the selected features. Repeat this procedure for a specified number of generations or until a stopping criterion is met, so that in the end we obtain the final selected feature set.

Let me know if this is already implemented in TPOT, and correct me if there is any misunderstanding.
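For concreteness, here is a rough, self-contained sketch of the kind of loop described above (boolean feature masks as individuals, cross-validated accuracy of a fixed classifier as fitness). This is plain NumPy/scikit-learn code written only to illustrate the question, not an existing TPOT feature, and every name in it is hypothetical:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    # Fitness of a feature mask = mean CV accuracy on the selected columns.
    if not mask.any():
        return 0.0
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X[:, mask], y, cv=5).mean()

def evolve(X, y, pop_size=20, generations=10, mut_rate=0.05):
    n_features = X.shape[1]
    pop = rng.random((pop_size, n_features)) < 0.5          # random initial masks
    for _ in range(generations):
        scores = np.array([fitness(ind, X, y) for ind in pop])
        parents = pop[np.argsort(scores)[-pop_size // 2:]]  # keep the best half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n_features)                # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(n_features) < mut_rate       # bit-flip mutation
            children.append(child)
        pop = np.vstack([parents, children])
    scores = np.array([fitness(ind, X, y) for ind in pop])
    return pop[scores.argmax()]                              # best feature mask found
```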

weixuanfu commented 6 years ago

We include a set of feature selectors in TPOT's default configuration. TPOT can use those feature selectors at random (and even combine the features selected by different selectors in a tree structure) while optimizing accuracy. That said, we are currently working on and evaluating a new template feature that will allow users to specify the pipeline structure for feature selection.
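For illustration, a hand-written sketch of the shape of pipeline the default configuration can evolve, with two selectors combined in a tree structure feeding a classifier. This is not an actual TPOT export; the specific operators and hyperparameters are only examples:

```python
from sklearn.pipeline import make_pipeline, make_union
from sklearn.feature_selection import SelectPercentile, VarianceThreshold, f_classif
from sklearn.ensemble import RandomForestClassifier

# Union of two feature selectors, then a downstream classifier.
pipeline = make_pipeline(
    make_union(
        SelectPercentile(score_func=f_classif, percentile=50),
        VarianceThreshold(threshold=0.05),
    ),
    RandomForestClassifier(n_estimators=100),
)
# pipeline.fit(X_train, y_train)
```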

IamAVB commented 6 years ago

Thanks for clarifying my doubt. So I understand that TPOT provides the 4 feature selector operators mentioned in the 'TPOT MDR' built-in configuration. Will I be able to add additional feature selection operators (as custom operators) that can be used to create a more comprehensive feature set in the population?
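For reference, the built-in configuration mentioned here can be selected by name (assuming the optional scikit-mdr and skrebate dependencies are installed); the other parameter values below are only illustrative:

```python
from tpot import TPOTClassifier

# Select the built-in TPOT-MDR configuration by name.
tpot = TPOTClassifier(config_dict='TPOT MDR', generations=5,
                      population_size=20, verbosity=2)
# tpot.fit(X_train, y_train)
```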

weixuanfu commented 6 years ago

@IamAVB, sorry for overlooking this issue. Yes, you can add additional feature selection operators to the configuration and use it via the config_dict parameter.
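A minimal sketch of such a custom configuration, assuming the standard config_dict dictionary format (operator import paths mapped to hyperparameter ranges); the particular operators and value ranges below are only illustrative:

```python
from tpot import TPOTClassifier

# Custom configuration: two feature selectors plus one classifier.
custom_config = {
    'sklearn.feature_selection.SelectPercentile': {
        'percentile': range(1, 100),
        'score_func': {
            'sklearn.feature_selection.f_classif': None
        }
    },
    'sklearn.feature_selection.VarianceThreshold': {
        'threshold': [0.0001, 0.001, 0.01, 0.1, 0.2]
    },
    'sklearn.linear_model.LogisticRegression': {
        'penalty': ['l1', 'l2'],
        'C': [0.01, 0.1, 1.0, 10.0],
        'dual': [False]
    }
}

tpot = TPOTClassifier(generations=5, population_size=20,
                      config_dict=custom_config, verbosity=2)
# tpot.fit(X_train, y_train)
```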