Open jonathansantilli opened 6 years ago
Hi, thank you for the interesting idea here. So far, TPOT does not have an API to export the features selected in the best pipeline. We use feature selectors from scikit-learn and put them into scikit-learn Pipeline objects, so I think you could access the selected features within the best Pipeline object. Please check the example in the link for scikit-learn Pipeline (see the code after "# getting the selected features chosen by anova_filter").
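As a rough illustration of that idea, here is a minimal sketch using only scikit-learn. The synthetic data, the step names "anova" and "tree", and the choice of k=3 are assumptions made for the example, not anything TPOT produces. With TPOT, the same kind of lookup could presumably be applied to the steps of tpot.fitted_pipeline_, as long as the best pipeline actually contains a selector step.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (assumption for this sketch): 20 features, 3 informative.
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=3,
    n_redundant=0, random_state=0,
)

# A small pipeline with an explicit feature-selection step.
pipeline = Pipeline(steps=[
    ("anova", SelectKBest(score_func=f_classif, k=3)),
    ("tree", DecisionTreeClassifier(random_state=0)),
])
pipeline.fit(X, y)

# getting the selected features chosen by the "anova" step
mask = pipeline.named_steps["anova"].get_support()
selected = pipeline.named_steps["anova"].get_support(indices=True).tolist()

print("Boolean mask over input columns:", mask)
print("Indices of selected columns:", selected)
```

Note that this only works for steps that expose get_support(). Steps that construct new features (for example PCA or polynomial features) change the feature space, so the column indices no longer map back to the original inputs.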
Hello @weixuanfu, thanks for the reply. @rhiever has opened this issue, https://github.com/rhiever/tpot/issues/629, which could perhaps be oriented in this direction as well? I mean some sort of final and intermediate explanation of what is happening.
This will be a very useful enhancement.
I understand that TPOT varies hyperparameters and classification models while creating pipelines using GP. But does it involve features too? That is, are the mutation and crossover operators applied to feature sets as well when creating each new pipeline?
Hello,
Thanks for the very nice library. I would love to work on adding this as a feature to the API. Could you point me to where the selection/construction of features is happening?
@dsleo Thank you. The feature selection/construction is performed within the scikit-learn pipelines that are generated via GP in TPOT.
@weixuanfu, is there any specific place in the code base to look at?
I've started, but I'm not very familiar with TPOT and I haven't yet found where this is happening. I was hoping that the _pop attribute of the TPOTClassifier() object would contain useful information about the population, and hence the selected features, as in here... Is it directly in eaMuPlusLambda that I should try to modify the pop attributes to retain information about the features?
Any help would be greatly appreciated, thanks!
Currently, some statistics for evaluated pipelines are saved into evaluated_individuals_ via this function
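To give an idea of how those statistics might be used to follow progress across generations, here is a small sketch in plain Python. It assumes that evaluated_individuals_ is a dictionary keyed by the pipeline string, whose values are dictionaries containing at least "generation" and "internal_cv_score"; please check this against your TPOT version. The sample entries and the helper name best_per_generation are made up for illustration.

```python
import math


def best_per_generation(evaluated_individuals):
    """Return {generation: (pipeline_string, score)} for the best-scoring
    pipeline first evaluated in each generation."""
    best = {}
    for pipeline_str, stats in evaluated_individuals.items():
        gen = stats.get("generation")
        score = stats.get("internal_cv_score")
        # Skip entries without an integer generation or a finite score
        # (for example, pipelines that failed to evaluate).
        if not isinstance(gen, int) or not isinstance(score, (int, float)):
            continue
        if not math.isfinite(score):
            continue
        if gen not in best or score > best[gen][1]:
            best[gen] = (pipeline_str, score)
    return dict(sorted(best.items()))


# Hypothetical data with the assumed structure, standing in for
# tpot.evaluated_individuals_ after a run.
sample = {
    "GaussianNB(input_matrix)":
        {"generation": 0, "internal_cv_score": 0.81},
    "GaussianNB(SelectPercentile(input_matrix))":
        {"generation": 0, "internal_cv_score": 0.86},
    "KNeighborsClassifier(SelectPercentile(input_matrix))":
        {"generation": 1, "internal_cv_score": 0.90},
    "BrokenPipeline(input_matrix)":
        {"generation": "INVALID", "internal_cv_score": float("-inf")},
}

summary = best_per_generation(sample)
for gen, (pipeline_str, score) in summary.items():
    print(gen, score, pipeline_str)
```

The pipeline string shows which selectors are part of each pipeline, but not which columns they kept; for that, the pipeline would have to be refitted and inspected.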
Hello, thanks for TPOT, it is really amazing!
I looked for this information in other questions but have not found it yet.
TPOT executes feature selection, preprocessing and construction according to the documentation:
Is it possible to access the finally selected features? Even better, is it possible to have them after each generation?
After executing the export:
tpot.export('tpot_pipeline.py')
It will produce the code:
This means that we have to provide the data to the selected pipeline separately, instead of using the features used and generated by TPOT. Is that OK, or am I missing something?
Ideally (in my opinion), it would be extraordinarily helpful for research purposes to know the best pipeline and the features used after each generation. This could be used to analyze how the pipelines improve over the generations.
Thanks a lot in advance.