EpistasisLab / tpot

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
http://epistasislab.github.io/tpot/
GNU Lesser General Public License v3.0

Look into adding non-linear dimensionality reduction preprocessors #298

Open rhiever opened 7 years ago

rhiever commented 7 years ago

Most of the feature preprocessors that we use are based on linear methods. We should look into adding non-linear dimensionality reduction preprocessors, such as:

kadarakos commented 7 years ago

This paper http://bit.ly/2gbuKey suggests that non-linear dimensionality reduction techniques fail to improve upon PCA on natural datasets, and KernelPCA is one of the methods it compares. Since PCA is much faster than KernelPCA and the other non-linear techniques, I would vote against including them.
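A quick way to check both points (accuracy parity and speed) on one standard dataset is sketched here, assuming scikit-learn is available. This is not the paper's protocol. The dataset (`load_digits`), the downstream `LogisticRegression`, `n_components=20`, and the RBF kernel with default `gamma` are arbitrary illustrative choices, and `compare_reducers` is a hypothetical helper.

```python
import time

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_reducers(X, y, n_components=20, cv=5):
    """Cross-validate the same downstream model after PCA vs. RBF KernelPCA.

    Returns {name: (mean_accuracy, seconds_spent_in_cross_validation)}.
    """
    reducers = {
        "PCA": PCA(n_components=n_components, random_state=0),
        "KernelPCA(rbf)": KernelPCA(n_components=n_components, kernel="rbf", random_state=0),
    }
    results = {}
    for name, reducer in reducers.items():
        # Only the reducer changes between runs; scaling and classifier stay fixed.
        pipe = make_pipeline(StandardScaler(), reducer, LogisticRegression(max_iter=1000))
        start = time.perf_counter()
        scores = cross_val_score(pipe, X, y, cv=cv)
        elapsed = time.perf_counter() - start
        results[name] = (scores.mean(), elapsed)
    return results


if __name__ == "__main__":
    X, y = load_digits(return_X_y=True)
    for name, (acc, secs) in compare_reducers(X, y).items():
        print(f"{name:>15}: accuracy={acc:.3f}  time={secs:.2f}s")
```

Timings and scores depend on the machine and the dataset, so a single run like this is a first impression, not evidence for or against the paper's conclusion.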

rhiever commented 7 years ago

That's very surprising. I bet we could find some examples where those findings don't hold.

saddy001 commented 6 years ago

To get a first insight, one could add non-linear preprocessors, run TPOT on 2-3 standard datasets, and check whether any of those preprocessors appear in the best pipelines.
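A minimal sketch of how that experiment could be set up, assuming the classic TPOT 0.x API in which `TPOTClassifier` accepts a `config_dict` mapping operator import paths to candidate hyperparameter values. The operator choices (`KernelPCA`, `Isomap`, `LocallyLinearEmbedding`) and their grids are illustrative assumptions, and `validate_config` and `nonlinear_steps` are hypothetical helpers, not part of TPOT. The block itself only needs scikit-learn and NumPy; the TPOT call is left as comments.

```python
import importlib

import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical search-space additions: import path -> candidate hyperparameter values
# (format assumed to match TPOT 0.x's config_dict).
NONLINEAR_PREPROCESSORS = {
    "sklearn.decomposition.KernelPCA": {
        "kernel": ["rbf", "poly", "sigmoid", "cosine"],
        "gamma": np.arange(0.05, 1.01, 0.05),
    },
    "sklearn.manifold.Isomap": {
        "n_neighbors": range(5, 21),
        "n_components": range(2, 11),
    },
    "sklearn.manifold.LocallyLinearEmbedding": {
        "n_neighbors": range(5, 21),
        "n_components": range(2, 11),
    },
}


def validate_config(config):
    """Check that each key imports to a class and each grid key is a constructor parameter."""
    for path, grid in config.items():
        module_name, _, class_name = path.rpartition(".")
        cls = getattr(importlib.import_module(module_name), class_name)
        unknown = set(grid) - set(cls().get_params())
        if unknown:
            raise ValueError(f"{path}: unknown parameters {sorted(unknown)}")
    return True


def nonlinear_steps(pipeline, config=NONLINEAR_PREPROCESSORS):
    """List the top-level pipeline steps that are one of the added non-linear reducers.

    Only inspects pipeline.steps directly; operators nested inside unions or
    wrapper estimators are not found.
    """
    names = {path.rsplit(".", 1)[1] for path in config}
    return [type(est).__name__ for _, est in pipeline.steps if type(est).__name__ in names]


# Usage with legacy TPOT 0.x (assumed installed; not executed here):
# from tpot import TPOTClassifier
# from tpot.config import classifier_config_dict
# config = {**classifier_config_dict, **NONLINEAR_PREPROCESSORS}
# tpot = TPOTClassifier(generations=5, population_size=20, config_dict=config, random_state=42)
# tpot.fit(X_train, y_train)
# print(nonlinear_steps(tpot.fitted_pipeline_))

if __name__ == "__main__":
    print(validate_config(NONLINEAR_PREPROCESSORS))  # True
    demo = make_pipeline(KernelPCA(kernel="rbf"), LogisticRegression())
    print(nonlinear_steps(demo))  # ['KernelPCA']
```

Merging into the default configuration, rather than replacing it, keeps the linear preprocessors in the search space, so the question becomes whether the optimizer picks the non-linear ones when both are available.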