EpistasisLab / tpot

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
http://epistasislab.github.io/tpot/
GNU Lesser General Public License v3.0

Feature Request: TPOTOutlierDetector #794

Open Y-oHr-N opened 5 years ago

Y-oHr-N commented 5 years ago

Hello,

scikit-learn 0.20 provides a more consistent outlier detection API: https://speakerdeck.com/albertcthomas/anomaly-detection-in-scikit-learn-ongoing-work-and-future-developments

So I would like an estimator, analogous to TPOTClassifier, that searches over all of the outlier detection models.
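
To illustrate what the shared API makes possible, here is a minimal sketch that loops over several scikit-learn detectors using only `fit`, `predict` and `decision_function`. The toy data and the choice of detectors are assumptions made for illustration; nothing here is part of TPOT.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

# Toy data (illustrative assumption): a Gaussian blob for training,
# plus a few far-away points mixed into the data to be checked.
rng = np.random.RandomState(0)
X_train = rng.normal(size=(200, 2))
X_new = np.vstack([rng.normal(size=(20, 2)), rng.uniform(6, 8, size=(5, 2))])

detectors = {
    "IsolationForest": IsolationForest(random_state=0),
    "OneClassSVM": OneClassSVM(gamma="scale"),
    "EllipticEnvelope": EllipticEnvelope(random_state=0),
    # novelty=True is required for LOF to expose predict() on unseen data.
    "LocalOutlierFactor": LocalOutlierFactor(novelty=True),
}

predictions = {}
scores = {}
for name, detector in detectors.items():
    detector.fit(X_train)                             # unsupervised fit, no y
    predictions[name] = detector.predict(X_new)       # 1 = inlier, -1 = outlier
    scores[name] = detector.decision_function(X_new)  # negative = outlier side
    print(name, int((predictions[name] == -1).sum()), "points flagged")
```

Because every detector answers to the same three methods, a search procedure could in principle treat them as interchangeable pipeline steps.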

Thank you.

joanesplazaola commented 5 years ago

Hi, I have been looking into the possibility of adding this to TPOT, but I would like to double-check the procedure.

As in novelty detection, the fit function should receive both the training data (normal data only) and a labeled validation set (X, y) containing a mixture of normal and novel data, so that the evaluated models can be scored. The label convention for the validation set would also need to be settled, since sklearn uses 1/-1 (inlier/outlier) while many people use 1/0. Training time can also be a problem, mostly for the One-Class SVM, whose fit time can easily exceed the expected limits.
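
A rough sketch of that procedure, to make the question concrete. The helper names `to_sklearn_labels` and `score_novelty_detector`, the `outlier_value` parameter, the toy data, and the choice of balanced accuracy as the metric are all hypothetical; this is not an existing TPOT or scikit-learn interface, and it does not address the training-time concern.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import balanced_accuracy_score


def to_sklearn_labels(y, outlier_value):
    """Map user labels to sklearn's convention: 1 = inlier, -1 = outlier.

    `outlier_value` (hypothetical parameter) says which user label marks
    the novel class, since 1/0 data does not say which one is the outlier.
    """
    y = np.asarray(y)
    return np.where(y == outlier_value, -1, 1)


def score_novelty_detector(detector, X_normal, X_val, y_val, outlier_value=1):
    """Fit on normal data only, then score on a labeled validation set."""
    detector.fit(X_normal)
    y_true = to_sklearn_labels(y_val, outlier_value)
    y_pred = detector.predict(X_val)
    return balanced_accuracy_score(y_true, y_pred)


# Toy data (illustrative assumption): normal points near the origin,
# novel points far away; validation labels use 0 = normal, 1 = novel.
rng = np.random.RandomState(0)
X_normal = rng.normal(size=(300, 2))
X_val = np.vstack([rng.normal(size=(50, 2)), rng.uniform(6, 8, size=(10, 2))])
y_val = np.array([0] * 50 + [1] * 10)

score = score_novelty_detector(
    IsolationForest(random_state=0), X_normal, X_val, y_val, outlier_value=1
)
print("balanced accuracy:", round(score, 3))
```

Making the caller state which label marks the outlier avoids guessing between the 1/-1 and 1/0 conventions.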