automl / auto-sklearn

Automated Machine Learning with scikit-learn
https://automl.github.io/auto-sklearn
BSD 3-Clause "New" or "Revised" License

Feature Request: AutoSklearnOutlierDetector #578

Open Y-oHr-N opened 5 years ago

Y-oHr-N commented 5 years ago

Hello,

scikit-learn 0.20 provides a more consistent outlier detection API. https://speakerdeck.com/albertcthomas/anomaly-detection-in-scikit-learn-ongoing-work-and-future-developments

So I would like an estimator that searches over all outlier detection models, the way AutoSklearnClassifier does for classifiers.

Thank you.

mfeurer commented 5 years ago

Just for clarification, do you think that these should be part of the pipeline tuned by Auto-sklearn or that there should be a standalone mode AutoSklearnOutlierDetector?

According to the title, you want the second option. From my understanding, this is an unsupervised learning problem. The central assumption in Auto-sklearn is that there is a loss function which can be used to tune the hyperparameters. What would such a loss function look like for outlier detection?

Y-oHr-N commented 5 years ago

Thank you for your reply. As far as I know, there are two metrics for outlier detection.

One is the square of the geometric mean of precision and recall.

outliers - Metrics for one-class classification - Cross Validated
https://stats.stackexchange.com/questions/192530/metrics-for-one-class-classification

Lee, W. S., and Liu, B., "Learning with positive and unlabeled examples using weighted Logistic Regression," In Proceedings of ICML, pp. 448-455, 2003.
https://www.aaai.org/Papers/ICML/2003/ICML03-060.pdf
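For illustration, the first metric can be sketched in plain Python. The square of the geometric mean of precision and recall is simply their product. The second function is a rough reading of the estimate discussed in the Lee and Liu reference for the case where only positive labels are available (recall squared divided by the fraction of samples predicted positive). Both function names and the `positive=1` label convention are assumptions made for this sketch; they are not taken from the linked implementation.

```python
def precision_recall_product(y_true, y_pred, positive=1):
    """Square of the geometric mean of precision and recall (= precision * recall)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    n_pred_pos = sum(1 for p in y_pred if p == positive)
    n_true_pos = sum(1 for t in y_true if t == positive)
    if n_pred_pos == 0 or n_true_pos == 0:
        return 0.0  # precision or recall is undefined; treat as the worst score
    return (tp / n_pred_pos) * (tp / n_true_pos)


def recall_squared_over_predicted_rate(pred_on_known_positives, pred_on_all, positive=1):
    """Label-free proxy: recall**2 / Pr[f(X) = positive] (assumed form, see lead-in)."""
    recall = sum(1 for p in pred_on_known_positives if p == positive) / len(pred_on_known_positives)
    predicted_rate = sum(1 for p in pred_on_all if p == positive) / len(pred_on_all)
    if predicted_rate == 0:
        return 0.0  # nothing predicted positive
    return recall * recall / predicted_rate
```

The proxy only needs predictions on known positives and on the full (unlabeled) validation set, which is what makes it usable when no negative labels exist.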

The other is the area under the Mass-Volume curve.

Goix, N., "How to evaluate the quality of unsupervised anomaly detection algorithms?" In ICML Anomaly Detection Workshop, 2016.
https://arxiv.org/pdf/1607.01152.pdf

Thomas, A., Clémençon, S., Feuillard, V., and Gramfort, A., "Learning hyperparameters for unsupervised anomaly detection," In ICML Anomaly Detection Workshop, 2016.
https://github.com/albertcthomas/anomaly_tuning
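A minimal sketch of how the area under an empirical Mass-Volume curve might be approximated, assuming the usual construction: for each mass level alpha, pick a score threshold that keeps at least a fraction alpha of the data, then estimate the volume of the corresponding level set by Monte Carlo from points drawn uniformly in a box around the data. The function name, its arguments, and the "higher score = more normal" convention are assumptions for this example, not the API of the linked repositories.

```python
from bisect import bisect_left


def mass_volume_auc(scores_data, scores_uniform, box_volume, alphas):
    """Approximate area under an empirical Mass-Volume curve (lower is better).

    Assumes higher score = more normal. `scores_uniform` holds the scores of
    points sampled uniformly from a box of volume `box_volume` enclosing the
    data, and `alphas` is an increasing list of mass levels in (0, 1].
    """
    data_sorted = sorted(scores_data)
    unif_sorted = sorted(scores_uniform)
    n, m = len(data_sorted), len(unif_sorted)

    volumes = []
    for alpha in alphas:
        # Threshold t such that at least a fraction alpha of the data scores >= t.
        k = min(n - 1, int((1.0 - alpha) * n))
        t = data_sorted[k]
        # Monte Carlo estimate of the volume of the level set {x : score(x) >= t}.
        inside = m - bisect_left(unif_sorted, t)
        volumes.append(box_volume * inside / m)

    # Trapezoidal rule over the (alpha, volume) points.
    auc = 0.0
    for i in range(1, len(alphas)):
        auc += 0.5 * (volumes[i] + volumes[i - 1]) * (alphas[i] - alphas[i - 1])
    return auc
```

Because no labels are involved, a quantity like this could in principle serve as the loss asked about earlier in the thread, although the uniform sampling step scales poorly with dimension.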

I implemented two scikit-learn compatible metrics. https://github.com/HazureChi/kenchi/blob/master/kenchi/metrics.py

mfeurer commented 5 years ago

I'm afraid that I won't have the time to implement something here. Also, I think this is somewhat out of scope for Auto-sklearn if the metrics are not in scikit-learn yet.

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs for the next 7 days. Thank you for your contributions.

jmren168 commented 1 year ago

Hi @mfeurer,

Is it possible to wrap a one-class SVM as a customized two-class classifier, and then put it into AutoSklearnClassifier? What I'm trying to do is:

  1. add a customized classifier (input: a one-class SVM, plus X_train and pseudo_y_train)
  2. define a customized score: if pseudo_y_train is all 0 (only one class), the score is 1e-5; otherwise, give a higher score if it classifies outliers correctly
  3. put the customized classifier and the customized score into AutoSklearnClassifier
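Step 2 of the list above could look roughly like the sketch below. It assumes that label 1 marks an outlier and 0 an inlier, and it uses balanced accuracy as the "higher score" for mixed labels; both choices, as well as the function name, are assumptions rather than anything stated in the thread.

```python
def pseudo_label_score(y_true, y_pred, outlier_label=1, floor=1e-5):
    """Hypothetical score for step 2; higher is better, never below `floor`."""
    preds_for_outliers = [p for t, p in zip(y_true, y_pred) if t == outlier_label]
    preds_for_inliers = [p for t, p in zip(y_true, y_pred) if t != outlier_label]

    if not preds_for_outliers:
        # Pseudo labels contain a single class: return the small constant.
        return floor

    # Fraction of labeled outliers that were flagged as outliers.
    outlier_recall = sum(1 for p in preds_for_outliers if p == outlier_label) / len(preds_for_outliers)
    if not preds_for_inliers:
        return max(floor, outlier_recall)

    # Fraction of labeled inliers that were left unflagged.
    inlier_recall = sum(1 for p in preds_for_inliers if p != outlier_label) / len(preds_for_inliers)
    # Balanced accuracy avoids rewarding a model that flags everything.
    return max(floor, 0.5 * (outlier_recall + inlier_recall))
```

Two caveats: scikit-learn's OneClassSVM predicts +1 for inliers and -1 for outliers, so the wrapper in step 1 would need to map its output to 0/1 before scoring; and in auto-sklearn a function like this would presumably be wrapped with `autosklearn.metrics.make_scorer`, which is not exercised here.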

Does it sound reasonable and workable?

Any comments are highly appreciated.

JM