elki-project / elki

ELKI Data Mining Toolkit
https://elki-project.github.io/
GNU Affero General Public License v3.0

How can I get outliers from an OutlierResult? #38

Closed rainfalj closed 6 years ago

rainfalj commented 6 years ago

I get an OutlierResult and its scores. How can I decide which objects are outliers? Scores:

1 1.017915661656192
2 1.0608605021777988
3 1.171651951509847
4 1.0359532383112164
5 0.9946463130241695
6 1.0021667682045214
7 1.0664994726755364
8 1.0163041670169992
9 1.0792733520499878
10 1.0654301407031426

kno10 commented 6 years ago

Closing: Please use the issue tracker only for code development, not for usage / science questions. For questions like yours, a statistics forum such as https://stats.stackexchange.com/ is more appropriate.

There is no general rule for classifying objects as outlier or inlier based solely on the score. Some papers suggest a threshold such as 1.3, or even 3, but these values are very much data-dependent.
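To illustrate how much the cutoff matters, here is a minimal sketch in plain Java (no ELKI classes) that applies a fixed threshold to the ten scores from the question. The class `ThresholdDemo` and the method `flagged` are hypothetical names made up for this example, the cutoff 1.05 is an arbitrary value added for contrast, and the sketch assumes that a higher score means "more outlying".

```java
import java.util.ArrayList;
import java.util.List;

final class ThresholdDemo {
  // Scores from the question; index i holds the score of object i + 1.
  static final double[] SCORES = {
    1.017915661656192, 1.0608605021777988, 1.171651951509847,
    1.0359532383112164, 0.9946463130241695, 1.0021667682045214,
    1.0664994726755364, 1.0163041670169992, 1.0792733520499878,
    1.0654301407031426
  };

  /** Returns the 1-based ids of all objects whose score exceeds the threshold. */
  static List<Integer> flagged(double[] scores, double threshold) {
    List<Integer> ids = new ArrayList<>();
    for (int i = 0; i < scores.length; i++) {
      if (scores[i] > threshold) {
        ids.add(i + 1);
      }
    }
    return ids;
  }

  public static void main(String[] args) {
    // 1.3 and 3.0 are the thresholds mentioned above; 1.05 is an arbitrary
    // extra value (an assumption of this sketch) to show the sensitivity.
    for (double t : new double[] {1.05, 1.3, 3.0}) {
      System.out.println("threshold " + t + " -> " + flagged(SCORES, t));
    }
  }
}
```

With these scores, both 1.3 and 3 flag nothing at all (the largest score is about 1.17), while 1.05 flags half of the objects. Neither outcome says much about the data; it mostly reflects the choice of cutoff.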

There is plenty of literature on this problem, but no ultimate solution. There are classes in ELKI to scale outlier scores to probability values, but even this is quite heuristic.

As outlier detection is exploratory data analysis, I suggest you sort the elements by score and investigate the top k "most likely outlier" elements. Don't over-automate: the methods are not reliable enough for automation. Treat the result as "possible outliers", not as "definite outliers"!
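The sort-and-inspect approach can be sketched in plain Java as follows. This is only an illustration on a raw array of scores, not ELKI's API: `TopKDemo` and `topK` are hypothetical names, and the sketch assumes that larger scores are more outlying (for a method where smaller scores are more outlying, the sort order would have to be flipped).

```java
import java.util.Arrays;
import java.util.Comparator;

final class TopKDemo {
  // Scores from the question; index i holds the score of object i + 1.
  static final double[] SCORES = {
    1.017915661656192, 1.0608605021777988, 1.171651951509847,
    1.0359532383112164, 0.9946463130241695, 1.0021667682045214,
    1.0664994726755364, 1.0163041670169992, 1.0792733520499878,
    1.0654301407031426
  };

  /** Returns the 1-based ids of the k highest-scoring objects, highest first. */
  static int[] topK(double[] scores, int k) {
    Integer[] order = new Integer[scores.length];
    for (int i = 0; i < order.length; i++) {
      order[i] = i;
    }
    // Sort indices by descending score (assumes higher = more outlying).
    Arrays.sort(order, Comparator.comparingDouble((Integer i) -> scores[i]).reversed());
    int n = Math.max(0, Math.min(k, order.length));
    int[] ids = new int[n];
    for (int j = 0; j < n; j++) {
      ids[j] = order[j] + 1;
    }
    return ids;
  }

  public static void main(String[] args) {
    // Print the candidates for manual inspection, not as a final verdict.
    for (int id : topK(SCORES, 3)) {
      System.out.println("candidate " + id + " score " + SCORES[id - 1]);
    }
  }
}
```

For the scores in the question, the top 3 candidates are objects 3, 9 and 7. They are the ones to look at first, nothing more: whether any of them is actually an outlier still requires inspecting the data.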

Relevant literature:

Hans-Peter Kriegel, Peer Kröger, Erich Schubert, and Arthur Zimek. Interpreting and Unifying Outlier Scores. In: Proceedings of the 11th SIAM International Conference on Data Mining (SDM), Mesa, AZ. 2011, 13–24

Erich Schubert, Remigius Wojdanowski, Arthur Zimek, and Hans-Peter Kriegel. On Evaluation of Outlier Rankings and Outlier Scores. In: Proceedings of the 12th SIAM International Conference on Data Mining (SDM), Anaheim, CA. 2012, 1047–1058