SuperCowPowers / sageworks

SageWorks: An easy to use Python API for creating and deploying AWS SageMaker Models
https://www.supercowpowers.com
MIT License

Unsupervised Algorithms and Metrics #414

Open brifordwylie opened 4 months ago


๐ญ ๐ก๐จ๐ฐ ๐๐จ ๐ฐ๐ž ๐ค๐ง๐จ๐ฐ ๐ฐ๐ก๐ข๐œ๐ก ๐ฆ๐ž๐ญ๐ก๐จ๐ ๐ข๐ฌ ๐›๐ž๐ญ๐ญ๐ž๐ซ? ๐–๐ž ๐๐จ๐งโ€™๐ญ ๐ก๐š๐ฏ๐ž ๐ฅ๐š๐›๐ž๐ฅ๐ฌ ๐ข๐ง ๐”๐ง๐ฌ๐ฎ๐ฉ๐ž๐ซ๐ฏ๐ข๐ฌ๐ž๐ ๐‹๐ž๐š๐ซ๐ง๐ข๐ง๐ , ๐๐จ ๐ ๐ซ๐จ๐ฎ๐ง๐ ๐ญ๐ซ๐ฎ๐ญ๐ก.

The answer lies in evaluation metrics that judge the quality of an algorithm's output from the structure of the data itself (or from limited expert review) rather than from ground-truth labels.

๐”ผ๐•ง๐•’๐•๐•ฆ๐•’๐•ฅ๐•š๐• ๐•Ÿ ๐•„๐•–๐•ฅ๐•™๐• ๐••๐•ค:

➊ Silhouette score:

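The Silhouette score ranges from -1 to 1. A high score (close to 1) indicates that each data point is much closer to the other points in its own cluster than to the points of the nearest neighboring cluster. In anomaly detection, where normal and anomalous points form separate groups, a high score means the normal data points are well separated from the anomalous ones.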
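A minimal pure-Python sketch of the silhouette computation, assuming Euclidean distance and a small in-memory dataset. The function `silhouette_score` below is our own illustration, not a SageWorks API, and the toy points are made up; for real workloads, scikit-learn's `sklearn.metrics.silhouette_score` computes the same quantity.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance.

    For each point: a = mean distance to the other members of its own cluster,
    b = smallest mean distance to the members of any other cluster,
    s = (b - a) / max(a, b). Points in singleton clusters get s = 0.
    """
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        # Mean distance from p to each cluster's members (excluding p itself)
        mean_dist = {}
        for c in clusters:
            others = [q for j, q in enumerate(points) if labels[j] == c and j != i]
            if others:
                mean_dist[c] = sum(math.dist(p, q) for q in others) / len(others)
        own = labels[i]
        if own not in mean_dist:  # p is alone in its cluster
            scores.append(0.0)
            continue
        a = mean_dist[own]
        b = min(d for c, d in mean_dist.items() if c != own)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: four "normal" points near the origin, two anomalies far away
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.1, 0.0), (5.0, 5.0), (5.2, 4.9)]
good_labels = [0, 0, 0, 0, 1, 1]   # normal vs. anomalous
mixed_labels = [0, 1, 0, 1, 0, 1]  # an arbitrary, poor split
print(silhouette_score(points, good_labels))   # close to 1
print(silhouette_score(points, mixed_labels))  # much lower
```

Comparing the two labelings of the same points shows how the score can rank candidate methods without any ground truth.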

➋ Calinski-Harabasz index:

The Calinski-Harabasz index is the ratio of between-cluster dispersion to within-cluster dispersion, each normalized by its degrees of freedom. A higher score signifies denser, better-separated clusters.
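The ratio can be spelled out in a few lines. This is a sketch assuming Euclidean distance and small in-memory data; `calinski_harabasz_score` here is an illustrative function of our own (scikit-learn ships `sklearn.metrics.calinski_harabasz_score` for real use), and the toy data is hypothetical.

```python
import math


def calinski_harabasz_score(points, labels):
    """Calinski-Harabasz index: (B / (k - 1)) / (W / (n - k)).

    B = sum over clusters of n_c * ||centroid_c - overall_centroid||^2
    W = sum over points of ||x - centroid_of_its_cluster||^2
    """
    n = len(points)
    clusters = sorted(set(labels))
    k = len(clusters)
    if not 1 < k < n:
        raise ValueError("need 2 <= number of clusters < number of points")
    dim = len(points[0])

    def centroid(pts):
        return tuple(sum(p[d] for p in pts) / len(pts) for d in range(dim))

    overall = centroid(points)
    between = within = 0.0
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        center = centroid(members)
        between += len(members) * math.dist(center, overall) ** 2
        within += sum(math.dist(p, center) ** 2 for p in members)
    if within == 0.0:
        raise ValueError("zero within-cluster dispersion; index is undefined")
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical toy data: four "normal" points near the origin, two anomalies far away
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.1, 0.0), (5.0, 5.0), (5.2, 4.9)]
ch_good = calinski_harabasz_score(points, [0, 0, 0, 0, 1, 1])
ch_bad = calinski_harabasz_score(points, [0, 1, 0, 1, 0, 1])
print(ch_good, ch_bad)  # the clean normal/anomalous split scores far higher
```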

➌ Davies-Bouldin index:

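The Davies-Bouldin index compares the size (scatter) of each cluster with its distance to the most similar other cluster, then averages this worst-case ratio over all clusters. Its minimum is 0, and a lower score signifies better-defined clusters.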
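A minimal sketch of that definition, assuming Euclidean distance, centroid-based cluster "size", and distinct centroids. `davies_bouldin_score` is our own illustrative function (scikit-learn provides `sklearn.metrics.davies_bouldin_score`), and the data is hypothetical.

```python
import math


def davies_bouldin_score(points, labels):
    """Davies-Bouldin index (lower is better, minimum 0).

    s_i  = mean distance from cluster i's members to its centroid (its "size")
    d_ij = distance between the centroids of clusters i and j
    DB   = mean over clusters i of max_{j != i} (s_i + s_j) / d_ij
    """
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    dim = len(points[0])
    centroids, scatter = {}, {}
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        center = tuple(sum(p[d] for p in members) / len(members) for d in range(dim))
        centroids[c] = center
        scatter[c] = sum(math.dist(p, center) for p in members) / len(members)
    worst_ratios = []
    for i in clusters:
        # Assumes distinct centroids (d_ij > 0); coincident centroids would divide by zero
        worst_ratios.append(max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters if j != i
        ))
    return sum(worst_ratios) / len(worst_ratios)


# Hypothetical toy data: four "normal" points near the origin, two anomalies far away
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.1, 0.0), (5.0, 5.0), (5.2, 4.9)]
db_good = davies_bouldin_score(points, [0, 0, 0, 0, 1, 1])
db_bad = davies_bouldin_score(points, [0, 1, 0, 1, 0, 1])
print(db_good, db_bad)  # the clean split scores much lower (better)
```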

โž Kolmogorov-Smirnov statistic:

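It measures the maximum difference between the empirical cumulative distribution functions (CDFs) of the normal and anomalous data points, usually computed on the anomaly scores the model assigns. A value close to 1 means the two groups are cleanly separated; a value close to 0 means they are nearly indistinguishable.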
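The statistic is simple to compute directly. This sketch assumes the anomaly scores have already been split into "normal" and "anomalous" groups by the detector's own decision; `ks_statistic` and the score values are hypothetical. `scipy.stats.ks_2samp` computes the same statistic along with a p-value.

```python
import bisect


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max over x of |F_a(x) - F_b(x)|.

    F_a and F_b are the empirical CDFs (fraction of the sample <= x). Both are
    step functions, so the maximum is attained at one of the observed values.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    best = 0.0
    for x in a + b:
        f_a = bisect.bisect_right(a, x) / len(a)
        f_b = bisect.bisect_right(b, x) / len(b)
        best = max(best, abs(f_a - f_b))
    return best


# Hypothetical anomaly scores, split by the detector's own normal/anomalous decision
normal_scores = [0.10, 0.12, 0.15, 0.20, 0.22]
anomaly_scores = [0.85, 0.90, 0.95]
print(ks_statistic(normal_scores, anomaly_scores))  # 1.0: the two groups do not overlap
```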

➎ Precision at top-k:

Rank the data points by anomaly score, have a domain expert review the top k, and report the fraction of those k points that the expert confirms as genuine anomalies.
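A small sketch of the bookkeeping, assuming scores are keyed by a point id and the expert's verdicts arrive as a set of confirmed ids. `precision_at_k`, the row ids, and the scores are all hypothetical illustrations, not part of SageWorks.

```python
def precision_at_k(scores, expert_confirmed, k):
    """Fraction of the k highest-scoring points that a domain expert confirmed.

    scores: mapping of point id -> anomaly score (higher = more anomalous)
    expert_confirmed: set of point ids the expert marked as true anomalies
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = sorted(scores, key=scores.get, reverse=True)[:k]
    if not top_k:
        raise ValueError("no scored points")
    return sum(pid in expert_confirmed for pid in top_k) / len(top_k)


# Hypothetical scores and expert review
scores = {"row_1": 0.95, "row_2": 0.91, "row_3": 0.40, "row_4": 0.88, "row_5": 0.10}
expert_confirmed = {"row_1", "row_4"}
print(precision_at_k(scores, expert_confirmed, k=3))  # top 3 = row_1, row_2, row_4 -> 2/3
```

The expert only has to review k points rather than label the whole dataset, which is what makes this practical when no ground truth exists.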

https://towardsdatascience.com/7-evaluation-metrics-for-clustering-algorithms-bdc537ff54d2

https://towardsdatascience.com/three-performance-evaluation-metrics-of-clustering-when-ground-truth-labels-are-not-available-ee08cb3ff4fb

https://medium.datadriveninvestor.com/evaluation-metrics-for-clustering-96dcdbea437d

https://towardsdatascience.com/a-comprehensive-beginners-guide-to-the-diverse-field-of-anomaly-detection-8c818d153995