How do we know which method is better? We don't have labels in unsupervised learning, so there is no ground truth.
The answer lies in evaluation metrics that judge the quality of our algorithm's output without needing a fully labeled dataset.
Evaluation metrics:
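✅ Silhouette score:
For each point, the Silhouette coefficient compares the mean distance to the other points in its own cluster (a) with the mean distance to the points of the nearest other cluster (b), as (b - a) / max(a, b). The score is the average over all points and ranges from -1 to 1. A score close to 1 means points are much closer to their own cluster than to any other, i.e. the normal data points are well separated from the anomalous ones.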
✅ Calinski-Harabasz index:
The Calinski-Harabasz index is the ratio of between-cluster dispersion to within-cluster dispersion, each normalized by its degrees of freedom. A higher score signifies denser, better-separated clusters.
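✅ Davies-Bouldin index:
For each cluster, the Davies-Bouldin index takes the worst-case ratio of combined cluster size (the average distance of points to their centroid) to the distance between centroids, over all other clusters, and then averages these ratios across clusters. A lower score (the minimum is 0) signifies better-defined clusters.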
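✅ Kolmogorov-Smirnov statistic:
The two-sample Kolmogorov-Smirnov statistic is the maximum absolute difference between the empirical cumulative distribution functions of the normal and anomalous data points, computed on a one-dimensional quantity such as the anomaly score. A value near 1 means the two groups barely overlap; a value near 0 means they are indistinguishable.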
✅ Precision at top-k:
Rank the data points by anomaly score, have a domain expert review the k highest-ranked points, and report the fraction that are genuine anomalies. Only those k points need manual labels.
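The Silhouette score described above can be sketched in plain Python. This is a minimal, dependency-free illustration assuming Euclidean distance, small in-memory data (it is O(n²)), and that the detector's output is used as cluster labels (0 = normal, 1 = anomalous). The helper name `silhouette_score` and the toy points are hypothetical; in practice scikit-learn's `sklearn.metrics.silhouette_score(X, labels)` computes the same quantity.

```python
import math


def silhouette_score(points, labels):
    """Mean Silhouette coefficient of a labeled point set (pure Python sketch).

    For each point: a = mean distance to the other members of its own cluster,
    b = smallest mean distance to the members of any other cluster,
    s = (b - a) / max(a, b). Points in singleton clusters contribute 0.
    """
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("the Silhouette score needs at least two clusters")
    total = 0.0
    for i, (p, own) in enumerate(zip(points, labels)):
        # Distances from p to every other point, grouped by cluster label.
        dists = {c: [] for c in clusters}
        for j, (q, lab) in enumerate(zip(points, labels)):
            if j != i:
                dists[lab].append(math.dist(p, q))
        if not dists[own]:  # p is alone in its cluster
            continue
        a = sum(dists[own]) / len(dists[own])
        b = min(sum(d) / len(d) for c, d in dists.items() if c != own)
        denom = max(a, b)
        if denom > 0:  # guard against duplicate points (a = b = 0)
            total += (b - a) / denom
    return total / len(points)


# Hypothetical toy data: four "normal" points near the origin, two "anomalies" far away.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11)]
good = [0, 0, 0, 0, 1, 1]
shuffled = [0, 1, 0, 1, 0, 1]
print(silhouette_score(points, good))      # close to 1
print(silhouette_score(points, shuffled))  # much lower
```

Comparing the two labelings shows the intended use: the same data scores far higher when normal and anomalous points are split cleanly.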
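The Calinski-Harabasz index can likewise be sketched without dependencies, assuming Euclidean distance and the standard definition (between-cluster dispersion over k - 1, divided by within-cluster dispersion over n - k). The function names and toy data are hypothetical illustrations; scikit-learn exposes the same metric as `sklearn.metrics.calinski_harabasz_score(X, labels)`.

```python
import math


def centroid(members):
    """Coordinate-wise mean of a non-empty list of equal-length points."""
    return tuple(sum(coords) / len(members) for coords in zip(*members))


def calinski_harabasz(points, labels):
    """Calinski-Harabasz index: (B / (k - 1)) / (W / (n - k)).

    B = sum over clusters of n_c * ||centroid_c - overall_mean||^2
    W = sum over clusters of sum_{x in c} ||x - centroid_c||^2
    """
    n, clusters = len(points), set(labels)
    k = len(clusters)
    if not 1 < k < n:
        raise ValueError("need 2 <= number of clusters < number of points")
    overall = centroid(points)
    between = within = 0.0
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        mu = centroid(members)
        between += len(members) * math.dist(mu, overall) ** 2
        within += sum(math.dist(p, mu) ** 2 for p in members)
    if within == 0:
        return math.inf  # every point sits exactly on its cluster centroid
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical toy data: label 0 = normal, 1 = anomalous.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11)]
print(calinski_harabasz(points, [0, 0, 0, 0, 1, 1]))  # large: well separated
print(calinski_harabasz(points, [0, 1, 0, 1, 0, 1]))  # small: mixed clusters
```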
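A matching sketch of the Davies-Bouldin index follows, assuming Euclidean distance and cluster size measured as the mean distance of members to their centroid. Names and data are hypothetical; scikit-learn's `sklearn.metrics.davies_bouldin_score(X, labels)` is the library equivalent.

```python
import math


def centroid(members):
    """Coordinate-wise mean of a non-empty list of equal-length points."""
    return tuple(sum(coords) / len(members) for coords in zip(*members))


def davies_bouldin(points, labels):
    """Davies-Bouldin index: mean over clusters i of max_{j != i} (s_i + s_j) / d_ij.

    s_i = mean distance of cluster i's points to its centroid,
    d_ij = distance between the centroids of clusters i and j.
    """
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    centroids, scatter = [], []
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        mu = centroid(members)
        centroids.append(mu)
        scatter.append(sum(math.dist(p, mu) for p in members) / len(members))
    total = 0.0
    for i in range(len(clusters)):
        worst = 0.0  # ratio against the most similar other cluster
        for j in range(len(clusters)):
            if i != j:
                sep = math.dist(centroids[i], centroids[j])
                ratio = math.inf if sep == 0 else (scatter[i] + scatter[j]) / sep
                worst = max(worst, ratio)
        total += worst
    return total / len(clusters)


# Hypothetical toy data: label 0 = normal, 1 = anomalous.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11)]
print(davies_bouldin(points, [0, 0, 0, 0, 1, 1]))  # near 0: compact, far apart
print(davies_bouldin(points, [0, 1, 0, 1, 0, 1]))  # much larger: overlapping
```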
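For the Kolmogorov-Smirnov statistic, a minimal sketch assuming the comparison is made on one-dimensional anomaly scores of the points the detector called normal versus anomalous. Because both empirical CDFs are step functions that only jump at observed values, checking every observed value is enough to find the maximum gap. The helper and scores are hypothetical; `scipy.stats.ks_2samp(a, b)` reports the same statistic in its `statistic` field.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max_x |F_a(x) - F_b(x)|."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:
        f_a = bisect_right(a, x) / len(a)  # fraction of sample a that is <= x
        f_b = bisect_right(b, x) / len(b)
        d = max(d, abs(f_a - f_b))
    return d


# Hypothetical anomaly scores for points labeled normal vs anomalous.
normal_scores = [0.10, 0.20, 0.15, 0.30, 0.25]
anomaly_scores = [0.90, 0.80, 0.85]
print(ks_statistic(normal_scores, anomaly_scores))  # 1.0: no overlap at all
```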
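Precision at top-k reduces to a ranking plus a count. The sketch below assumes anomaly scores are stored in a dict keyed by a point id and that an expert's verdicts are recorded as booleans for the reviewed points only; the function name, ids, and values are hypothetical.

```python
def precision_at_k(scores, expert_labels, k):
    """Fraction of the k highest-scoring points that an expert confirmed as anomalies.

    scores: mapping point id -> anomaly score (higher = more anomalous).
    expert_labels: mapping point id -> True/False from manual review; it only
    needs entries for the top-k points. Ties keep insertion order (stable sort).
    """
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of points")
    top_k = sorted(scores, key=scores.get, reverse=True)[:k]
    hits = sum(1 for point_id in top_k if expert_labels[point_id])
    return hits / k


# Hypothetical scores; the expert reviewed only the three highest-ranked points.
scores = {"tx1": 0.95, "tx2": 0.90, "tx3": 0.40, "tx4": 0.85, "tx5": 0.10}
expert_labels = {"tx1": True, "tx2": False, "tx4": True}
print(precision_at_k(scores, expert_labels, 3))  # two of three confirmed
```

Keeping `expert_labels` sparse mirrors the point of the metric: the expert's effort scales with k, not with the size of the dataset.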
https://towardsdatascience.com/7-evaluation-metrics-for-clustering-algorithms-bdc537ff54d2
https://towardsdatascience.com/three-performance-evaluation-metrics-of-clustering-when-ground-truth-labels-are-not-available-ee08cb3ff4fb
https://medium.datadriveninvestor.com/evaluation-metrics-for-clustering-96dcdbea437d
https://towardsdatascience.com/a-comprehensive-beginners-guide-to-the-diverse-field-of-anomaly-detection-8c818d153995