Compare two Clusterings interactively

I see that MTEB evaluates the clustering capability of embedding models using V-measure, comparing a k-means clustering against ground-truth labels: https://github.com/embeddings-benchmark/mteb/blob/main/mteb/abstasks/AbsTaskClusteringFast.py

V-measure is a metric that evaluates the quality of clustering by comparing the cluster assignments to the true labels. It's the harmonic mean of two other metrics: homogeneity and completeness.

Homogeneity: Measures whether each cluster contains only members of a single class.

Completeness: Measures whether all members of a given class are assigned to the same cluster.

The V-measure ranges from 0 to 1, where 1 indicates perfect clustering.
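To make the definition concrete, here is a minimal pure-Python sketch of the three scores using the standard entropy-based formulation (homogeneity = 1 - H(class | cluster) / H(class), completeness = 1 - H(cluster | class) / H(cluster)). The function names are mine, not MTEB's; in practice `sklearn.metrics.v_measure_score` should give the same numbers.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given), estimated from co-occurrence counts."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (g, _), c in joint.items())


def v_measure(true_labels, cluster_labels):
    """Return (homogeneity, completeness, v_measure), equally weighted."""
    h_true = entropy(true_labels)
    h_clus = entropy(cluster_labels)
    # By convention a score is 1.0 when the entropy it is normalised by is zero.
    homogeneity = (
        1.0 if h_true == 0
        else 1.0 - conditional_entropy(true_labels, cluster_labels) / h_true
    )
    completeness = (
        1.0 if h_clus == 0
        else 1.0 - conditional_entropy(cluster_labels, true_labels) / h_clus
    )
    if homogeneity + completeness == 0:
        return homogeneity, completeness, 0.0
    # Harmonic mean of the two scores.
    v = 2 * homogeneity * completeness / (homogeneity + completeness)
    return homogeneity, completeness, v


# Cluster ids are arbitrary: a relabelled but otherwise identical
# partition still counts as a perfect clustering.
print(v_measure([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2]))
```

Note that the score only depends on how the points are partitioned, not on which integer each cluster happens to get, so it works for any clustering algorithm's output.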

Some ideas from Claude for comparing clusterings with different numbers of clusters:

Calculate V-measure: We can still calculate the V-measure between the HDBSCAN clusters and the true labels. The interpretation would be slightly different:

If HDBSCAN finds fewer clusters than true labels, a high V-measure would indicate that the embeddings are grouping semantically similar categories together.

If HDBSCAN finds more clusters than true labels, a high V-measure would suggest that the embeddings are capturing fine-grained semantic distinctions within categories.
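A small sketch of how the two cases show up in the component scores, using scikit-learn's `homogeneity_completeness_v_measure`. The label arrays are made up for illustration and stand in for real HDBSCAN output: one clustering merges pairs of classes, the other splits every class in two.

```python
from sklearn.metrics import homogeneity_completeness_v_measure

# Hypothetical ground truth: four classes, two points each.
true_labels = [0, 0, 1, 1, 2, 2, 3, 3]

candidates = {
    # Fewer clusters than classes: classes 0+1 and 2+3 are merged.
    "merged": [0, 0, 0, 0, 1, 1, 1, 1],
    # More clusters than classes: every class is split in two.
    "split": [0, 1, 2, 3, 4, 5, 6, 7],
}

scores = {
    name: homogeneity_completeness_v_measure(true_labels, pred)
    for name, pred in candidates.items()
}

for name, (h, c, v) in scores.items():
    print(f"{name}: homogeneity={h:.2f} completeness={c:.2f} v_measure={v:.2f}")
```

Merging classes keeps completeness at 1 but lowers homogeneity, while splitting classes keeps homogeneity at 1 but lowers completeness. Reporting the two components alongside the V-measure therefore shows which of the two situations a given score comes from.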

Additional metrics: We could introduce additional metrics to complement the V-measure:

Adjusted Rand Index (ARI) or Adjusted Mutual Information (AMI), which are also suitable for comparing clusterings with different numbers of clusters.

Silhouette score, to measure how well-separated the HDBSCAN clusters are.

A measure of how close the number of HDBSCAN clusters is to the number of true labels.
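A rough sketch of how these could be computed together with scikit-learn. `compare_to_labels` and `cluster_count_ratio` are hypothetical names, and the ratio is just one possible reading of "how close the number of clusters is". It assumes noise points are labelled -1, as HDBSCAN does, and it leaves them out of the silhouette score, which is a choice rather than a requirement. The toy points and labels stand in for real embeddings and HDBSCAN output.

```python
import numpy as np
from sklearn.metrics import (
    adjusted_mutual_info_score,
    adjusted_rand_score,
    silhouette_score,
)


def compare_to_labels(embeddings, true_labels, cluster_labels):
    """Complementary scores for a clustering vs. ground-truth labels."""
    embeddings = np.asarray(embeddings, dtype=float)
    true_labels = np.asarray(true_labels)
    cluster_labels = np.asarray(cluster_labels)

    clustered = cluster_labels != -1  # assumption: -1 marks noise
    n_clusters = np.unique(cluster_labels[clustered]).size
    n_classes = np.unique(true_labels).size

    # Silhouette is only defined for at least two clusters.
    if n_clusters >= 2:
        sil = silhouette_score(embeddings[clustered], cluster_labels[clustered])
    else:
        sil = float("nan")

    return {
        # Here noise (-1) is simply treated as one more cluster.
        "ari": adjusted_rand_score(true_labels, cluster_labels),
        "ami": adjusted_mutual_info_score(true_labels, cluster_labels),
        "silhouette": sil,
        "n_clusters": int(n_clusters),
        "n_classes": int(n_classes),
        "cluster_count_ratio": n_clusters / n_classes,
    }


# Toy data: two tight groups plus one point flagged as noise.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [5, 5]]
truth = [0, 0, 0, 1, 1, 1, 1]
clusters = [0, 0, 0, 1, 1, 1, -1]

result = compare_to_labels(points, truth, clusters)
for key, value in result.items():
    print(key, value)
```

ARI and AMI compare against the true labels, whereas the silhouette score only looks at the embeddings and the cluster assignments, so it can be computed even where no labels exist.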

enjalot / latent-scope

Compare two Clusterings interactively #61