E3-JSI / dataset-OG2021

The project used to create the 2021 Tokyo Olympics data set
BSD 2-Clause "Simplified" License

Overview of clustering metrics #4

eriknovak closed this issue 4 months ago

eriknovak commented 2 years ago

This task is designed to provide an overview of the clustering metrics that will be used to evaluate the quality of the clustering.

The metrics should provide insight into cluster quality with respect to the basic journalistic questions: who, what, where, and when. These will be measured using (1) content similarity (what), (2) the identified named entities (who and where), and (3) the article's publication date (when).
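As an illustration only, the three dimensions could be turned into per-cluster scores roughly as follows. The article fields (`embedding`, `entities`, `published`), the helper names, and the choice of mean pairwise cosine similarity, mean pairwise Jaccard overlap, and publication-date span are all our assumptions; the issue does not specify how each dimension is computed.

```python
from datetime import date
from itertools import combinations
from math import sqrt
from statistics import mean


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Jaccard overlap between two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_profile(articles):
    """Score one cluster on the what / who-where / when dimensions.

    Each article is a dict with hypothetical keys:
      "embedding": list of floats representing the article content,
      "entities":  set of named-entity strings,
      "published": datetime.date of publication.
    """
    pairs = list(combinations(articles, 2))
    dates = [a["published"] for a in articles]
    content = [cosine(x["embedding"], y["embedding"]) for x, y in pairs]
    entities = [jaccard(x["entities"], y["entities"]) for x, y in pairs]
    return {
        # what: mean pairwise cosine similarity of content embeddings
        "content_similarity": mean(content) if content else 1.0,
        # who / where: mean pairwise overlap of named-entity sets
        "entity_overlap": mean(entities) if entities else 1.0,
        # when: days between the earliest and the latest article
        "date_span_days": (max(dates) - min(dates)).days,
    }


cluster = [
    {"embedding": [1.0, 0.0], "entities": {"Tokyo", "Japan"}, "published": date(2021, 7, 23)},
    {"embedding": [1.0, 0.0], "entities": {"Tokyo"}, "published": date(2021, 7, 25)},
]
profile = cluster_profile(cluster)
print(profile)  # {'content_similarity': 1.0, 'entity_overlap': 0.5, 'date_span_days': 2}
```

Under this sketch, a tight cluster would show high content similarity and entity overlap and a short date span.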

austirol commented 2 years ago

We found the silhouette score, the Calinski-Harabasz index, and the Davies-Bouldin index suitable as clustering metrics, since all three evaluate cluster quality without requiring ground-truth labels.
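For reference, here is a minimal, dependency-free sketch of the three metrics, assuming Euclidean distance, at least two clusters, and distinct cluster centroids. The function names are our own; on real article embeddings one would more likely call scikit-learn's `sklearn.metrics.silhouette_score`, `calinski_harabasz_score`, and `davies_bouldin_score`, which we expect to agree with this sketch on Euclidean data.

```python
from math import dist
from statistics import mean


def _group(X, labels):
    """Map each cluster label to the list of its points."""
    groups = {}
    for x, label in zip(X, labels):
        groups.setdefault(label, []).append(x)
    return groups


def _centroid(points):
    """Component-wise mean of a list of points."""
    return [sum(c) / len(points) for c in zip(*points)]


def silhouette(X, labels):
    """Mean silhouette coefficient in [-1, 1]; higher is better. Singletons score 0."""
    groups = _group(X, labels)
    scores = []
    for i, (x, label) in enumerate(zip(X, labels)):
        own = [dist(x, y) for j, (y, m) in enumerate(zip(X, labels)) if m == label and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = mean(own)  # mean distance to the point's own cluster
        # b: mean distance to the nearest other cluster
        b = min(mean([dist(x, y) for y in pts]) for m, pts in groups.items() if m != label)
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return mean(scores)


def calinski_harabasz(X, labels):
    """Between- vs. within-cluster dispersion ratio; higher is better."""
    groups = _group(X, labels)
    n, k = len(X), len(groups)
    overall = _centroid(X)
    cents = {label: _centroid(p) for label, p in groups.items()}
    between = sum(len(p) * dist(cents[label], overall) ** 2 for label, p in groups.items())
    within = sum(dist(x, cents[label]) ** 2 for label, p in groups.items() for x in p)
    return (between / (k - 1)) / (within / (n - k)) if within else 1.0


def davies_bouldin(X, labels):
    """Average worst-case cluster similarity; lower is better."""
    groups = _group(X, labels)
    cents = {label: _centroid(p) for label, p in groups.items()}
    # spread: average distance of a cluster's members to its centroid
    spread = {label: mean([dist(x, cents[label]) for x in p]) for label, p in groups.items()}
    return mean([
        max((spread[i] + spread[j]) / dist(cents[i], cents[j]) for j in groups if j != i)
        for i in groups
    ])


X = [[0, 0], [0, 1], [10, 0], [10, 1]]
labels = [0, 0, 1, 1]
print(round(silhouette(X, labels), 4))     # 0.9002
print(calinski_harabasz(X, labels))        # 200.0
print(round(davies_bouldin(X, labels), 4)) # 0.1
```

In the toy example, the two well-separated groups yield a silhouette close to 1, a large Calinski-Harabasz value, and a Davies-Bouldin index close to 0, which is the pattern one would hope to see for coherent article clusters.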