This task provides an overview of the clustering metrics that will be used to evaluate clustering quality.
The metrics should provide insight into cluster quality in terms of the basic journalism questions: who, what, where, and when. Quality will be measured using (1) content similarity, (2) identified named entities, and (3) the article's publication date.
We found the silhouette score, the Calinski-Harabasz index, and the Davies-Bouldin index appropriate as clustering metrics because they evaluate cluster quality without ground-truth labels.
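To make the three metrics concrete, here is a minimal pure-Python sketch of how each one could be computed from feature vectors and cluster labels alone. The toy 2-D points, the label lists, and the function names are illustrative assumptions, not the project's actual data or code; in practice the vectors would come from whatever content, entity, and date features the task settles on. scikit-learn ships equivalent functions in `sklearn.metrics` (`silhouette_score`, `calinski_harabasz_score`, `davies_bouldin_score`), which would normally be preferred over hand-rolled versions.

```python
from math import dist  # Euclidean distance, Python 3.8+


def _groups(points, labels):
    """Map each cluster label to the list of points assigned to it."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return groups


def _centroid(pts):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))


def silhouette(points, labels):
    """Mean silhouette coefficient; ranges from -1 to 1, higher is better."""
    groups = _groups(points, labels)
    total = 0.0
    for p, lab in zip(points, labels):
        own = groups[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the same cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(sum(dist(p, q) for q in other) / len(other)
                for other_lab, other in groups.items() if other_lab != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def calinski_harabasz(points, labels):
    """Between-cluster over within-cluster dispersion; higher is better."""
    groups = _groups(points, labels)
    n, k = len(points), len(groups)
    overall = _centroid(points)
    between = sum(len(g) * dist(_centroid(g), overall) ** 2
                  for g in groups.values())
    within = sum(dist(p, _centroid(g)) ** 2
                 for g in groups.values() for p in g)
    return (between / (k - 1)) / (within / (n - k))


def davies_bouldin(points, labels):
    """Mean worst-case similarity between clusters; lower is better."""
    groups = list(_groups(points, labels).values())
    cents = [_centroid(g) for g in groups]
    # spread: mean distance of a cluster's points to its centroid
    spread = [sum(dist(p, c) for p in g) / len(g)
              for g, c in zip(groups, cents)]
    worst = [max((spread[i] + spread[j]) / dist(cents[i], cents[j])
                 for j in range(len(groups)) if j != i)
             for i in range(len(groups))]
    return sum(worst) / len(groups)


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = [0, 0, 0, 1, 1, 1]  # labels that match the visible structure
bad = [0, 1, 0, 1, 0, 1]   # labels that ignore it

for name, labels in (("good", good), ("bad", bad)):
    print(name,
          round(silhouette(points, labels), 3),
          round(calinski_harabasz(points, labels), 3),
          round(davies_bouldin(points, labels), 3))
```

Note that the three metrics do not share a direction: the silhouette score and the Calinski-Harabasz index increase as clusters become tighter and better separated, while the Davies-Bouldin index decreases. The sketch assumes at least two clusters and non-zero within-cluster spread.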