E3-JSI / dataset-OG2021

The project used to create the 2021 Tokyo Olympics data set
BSD 2-Clause "Simplified" License

Overview of clustering algorithms #3

eriknovak closed this issue 4 months ago

eriknovak commented 2 years ago

This task provides an overview of the different clustering algorithms that can be used to cluster the news articles.

The clustering algorithms and approaches should take the following into consideration: it is OK if multiple clusters cover the same event, but we want to avoid clusters whose articles cover multiple events.
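One way to check this criterion, sketched under the assumption that a sample of articles has been hand-labeled with the event each one covers (the `event_labels` input and both helper names are hypothetical, not part of the project): flag clusters that mix events, and deliberately ignore events that are split over several clusters. This is the "homogeneity" side of clustering evaluation; scikit-learn's `homogeneity_score` measures the same property, whereas `completeness_score` would penalize the split events that the task explicitly allows.

```python
from collections import Counter, defaultdict


def mixed_event_clusters(cluster_labels, event_labels):
    """Return {cluster: Counter(events)} for every cluster that mixes events.

    event_labels is a hypothetical hand-annotated event id per article.
    One event spread over several clusters is allowed, so only clusters
    containing more than one distinct event are reported.
    Articles with cluster label -1 (noise) are ignored.
    """
    events_per_cluster = defaultdict(Counter)
    for cluster, event in zip(cluster_labels, event_labels):
        if cluster == -1:
            continue
        events_per_cluster[cluster][event] += 1
    return {c: counts for c, counts in events_per_cluster.items() if len(counts) > 1}


def purity(cluster_labels, event_labels):
    """Fraction of clustered (non-noise) articles matching their cluster's majority event."""
    events_per_cluster = defaultdict(Counter)
    for cluster, event in zip(cluster_labels, event_labels):
        if cluster != -1:
            events_per_cluster[cluster][event] += 1
    total = sum(sum(counts.values()) for counts in events_per_cluster.values())
    if total == 0:
        return 0.0  # nothing was clustered
    majority = sum(max(counts.values()) for counts in events_per_cluster.values())
    return majority / total
```

For example, with cluster labels `[0, 0, 1, 1, 2, 2, -1]` and events `["a", "a", "a", "a", "b", "c", "d"]`, clusters 0 and 1 both cover event `a` (acceptable), while cluster 2 mixes `b` and `c` and is the only one reported.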

austirol commented 2 years ago

We propose using HDBSCAN because it does not require the number of clusters to be specified in advance and it labels outliers as noise. We chose HDBSCAN over DBSCAN and OPTICS for its efficiency.