To make the distance computation more efficient, we first compute the centroid of each category.
Distances can then be computed against those centroids instead of against every single article.
Issue: write a class that reads all the ESA representations within a Wikipedia edition and computes the centroid. To speed up the process, the class will be called once per category, so that multiple processes can be launched in parallel on a cluster.
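
The centroid step could be sketched roughly as follows. This is only an illustration under stated assumptions: each ESA representation is assumed to be a sparse mapping from concept identifier to weight, and the centroid is taken to be the component-wise arithmetic mean over all articles of a category. The name `centroid` and the `SparseVector` alias are hypothetical, and how the representations are read from disk is left out.

```python
from collections import defaultdict
from typing import Dict, Iterable

# Assumption: an ESA representation is a sparse vector {concept_id: weight}.
SparseVector = Dict[str, float]


def centroid(vectors: Iterable[SparseVector]) -> SparseVector:
    """Return the component-wise mean of the given sparse vectors.

    Concepts missing from a vector count as weight 0 for that vector.
    An empty input yields an empty centroid.
    """
    totals = defaultdict(float)
    count = 0
    for vec in vectors:
        count += 1
        for concept, weight in vec.items():
            totals[concept] += weight
    if count == 0:
        return {}
    return {concept: total / count for concept, total in totals.items()}
```

Because the function only depends on the vectors of one category, one process per category can run it independently, which matches the one-call-per-category plan above.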