cristinae / WikiTailor

Your à-la-carte in-domain corpora extraction tool from Wikipedia

Computing the centroid of ESA vector representations per category #15

Closed: albarron closed this issue 7 years ago

albarron commented 8 years ago

To make the distance computation more efficient, we first obtain the centroid of each category. Distances can then be computed against those centroids instead of against every individual article.
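To illustrate why comparing against centroids is cheaper, here is a minimal sketch that scores an article against one centroid per category rather than against every article in each category. It is not taken from WikiTailor: the class name `CentroidDistanceSketch`, the sparse `Map<Integer, Double>` vector layout, and the choice of cosine similarity as the measure are all assumptions made for the example.

```java
import java.util.Map;

/** Hypothetical sketch; not the WikiTailor implementation. */
final class CentroidDistanceSketch {

    /** Dot product of two sparse vectors (dimension index -> weight). */
    static double dot(Map<Integer, Double> a, Map<Integer, Double> b) {
        // Iterate over the smaller map and look up in the larger one.
        Map<Integer, Double> small = a.size() <= b.size() ? a : b;
        Map<Integer, Double> large = small == a ? b : a;
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : small.entrySet()) {
            Double other = large.get(e.getKey());
            if (other != null) {
                sum += e.getValue() * other;
            }
        }
        return sum;
    }

    /** Cosine similarity; returns 0 if either vector has zero norm. */
    static double cosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        double norms = Math.sqrt(dot(a, a) * dot(b, b));
        return norms == 0.0 ? 0.0 : dot(a, b) / norms;
    }

    /** Returns the category whose centroid is most similar to the article. */
    static String closestCategory(Map<Integer, Double> article,
                                  Map<String, Map<Integer, Double>> centroids) {
        String best = null;
        double bestSimilarity = Double.NEGATIVE_INFINITY;
        // One comparison per category, regardless of how many articles it holds.
        for (Map.Entry<String, Map<Integer, Double>> e : centroids.entrySet()) {
            double similarity = cosine(article, e.getValue());
            if (similarity > bestSimilarity) {
                bestSimilarity = similarity;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

With this layout, the number of comparisons per article grows with the number of categories, not with the number of articles.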

Issue: write a class that reads all the ESA representations within a Wikipedia edition and computes the centroid. To speed up the process, the class will be called once per category, so that multiple processes can be launched in parallel on a cluster.
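As a rough sketch of the computation described here, the centroid of one category can be taken as the per-dimension mean of its articles' ESA vectors. The class name `EsaCentroidSketch` and the sparse `Map<Integer, Double>` representation are hypothetical and chosen only for illustration; how the vectors are actually read and stored in WikiTailor is not shown in this thread.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; not the WikiTailor implementation. */
final class EsaCentroidSketch {

    /**
     * Averages the sparse vectors (dimension index -> weight) of one category.
     * A dimension missing from a vector counts as zero.
     */
    static Map<Integer, Double> centroid(List<Map<Integer, Double>> vectors) {
        if (vectors.isEmpty()) {
            throw new IllegalArgumentException("At least one vector is required");
        }
        // Accumulate the per-dimension sums.
        Map<Integer, Double> sums = new HashMap<>();
        for (Map<Integer, Double> vector : vectors) {
            for (Map.Entry<Integer, Double> e : vector.entrySet()) {
                sums.merge(e.getKey(), e.getValue(), Double::sum);
            }
        }
        // Divide every sum by the number of vectors to obtain the mean.
        final double n = vectors.size();
        sums.replaceAll((dimension, sum) -> sum / n);
        return sums;
    }

    public static void main(String[] args) {
        // Two toy article vectors standing in for one category.
        List<Map<Integer, Double>> articles = List.of(
            Map.of(0, 1.0, 2, 4.0),
            Map.of(0, 3.0, 1, 2.0));
        System.out.println(centroid(articles));
    }
}
```

Because each call only needs the vectors of a single category, one such process per category can run independently, which matches the once-per-category invocation proposed above.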

albarron commented 7 years ago

The code is ready and available in cat.lump.ir.sim.ml.esa.experiments.B_EsaCategoryCentroidComputer.java