Open issam9 opened 1 year ago
Apologies for the late reply. Clustering is currently done with single linkage on words that have already been matched, and it does not use any metadata of the words themselves. Many other clustering algorithms need some metadata, in the form of distance metrics or embeddings. As a result, the clustering you propose would not be independent of the word similarity metric used.
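To illustrate what single linkage on already matched words can look like, here is a minimal sketch. It is not the library's actual implementation: the function name `single_linkage_groups`, the `min_similarity` threshold, and the sample scores are all hypothetical. It assumes the matcher has already produced `(word, word, score)` pairs, and it simply merges every pair whose score passes the threshold, so the word strings are never inspected.

```python
from collections import defaultdict


def single_linkage_groups(matches, min_similarity=0.75):
    """Group words by linking each matched pair whose score passes the threshold.

    `matches` is an iterable of (word_a, word_b, score) tuples. This is a
    hypothetical input shape, not the library's own data structure.
    """
    parent = {}

    def find(word):
        # Union-find lookup with path halving.
        parent.setdefault(word, word)
        while parent[word] != word:
            parent[word] = parent[parent[word]]
            word = parent[word]
        return word

    for left, right, score in matches:
        root_left, root_right = find(left), find(right)
        # Single linkage: one sufficiently similar pair is enough to merge groups.
        if score >= min_similarity and root_left != root_right:
            parent[root_right] = root_left

    groups = defaultdict(set)
    for word in parent:
        groups[find(word)].add(word)
    return sorted(sorted(group) for group in groups.values())


# Made-up scores, for illustration only.
matches = [
    ("apple", "apples", 0.93),
    ("apples", "appel", 0.81),
    ("banana", "bananas", 0.95),
    ("apple", "banana", 0.10),
]
clusters = single_linkage_groups(matches)
print(clusters)
```

Only the match scores drive the grouping here, so the result still depends on whichever similarity metric produced those scores.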
Are there any plans to support clustering words by their similarity, along the lines of the solution described here: https://stats.stackexchange.com/questions/123060/clustering-a-long-list-of-strings-words-into-similarity-groups