MaartenGr / PolyFuzz

Fuzzy string matching, grouping, and evaluation.
https://maartengr.github.io/PolyFuzz/
MIT License
725 stars, 68 forks

Clustering words based on similarity #54

Open issam9 opened 1 year ago

issam9 commented 1 year ago

Are there any plans to support clustering words by their similarity, along the lines of the solution described here: https://stats.stackexchange.com/questions/123060/clustering-a-long-list-of-strings-words-into-similarity-groups ?

MaartenGr commented 1 year ago

Apologies for the late reply. The clustering is currently done using single linkage on already matched words and does not use any metadata of the words themselves. Many of the clustering algorithms out there need some metadata in the form of distance metrics or embeddings. As a result, the clustering you propose would not be independent of the word similarity metric used.
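To make "single linkage on already matched words" concrete, here is a minimal sketch of the idea, not PolyFuzz's actual implementation. It assumes a stand-in similarity from Python's standard-library `difflib`, and the names `similarity`, `single_linkage_groups`, and the `min_similarity` cut-off are hypothetical, chosen only for illustration. With a fixed cut-off, single linkage amounts to taking the connected components of the graph whose edges are the matched pairs.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; any string matcher could be swapped in."""
    return SequenceMatcher(None, a, b).ratio()


def single_linkage_groups(words, min_similarity=0.75):
    """Group unique words whose pairwise score reaches min_similarity.

    Hypothetical helper: two words share a group if a chain of matched
    pairs connects them, which is single linkage cut at a fixed threshold.
    """
    parent = {w: w for w in words}  # union-find forest, one node per word

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving keeps trees shallow
            w = parent[w]
        return w

    for a, b in combinations(words, 2):
        if similarity(a, b) >= min_similarity:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for w in words:
        groups.setdefault(find(w), []).append(w)
    return list(groups.values())


words = ["apple", "apples", "appl", "banana", "bananas", "cherry"]
groups = single_linkage_groups(words)
print(groups)
```

Swapping `similarity` for a different metric changes which pairs count as matched and therefore which groups form, so the grouping depends on the chosen metric, as the comment above notes.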