MaartenGr / PolyFuzz

Fuzzy string matching, grouping, and evaluation.
https://maartengr.github.io/PolyFuzz/
MIT License
725 stars 68 forks

Calculating semantic similarity between two words such as happy and sad #33

Open xinli2008 opened 2 years ago

xinli2008 commented 2 years ago

Hello, can this tool be used to calculate the semantic similarity between two words such as "happy" and "sad"? @MaartenGr

MaartenGr commented 2 years ago

Yes! PolyFuzz is meant to find the distance between two sets of strings. Distance might mean string distance, as in how many changes you need to make to go from one string to another, or it might mean distance in terms of semantic similarity.
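To make the "how many changes" idea concrete, here is a minimal sketch of one common string distance, the Levenshtein edit distance, in plain Python. It does not use PolyFuzz and is not necessarily how PolyFuzz computes string distance internally; the function name `edit_distance` is just an illustrative choice.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions needed to turn a into b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(edit_distance("happy", "sad"))  # 4
```

A string distance like this only looks at characters, so it says nothing about whether two words mean similar things.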

Semantic similarity is typically extracted using embedding techniques, such as Word2Vec, FastText, or the transformer models that have shown tremendous performance boosts. To use one of these techniques, you can follow along with the guide here.
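As a rough illustration of what happens once words have been embedded, the sketch below compares vectors with cosine similarity, a common choice for embeddings. The three-dimensional vectors in `toy_embeddings` are made-up values for demonstration only; real embeddings would come from a model such as Word2Vec, FastText, or a transformer, and would have far more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors, NOT output from any real embedding model.
toy_embeddings = {
    "happy": [0.8, 0.6, 0.1],
    "glad": [0.7, 0.7, 0.2],
    "table": [0.0, 0.1, 0.9],
}

for word in ("glad", "table"):
    score = cosine_similarity(toy_embeddings["happy"], toy_embeddings[word])
    print(f"happy vs {word}: {score:.3f}")
```

With these toy values, "happy" scores much closer to "glad" than to "table", which is the kind of ranking an embedding-based matcher relies on.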