MaartenGr / PolyFuzz

Fuzzy string matching, grouping, and evaluation.
https://maartengr.github.io/PolyFuzz/
MIT License

How to use 'glove' or 'pubmed' Embeddings in PolyFuzz #14

Closed abhibisht89 closed 3 years ago

abhibisht89 commented 3 years ago

I want to use 'glove' or 'pubmed' embeddings in PolyFuzz. I tried the code below, but it is not working:

fasttext = WordEmbeddings('glove')
fasttext_matcher = Embeddings(fasttext, min_similarity=0)

model = PolyFuzz(fasttext_matcher).match(from_list, to_list)
model.get_matches()

I am getting a similarity of 0.0 for everything.

MaartenGr commented 3 years ago

For me, the following code is working correctly:

from polyfuzz import PolyFuzz
from polyfuzz.models import Embeddings
from flair.embeddings import WordEmbeddings

from_list = ["apple", "apples", "appl", "recal", "house", "similarity"]
to_list = ["apple", "apples", "mouse"]

glove = WordEmbeddings('glove')
matcher = Embeddings(glove, min_similarity=0)

models = PolyFuzz(matcher).match(from_list, to_list)

When I access the results with models.get_matches(), the similarity scores look correct. Have you tried updating to the newest version of PolyFuzz? Can you also run the code above to see if it works for you?

abhibisht89 commented 3 years ago

@MaartenGr thanks for the response. Yes, the above example works. I am applying PolyFuzz to get a class-wise similarity score, for example:

from_list = ["apple", "apples", "appl", "recal", "house", "similarity"]
to_list = ["apple"]

and then asking for the best match in from_list for each element of to_list.

However, if GloVe has no embedding for a to_list element, I get 0.0 as the similarity. I understand it now.

Thanks
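
The 0.0 scores described above are consistent with out-of-vocabulary words being embedded as all-zero vectors. That is an assumption about how the embedding backend behaves here, not something confirmed in this thread. The following is a minimal sketch in plain Python using a hypothetical three-dimensional toy vocabulary; `toy_vectors`, `embed`, and `out_of_vocabulary` are illustrative names and are not part of PolyFuzz or flair.

```python
import math

# Hypothetical toy vocabulary standing in for a pretrained model such as GloVe.
toy_vectors = {
    "apple": [0.9, 0.1, 0.0],
    "apples": [0.8, 0.2, 0.1],
    "house": [0.0, 0.3, 0.9],
}
DIM = 3


def embed(word):
    """Return the word's vector, or an all-zero vector if it is unknown (assumed behavior)."""
    return toy_vectors.get(word, [0.0] * DIM)


def cosine_similarity(u, v):
    """Cosine similarity, defined here as 0.0 when either vector has zero norm."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def out_of_vocabulary(words):
    """List the words whose embedding is all zeros, i.e. unknown to the toy vocabulary."""
    return [w for w in words if not any(embed(w))]


from_list = ["apple", "apples", "appl", "recal", "house"]
to_list = ["apple"]

for word in from_list:
    score = cosine_similarity(embed(word), embed(to_list[0]))
    print(f"{word!r} vs {to_list[0]!r}: {score:.3f}")

print("Out of vocabulary:", out_of_vocabulary(from_list))
```

Running a check like `out_of_vocabulary` on both lists before matching can show which entries will score 0.0 regardless of what they are compared with. For misspellings such as "appl", a character-based matcher may be a better fit than word embeddings.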

MaartenGr commented 3 years ago

Great, glad to hear that it works now!