facebookresearch / InferSent

InferSent sentence embeddings

Any function for similar sentence? #89

Closed arsalan993 closed 5 years ago

arsalan993 commented 5 years ago

With word embeddings, the "Sklearn library" has a function to extract similar words. It just returns similar words from the corpus it was trained on. In a similar way, can we use InferSent to get similar sentences from a corpus? Thanks

aconneau commented 5 years ago

Hi, unfortunately we haven't implemented that functionality. Alexis

superMDguy commented 5 years ago

I actually just wrote some code to do that. It takes in a list of sentences and a corpus, and returns the closest sentence (by cosine distance) in the corpus for each sentence in the list.

import numpy as np
from tqdm import tqdm
from sklearn.metrics.pairwise import pairwise_distances

# Initialize InferSent model here

def get_closest(sentences, corpus, model):
    # Build the vocabulary from every sentence that will be encoded
    model.build_vocab(sentences + corpus, tokenize=True)

    sentenceVecs = model.encode(sentences, tokenize=True)
    corpusVecs = model.encode(corpus, tokenize=True)

    # distances[i, j] is the cosine distance between sentences[i] and corpus[j]
    distances = pairwise_distances(sentenceVecs, corpusVecs, metric='cosine', n_jobs=-1)

    closest = []
    for i in tqdm(range(len(sentenceVecs))):
        # Search over every corpus sentence, not only the first len(sentences)
        closestIdx = np.argmin(distances[i])
        closest.append(corpus[closestIdx])
    return closest
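
If more than one match per sentence is useful, the same idea can be extended to return the k nearest corpus sentences. The following is only a rough sketch working on precomputed embedding arrays: the helper name `top_k_similar`, the parameter `k`, and the toy 2-dimensional vectors are made up for illustration and are not part of InferSent. In practice the arrays would come from `model.encode(...)` as in the function above.

```python
import numpy as np

def top_k_similar(query_vecs, corpus_vecs, corpus, k=3):
    """For each query vector, return the k corpus sentences with the
    highest cosine similarity, best match first.

    Hypothetical helper: assumes every vector is non-zero.
    """
    query_vecs = np.asarray(query_vecs, dtype=float)
    corpus_vecs = np.asarray(corpus_vecs, dtype=float)

    # L2-normalize the rows so a plain dot product equals cosine similarity
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    c = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    sims = q @ c.T  # shape: (n_queries, n_corpus)

    k = min(k, len(corpus))
    # Sort each row by descending similarity and keep the first k indices
    order = np.argsort(-sims, axis=1)[:, :k]
    return [[(corpus[j], float(sims[i, j])) for j in row]
            for i, row in enumerate(order)]

# Toy stand-ins for sentence embeddings (assumed values, not real model output)
toy_corpus = ["sentence a", "sentence b", "sentence c"]
toy_corpus_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_query_vecs = [[1.0, 0.1]]

if __name__ == "__main__":
    for sentence, score in top_k_similar(toy_query_vecs, toy_corpus_vecs, toy_corpus, k=2)[0]:
        print(sentence, round(score, 3))
```

Normalizing once and using a single matrix product avoids a Python loop over the corpus; for a very large corpus a full sort per row may be slow, and a partial sort or an approximate nearest-neighbour index would be worth considering.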
aconneau commented 5 years ago

Thanks @superMDguy !