Any function for similar sentence?

arsalan993 commented 5 years ago

In the word embedding "Sklearn library" we have a function to extract similar words. It just returns similar words from the corpus it was trained on. In a similar way, can we somehow use InferSent to get similar sentences from a corpus? Thanks

aconneau commented 5 years ago

Hi, unfortunately we haven't implemented that functionality. Alexis

superMDguy commented 5 years ago

I actually just wrote some code to do that. It takes in a list of sentences and a corpus, and returns the closest sentence (by cosine distance) in the corpus for each sentence in the list.

import numpy as np
from tqdm import tqdm
from sklearn.metrics.pairwise import pairwise_distances

# Initialize InferSent model here

def get_closest(sentences, corpus, model):
    model.build_vocab(sentences + corpus, tokenize=True)

    sentenceVecs = model.encode(sentences, tokenize=True)
    corpusVecs = model.encode(corpus, tokenize=True)

    distances = pairwise_distances(sentenceVecs, corpusVecs, metric='cosine', n_jobs=-1)

    closest = []
    for i in tqdm(range(len(sentenceVecs))):
        # Row i holds the distances from sentence i to every corpus sentence
        sentence_distances = distances[i]
        closestIdx = np.argmin(sentence_distances)
        closest.append(corpus[closestIdx])
    return closest
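
If you want the top k matches rather than only the single closest sentence, a minimal sketch along the same lines could work directly on precomputed embeddings, such as the arrays returned by model.encode. The function name top_k_similar and the parameter k are illustrative assumptions, not part of InferSent, and this uses plain NumPy rather than pairwise_distances:

```python
import numpy as np

def top_k_similar(query_vecs, corpus_vecs, k=3):
    """Return, per query vector, the indices of the k most cosine-similar corpus vectors.

    Hypothetical helper for illustration; not part of InferSent.
    """
    query_vecs = np.asarray(query_vecs, dtype=float)
    corpus_vecs = np.asarray(corpus_vecs, dtype=float)

    # Normalize rows to unit length so a dot product equals cosine similarity.
    # The small floor avoids division by zero for all-zero vectors.
    q_norms = np.maximum(np.linalg.norm(query_vecs, axis=1, keepdims=True), 1e-12)
    c_norms = np.maximum(np.linalg.norm(corpus_vecs, axis=1, keepdims=True), 1e-12)
    sims = (query_vecs / q_norms) @ (corpus_vecs / c_norms).T

    # Never ask for more neighbours than the corpus contains.
    k = min(k, corpus_vecs.shape[0])

    # Sort each row by descending similarity and keep the first k indices.
    return np.argsort(-sims, axis=1)[:, :k]
```

Each row of the result indexes back into the corpus list, so corpus[idx] gives the matching sentence text for every idx in a row.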

aconneau commented 5 years ago

Thanks @superMDguy !

facebookresearch / InferSent

Any function for similar sentence? #89