Closed arsalan993 closed 5 years ago
Hi, unfortunately we haven't implemented that functionality. Alexis
I actually just wrote some code to do that. It takes in a list of sentences and a corpus, and returns the closest sentence (by cosine distance) in the corpus for each sentence in the list.
import numpy as np
from tqdm import tqdm
from sklearn.metrics.pairwise import pairwise_distances

# Initialize InferSent model here

def get_closest(sentences, corpus, model):
    # Build the vocabulary from every sentence that will be encoded
    model.build_vocab(sentences + corpus, tokenize=True)
    sentenceVecs = model.encode(sentences, tokenize=True)
    corpusVecs = model.encode(corpus, tokenize=True)
    # distances[i, j] is the cosine distance between sentences[i] and corpus[j]
    distances = pairwise_distances(sentenceVecs, corpusVecs, metric='cosine', n_jobs=-1)
    closest = []
    for i in tqdm(range(len(sentenceVecs))):
        # Search over every corpus entry (one column per corpus sentence)
        closestIdx = np.argmin(distances[i])
        closest.append(corpus[closestIdx])
    return closest
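If you want several similar sentences per query rather than only the single closest one, the same idea extends to a top-k lookup. The following is a minimal sketch, not part of InferSent: `top_k_similar` is a hypothetical helper, and it assumes you already have the embeddings as 2-D arrays (for example from `model.encode`) with no all-zero rows. Toy 2-d vectors stand in for real sentence embeddings here.

```python
import numpy as np

def top_k_similar(query_vecs, corpus_vecs, corpus, k=3):
    """For each query vector, return the k corpus sentences with the
    highest cosine similarity, as (sentence, similarity) pairs."""
    q = np.asarray(query_vecs, dtype=float)
    c = np.asarray(corpus_vecs, dtype=float)
    # Normalize rows so that a dot product equals cosine similarity
    # (assumes no row is all zeros)
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    sims = q @ c.T  # shape: (n_queries, n_corpus)
    k = min(k, len(corpus))
    # Sort each row by descending similarity and keep the first k indices
    top_idx = np.argsort(-sims, axis=1)[:, :k]
    return [[(corpus[j], float(sims[i, j])) for j in row]
            for i, row in enumerate(top_idx)]

# Toy data: placeholder sentences and hand-made 2-d "embeddings"
corpus = ["a", "b", "c"]
corpus_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
results = top_k_similar([[1.0, 0.1]], corpus_vecs, corpus, k=2)
```

With real data you would pass the encoded query and corpus arrays in place of the toy vectors. For a large corpus, encoding it once and reusing the array avoids recomputing embeddings for every query.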
Thanks @superMDguy !
In the word embedding "Sklearn library" we have a function to extract similar words. It just returns similar words from the corpus it was trained on. In a similar way, can we somehow use InferSent to get similar sentences from a corpus? Thanks