epfml / sent2vec

General purpose unsupervised sentence representations
Other

What's the difference between nnSent and get_sentence_embeddings_from_pre-trained_models? #82

Closed tqx94 closed 4 years ago

tqx94 commented 5 years ago

Hi,

If I want to find sentences related to the sentences I'm querying, which command should I use? nnSent directly (./fasttext nnSent model.bin corpora [k]), or get_sentence_embeddings in https://github.com/epfml/sent2vec/blob/master/get_sentence_embeddings_from_pre-trained_models.ipynb and then manually compute the cosine similarity?

If I were to use the nnSent command, ./fasttext nnSent model.bin corpora [k]:

Thanks!

mpagli commented 5 years ago

If I want to find sentences related to the sentences I'm querying, which command should I use?

Both options are fine.
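
For the second option (embeddings first, then cosine similarity by hand), the ranking step can be sketched roughly as below. This is only an illustration in plain Python: the 3-dimensional vectors are toy values standing in for real sentence embeddings, and the helper names `cosine_similarity` and `nearest_sentences` are made up here, not part of sent2vec.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def nearest_sentences(query_vec, corpus_vecs, corpus_sentences, k=3):
    """Return the k (score, sentence) pairs most similar to query_vec."""
    scored = [
        (cosine_similarity(query_vec, vec), sent)
        for vec, sent in zip(corpus_vecs, corpus_sentences)
    ]
    # Highest cosine similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy vectors (assumption): real embeddings would come from the model.
corpus_sentences = [
    "a cat sits on the mat .",
    "stocks fell sharply today .",
    "a kitten rests on a rug .",
]
corpus_vecs = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9], [0.8, 0.3, 0.1]]
query_vec = [1.0, 0.2, 0.0]

top = nearest_sentences(query_vec, corpus_vecs, corpus_sentences, k=2)
for score, sent in top:
    print(f"{score:.3f}\t{sent}")
```

With real embeddings, the query and the corpus sentences would be embedded with the same model before ranking.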

For the corpora that I upload, do I need to preprocess the texts using the Stanford jar to match the pretrained models?

Yes. There is now a simpler option than the jar: the official Python wrapper, https://github.com/stanfordnlp/stanfordnlp

Do I need to clean the input text to be queried too?

Yes
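
The point of the two answers above is that the corpus and the query should go through the same cleaning step. A minimal sketch of that idea follows. Note the assumptions: `preprocess` is a hypothetical stand-in (lowercasing plus a simple regex split), not the Stanford tokenizer mentioned above, and the exact normalization should match whatever was used to train the model you load.

```python
import re


def preprocess(text):
    """Hypothetical stand-in for the real tokenizer (assumption):
    lowercase the text and separate punctuation from words."""
    text = text.lower()
    # Word runs, or any single character that is neither word nor space.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return " ".join(tokens)


corpus = ["The cat sat on the mat.", "Stocks fell sharply today!"]
query = "Where is the cat?"

# Apply the SAME function to both sides before embedding or querying.
clean_corpus = [preprocess(sentence) for sentence in corpus]
clean_query = preprocess(query)

for sentence in clean_corpus:
    print(sentence)
print(clean_query)
```

Keeping the cleaning in one function makes it hard to accidentally tokenize the corpus one way and the queries another.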