epfml / sent2vec

General purpose unsupervised sentence representations

[not an issue] Any specific reason to use sent2vec embeddings over fastText embeddings? #9

Closed spate141 closed 7 years ago

spate141 commented 7 years ago

I was looking at the paper and code. Great work, first of all! I have a question: since the new fastText library can generate sentence embeddings by averaging word vectors, is there any comparison between fastText and sent2vec on any supervised or unsupervised task?
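For context, the fastText-style baseline mentioned here is simply the mean of the word vectors in a sentence. A minimal sketch of that composition, using a hypothetical toy vocabulary in place of real pretrained vectors (a real model would map words to ~300-dimensional vectors):

```python
import numpy as np

# Hypothetical toy word vectors standing in for pretrained embeddings.
word_vectors = {
    "sentence":   np.array([0.2, 0.4, 0.1]),
    "embeddings": np.array([0.3, 0.1, 0.5]),
    "are":        np.array([0.0, 0.2, 0.2]),
    "useful":     np.array([0.4, 0.3, 0.0]),
}

def average_embedding(tokens, vectors):
    """Average the vectors of in-vocabulary tokens to get a sentence embedding."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        # No known tokens: fall back to a zero vector of the right dimension.
        return np.zeros_like(next(iter(vectors.values())))
    return np.mean(known, axis=0)

emb = average_embedding("sentence embeddings are useful".split(), word_vectors)
print(emb)  # mean of the four toy vectors: [0.225 0.25  0.2  ]
```

The point under discussion is that sent2vec word vectors are trained so that this kind of summation/averaging yields a good sentence representation, whereas generic word vectors are not optimized for that use.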

martinjaggi commented 7 years ago

The main difference is that our vectors are trained so that they can be summed over variable-length sentences/documents; this should make a large difference. We hope to have more comparisons soon. (The approach could potentially even be combined with subword vectors, as you mentioned.) In the meantime, feel free to run some benchmarks directly, for example SentEval or STS 2017.

guptaprkhr commented 7 years ago

Hi, the fastText sentence embeddings perform very similarly to averaged CBOW and Skipgram embeddings, because the word vectors fastText learns come from a CBOW-like objective (while exploiting morphology). Our word embeddings, in contrast, are trained for the purpose of producing proper sentence embeddings rather than for the word embeddings themselves. As you can see in the paper, our embeddings significantly outperform the CBOW and Skipgram sentence embeddings.