Closed: thomasahle closed this issue 6 years ago
I'm trying to understand the exact difference between sent2vec and running fastText in supervised mode on lines like this:

Reading your paper and code, I think you hold out the word on the lhs that you are trying to predict on the rhs. E.g. you run

whereas fastText supervised would include the rhs word in each of those four calls. Is this correct, or am I missing other differences between the two systems?

It is mostly correct; you could approximate the unsupervised task by duplicating each sentence, each time selecting a different label. One difference is that we apply subsampling (online) to the target words, so not every word would be selected as a label. We also apply dropout when using n-grams. Finally, your example contains some bigrams, such as "some with", that shouldn't be there; sent2vec handles the n-gram generation properly.
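The held-out-word scheme discussed above can be sketched as follows. This is a minimal illustration of the idea, not code from either project: the example sentence and function names are hypothetical, and it ignores the subsampling and n-gram dropout mentioned in the reply.

```python
def sent2vec_pairs(words):
    """One (context, label) pair per word; the label is held out of its own context."""
    return [(words[:i] + words[i + 1:], words[i]) for i in range(len(words))]

def fasttext_supervised_pairs(words):
    """A supervised fastText line would keep the full context, label word included."""
    return [(list(words), words[i]) for i in range(len(words))]

def sentence_bigrams(words):
    """Bigrams taken from the tokenized sentence itself, the way sent2vec
    generates them internally (rather than from a manually preprocessed line)."""
    return [" ".join(words[i:i + 2]) for i in range(len(words) - 1)]

sentence = ["the", "cat", "sat", "here"]  # hypothetical four-word sentence

# sent2vec: four calls, each predicting one word from the remaining three.
# fastText supervised: four calls, each predicting one word from all four.
```

For the four-word sentence above, `sent2vec_pairs` yields four training pairs whose contexts each have three words, while `fasttext_supervised_pairs` yields four pairs whose contexts all contain the full sentence, including the word being predicted.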