dselivanov / text2vec

Fast vectorization, topic modeling, distances and GloVe word embeddings in R.
http://text2vec.org

Skip Gram with Negative Sampling #193

Open andland opened 7 years ago

andland commented 7 years ago

https://arxiv.org/pdf/1705.09755v1.pdf

I recently posted a paper to arXiv showing that word2vec's Skip Gram with Negative Sampling (SGNS) algorithm is a weighted logistic PCA. In that framework, SGNS can be trained on the same term-context matrix that is used for GloVe. The training could use the same AdaGrad procedure, only with a different loss function and gradients, and sampling all of the elements of the matrix instead of just the non-zero ones.
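To make the connection concrete, here is a rough sketch of the implied objective, using the standard SGNS weighting (counts for positive pairs, `k * row total * column total / grand total` for expected negatives). This is only a dense toy illustration, not an implementation, and the variable names are illustrative rather than text2vec API:

```r
# Toy illustration of the SGNS objective as weighted logistic PCA,
# evaluated over a full (dense, for clarity) term-context count matrix X.
sgns_loss <- function(W, C, X, k = 5) {
  # W: D x d word vectors; C: D x d context vectors; X: D x D counts
  N   <- sum(X)
  # expected negative-sampling weight for every cell (i, j):
  # k * #(i) * #(j) / |D|
  neg <- k * outer(rowSums(X), colSums(X)) / N
  S   <- tcrossprod(W, C)            # inner products w_i . c_j
  # positive counts push sigma(s_ij) toward 1, negative weights toward 0
  -sum(X * log(plogis(S)) + neg * log(plogis(-S)))
}

# quick check on random toy data
set.seed(1)
D <- 50; d <- 10
X <- matrix(rpois(D * D, 0.3), D, D)
W <- matrix(rnorm(D * d, sd = 0.1), D, d)
C <- matrix(rnorm(D * d, sd = 0.1), D, d)
sgns_loss(W, C, X)
```

Note that the loss touches every cell of `X`, including the zeroes, which is exactly the difference from GloVe that comes up below.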

Is SGNS something you are interested in including in the text2vec package, or are you happy with GloVe?

Thanks

dselivanov commented 7 years ago

Thanks! The article looks very interesting. In my experience, SGNS and GloVe usually perform very similarly, but it would be interesting to compare them in more detail.

andland commented 7 years ago

I agree they are largely similar, but an advantage of SGNS is that it does better for rarely occurring words. As the Swivel paper puts it: "GloVe is under-constrained: there is no penalty for placing unobserved but unrelated embeddings near to one another."
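For reference, the GloVe objective (as in Pennington et al.) sums only over the non-zero cells of the co-occurrence matrix $X$:

```latex
J = \sum_{i,j \,:\, X_{ij} > 0} f(X_{ij}) \left( w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2
```

Cells with $X_{ij} = 0$ contribute nothing to $J$, so nothing in the loss keeps the embeddings of unobserved, unrelated pairs apart, whereas the negative-sampling term in SGNS puts weight on every cell.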

dselivanov commented 7 years ago

Yes, I remember this. But the clear advantage of GloVe is that its complexity is O(nnz) instead of O(D^2). As I understand it, the proposed SGNS and SGNS-LS also suffer from O(D^2) complexity.
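A rough back-of-the-envelope comparison (the sizes below are assumptions for a large corpus, not measurements):

```r
# Assumed sizes; per-epoch cost scales with the number of matrix cells touched.
D   <- 1e5    # vocabulary size (assumption)
nnz <- 1e8    # non-zero co-occurrence cells (assumption)
D^2 / nnz     # ~100x more cell updates per epoch for a full-matrix method
```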

andland commented 7 years ago

That is a downside. However, my intuition is that the number of parameter updates is more relevant than the number of epochs (see, for example, Figure 5 of the BPR paper); i.e. the algorithm may converge after a similar number of parameter updates as GloVe. This is mostly speculation, though.