andland opened 7 years ago
Thanks! The article looks very interesting. In my experience, SGNS and GloVe usually perform very similarly, but it would be interesting to compare them in more detail.
I agree they are largely similar, but an advantage of SGNS is that it does better for rarely occurring words. As the Swivel paper puts it: "GloVe is under-constrained: there is no penalty for placing unobserved but unrelated embeddings near to one another."
Yes, I remember this. But the clear advantage of GloVe is that its complexity is O(nnz) instead of O(D^2). As I understand it, the proposed SGNS and SGNS-LS also suffer from O(D^2) complexity.
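A quick back-of-the-envelope illustration of why that gap matters (all numbers here are made up for the sketch, not from the thread): with Zipf-like word frequencies, most word pairs never co-occur, so nnz is a tiny fraction of D^2.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy co-occurrence matrix: Zipf-distributed unigram frequencies make most
# word pairs never co-occur, so nnz << D^2. (Illustrative numbers only.)
D = 2000
freq = 1.0 / np.arange(1, D + 1)      # Zipf-like unigram frequencies
p = np.outer(freq, freq)
p *= 2e5 / p.sum()                    # ~2e5 total co-occurrence events
X = rng.poisson(p)                    # sampled co-occurrence counts

nnz = np.count_nonzero(X)
print(f"D^2 = {D*D:,}, nnz = {nnz:,}, ratio = {nnz / D**2:.3%}")
# GloVe's per-epoch cost scales with nnz; a dense pass over the matrix
# (as in the proposed SGNS training) scales with D^2.
```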
That is a downside. However, my intuition is that the number of parameter updates is more relevant than the number of epochs (see, for example, Figure 5 of the BPR paper), i.e. the algorithm may converge in a similar number of parameter updates as GloVe. This is mostly speculation, though.
https://arxiv.org/pdf/1705.09755v1.pdf
I recently posted a paper to arXiv showing that word2vec's Skip-Gram with Negative Sampling (SGNS) algorithm is a weighted logistic PCA. Within that framework, SGNS can be trained on the same term-context matrix that is used for GloVe. The training could use the same AdaGrad procedure, only with a different loss function and gradients, and sampling all of the elements of the matrix instead of just the non-zeroes.
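The "same matrix, different loss" idea can be sketched in a few lines of NumPy. This is a toy illustration under my own assumptions, not the paper's implementation: plain full-batch gradient descent stands in for AdaGrad, the co-occurrence matrix is random, and all names (`X`, `neg`, `U`, `V`) are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy term-context co-occurrence matrix X; all sizes are illustrative.
D, dim, k = 20, 5, 5.0          # vocab size, embedding dim, negative-sampling rate
X = rng.poisson(1.0, (D, D)).astype(float)

# Expected "negative" counts under the unigram model: k * n_i. * n_.j / n..
row = X.sum(axis=1, keepdims=True)
col = X.sum(axis=0, keepdims=True)
neg = k * row * col / X.sum()

U = 0.1 * rng.standard_normal((D, dim))   # word vectors
V = 0.1 * rng.standard_normal((D, dim))   # context vectors

def logsig(z):                             # numerically stable log(sigmoid(z))
    return -np.logaddexp(0.0, -z)

def loss(U, V):
    Z = U @ V.T
    # Weighted logistic loss over ALL D^2 cells: observed counts pull the
    # inner products up, expected negative counts push them down.
    return -(X * logsig(Z) + neg * logsig(-Z)).sum()

init_loss = loss(U, V)
lr = 0.01                                  # plain gradient descent for brevity
for _ in range(300):                       # (the paper uses AdaGrad instead)
    Z = U @ V.T
    S = 1.0 / (1.0 + np.exp(-Z))           # sigmoid(Z)
    G = X * (1.0 - S) - neg * S            # negative gradient of loss w.r.t. Z
    dU, dV = G @ V, G.T @ U
    U += lr * dU
    V += lr * dV
final_loss = loss(U, V)
```

Note that every cell of the D x D matrix contributes to the loss, which is where the O(D^2) per-pass cost discussed above comes from.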
Is SGNS something you are interested in including in the text2vec package, or are you happy with GloVe?
Thanks