epfml / sent2vec

General purpose unsupervised sentence representations

How are sentence embeddings calculated from the corpus using the sent2vec command? #86

Open tqx94 opened 4 years ago

tqx94 commented 4 years ago

Hi,

When I run the sent2vec command, a model is produced using the CBOW model. According to the paper, sent2vec averages the word vectors using the weights learned while training on the corpus. But how does CBOW initialise and update the weights, and which n-grams are used? For instance, when training on a Wikipedia corpus, what happens under the hood to compute the weights and dimensions for the sentence 'I ate my breakfast in the morning'? Which unigrams and bigrams are averaged here? How are the weights initialised? What is the target/source word in the sentence above? Thanks
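To make the example sentence concrete, the averaging step described in the paper can be sketched roughly as below. This is only an illustration, not sent2vec's actual code: the helper names (`ngrams`, `sentence_features`, `average_embedding`) are hypothetical, the bigrams are assumed to be contiguous word pairs, lowercasing is an assumed preprocessing step, and the random vectors merely stand in for embeddings that a trained model would have learned.

```python
import random

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_features(sentence, max_n=2):
    """Collect unigrams, then bigrams (up to max_n) for one sentence.

    Lowercasing and whitespace tokenisation are assumptions for this sketch.
    """
    tokens = sentence.lower().split()
    features = []
    for n in range(1, max_n + 1):
        features.extend(ngrams(tokens, n))
    return features

def average_embedding(features, table, dim):
    """Average the vectors of all features into one sentence vector."""
    total = [0.0] * dim
    for feature in features:
        vector = table[feature]
        for j in range(dim):
            total[j] += vector[j]
    return [value / len(features) for value in total]

features = sentence_features("I ate my breakfast in the morning")

# Placeholder embeddings: random values, NOT weights from a trained model.
rng = random.Random(0)
dim = 4
table = {f: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for f in features}

sentence_vec = average_embedding(features, table, dim)
print(features)
print(sentence_vec)
```

Under these assumptions the sentence yields 7 unigrams and 6 bigrams, and the sentence vector is the plain mean of those 13 vectors. How the real model initialises and updates the vectors, and how it picks target and source words, is not covered by this sketch.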