dselivanov / text2vec

Fast vectorization, topic modeling, distances and GloVe word embeddings in R.
http://text2vec.org

SWEM-concat Implementation in text2vec #331

Open raywyf opened 3 years ago

raywyf commented 3 years ago

Hello!

I'm currently using text2vec to create embeddings for a dataset of tweets. Since each document is pretty short, I want to implement a Simple Word-Embedding-based Model (SWEM), specifically SWEM-concat, which concatenates the average of all word vectors in a document with the result of max-pooling over those same word vectors. This method is discussed in this paper.

I can get the document averages by row-normalizing the dtm object and then multiplying it by the word vector matrix, but I'm struggling with how to get the max-pooling results. Any help would be much appreciated!

Thanks!
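
One possible way to sketch the two pooling steps described above, using only base R and the Matrix package rather than any text2vec-specific helper. The names `swem_concat`, `dtm`, and `word_vectors` are hypothetical, and the toy data only stands in for a real document-term matrix and embedding matrix. The sketch assumes every DTM column has a matching row in the embedding matrix. Because a maximum is unchanged by repeated values, max-pooling over the distinct terms recorded in the DTM gives the same result as max-pooling over every token.

```r
library(Matrix)

# Toy stand-ins (assumptions): `dtm` mimics a sparse document-term matrix,
# `word_vectors` mimics an embedding matrix with one row per term.
terms <- c("good", "bad", "movie")
dtm <- Matrix(c(2, 0, 1,
                0, 1, 1), nrow = 2, byrow = TRUE, sparse = TRUE,
              dimnames = list(c("doc1", "doc2"), terms))
word_vectors <- matrix(c( 1.0, -1.0,
                         -2.0,  0.5,
                          0.0,  3.0), nrow = 3, byrow = TRUE,
                       dimnames = list(terms, NULL))

# Hypothetical helper: returns one row per document, with the averaged
# vector in the first columns and the max-pooled vector in the last ones.
swem_concat <- function(dtm, word_vectors) {
  # Align embedding rows with the DTM columns.
  wv <- word_vectors[colnames(dtm), , drop = FALSE]

  # Average pooling: count-weighted mean of the word vectors per document.
  # pmax() avoids dividing by zero for documents with no known terms.
  row_totals <- pmax(Matrix::rowSums(dtm), 1)
  avg <- as.matrix(dtm %*% wv) / row_totals

  # Max pooling: per embedding dimension, the maximum over the terms
  # that occur in the document. Empty documents get a zero vector.
  max_pool <- do.call(rbind, lapply(seq_len(nrow(dtm)), function(i) {
    present <- which(dtm[i, ] != 0)
    if (length(present) == 0L) return(numeric(ncol(wv)))
    apply(wv[present, , drop = FALSE], 2, max)
  }))

  out <- cbind(avg, max_pool)
  rownames(out) <- rownames(dtm)
  out
}

emb <- swem_concat(dtm, word_vectors)
```

For the toy data, `emb` is a 2 x 4 matrix: the first two columns hold the averages and the last two the per-dimension maxima. Indexing a sparse matrix row by row is slow on large corpora, so this loop is meant to show the logic rather than to be fast.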