explosion / sense2vec

🦆 Contextually-keyed word vectors
https://explosion.ai/blog/sense2vec-reloaded
MIT License

[Question] Why not Word2Vec? #154

Closed · santoshbs closed this issue 1 year ago

santoshbs commented 1 year ago

I was curious to know why only GloVe and fastText are provided as options for training, and not word2vec (e.g., via gensim).

Relatedly, are there any accuracy benchmarks comparing sense2vec trained with the three approaches: GloVe, word2vec, and fastText?

Thanks! S

adrianeboyd commented 1 year ago

As far as I know, these were just two commonly-used alternatives at the time, and there's no reason you couldn't write a similar script for step 4 using gensim/word2vec; a rough sketch follows below.
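For illustration only, here is a minimal sketch of what a gensim-based replacement for the step 4 training script could look like. The file names, hyperparameters, and output format are placeholders chosen for this example, not something that ships with sense2vec; the input is assumed to be the whitespace-tokenized, sense-tagged corpus produced by the preprocessing step.

```python
# Hedged sketch: a gensim word2vec stand-in for the repo's step 4
# (which normally trains GloVe or fastText vectors).
from pathlib import Path

from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

corpus_path = Path("corpus_preprocessed.txt")  # assumed: one sentence per line, sense-tagged tokens
out_path = Path("word2vec_vectors.txt")        # assumed: plain-text vectors for the export step

# Stream sentences from disk so the full corpus never has to fit in memory.
sentences = LineSentence(str(corpus_path))

model = Word2Vec(
    sentences=sentences,
    vector_size=300,   # example dimensionality, not a sense2vec default
    window=5,          # the blog post reports preferring the narrower window
    sg=1,              # skip-gram with negative sampling, as in the 2019 comparison
    negative=10,
    min_count=10,
    workers=8,
    epochs=5,
)

# Save in the standard word2vec text format so downstream tooling can read it.
model.wv.save_word2vec_format(str(out_path), binary=False)
```

The exported `.txt` file would then take the place of the GloVe/fastText output when feeding the later export step.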

This blog post describes a few comparisons that were made related to the vector training for the models retrained in 2019:

We also ran evaluations to decide the choice of algorithm (skipgram with negative samples vs GloVe, preferring the skipgrams output), the window size (5 vs. 15, preferring the narrower window), and character vs word features (choosing to disable FastText’s character ngrams).
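To make the last point concrete, here is a small illustration of what "disabling character n-grams" amounts to, using gensim's FastText wrapper as an assumed stand-in (the repo itself drives the fastText command-line tool, so this is not the actual training script):

```python
# Hedged illustration of the settings described in the blog post,
# expressed via gensim's FastText wrapper.
from gensim.models import FastText
from gensim.models.word2vec import LineSentence

sentences = LineSentence("corpus_preprocessed.txt")  # assumed path

model = FastText(
    sentences=sentences,
    vector_size=300,
    window=5,      # narrow window, as preferred in the evaluation
    sg=1,          # skip-gram with negative sampling
    negative=10,
    min_n=1,
    max_n=0,       # max_n < min_n disables character n-grams,
                   # making training behave like plain word-level skip-gram
)
```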