0. Paper
@inproceedings{neelakantan-etal-2014-efficient,
title = "Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space",
author = "Neelakantan, Arvind and
Shankar, Jeevan and
Passos, Alexandre and
McCallum, Andrew",
booktitle = "Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing ({EMNLP})",
month = oct,
year = "2014",
address = "Doha, Qatar",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/D14-1113",
doi = "10.3115/v1/D14-1113",
pages = "1059--1069",
}
1. What is it?
They proposed the Multi-Sense Skip-Gram (MSSG) model, which learns multiple embeddings (one per word sense) in vector space, together with a non-parametric variant (NP-MSSG) that discovers the number of senses per word.
2. What is amazing compared to previous works?
Unlike the previous multi-prototype approach (#202), which clusters contexts in a separate pre-processing step, MSSG learns the sense vectors and their clusters jointly during training, which makes it much faster (see 4.1); NP-MSSG additionally learns the number of senses per word instead of fixing it in advance.
3. Where is the key to technologies and techniques?
3.1 Multi-Sense Skip-Gram (MSSG)
Context clustering (CBOW-like):
obtain a context vector by averaging the vectors of the surrounding context words
select the sense of the target word whose cluster center is most similar to this context vector
Skip-Gram with negative sampling:
compute the probability of each positive (observed context) word using the selected sense vector
compute the probability of each negative (randomly sampled) word using the selected sense vector
Clustering:
update the selected sense's cluster center with the context vector (a toy sketch of one full training step follows)
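A minimal NumPy sketch of one MSSG training step under simplifying assumptions: plain dot-product similarity for cluster selection, toy vocabulary/dimension/sense sizes, and names such as `mssg_step` that are mine rather than from the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, chosen for illustration only (not the paper's settings).
V, D, K = 1000, 50, 3            # vocabulary size, embedding dim, senses per word

global_vecs  = rng.normal(scale=0.1, size=(V, D))     # one "global" vector per word
sense_vecs   = rng.normal(scale=0.1, size=(V, K, D))  # K sense vectors per word
cluster_ctrs = np.zeros((V, K, D))                    # running cluster centers
cluster_cnts = np.ones((V, K))                        # counts for the running mean

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mssg_step(word, context_ids, neg_ids, lr=0.025):
    """One MSSG update for a (target word, context window) pair."""
    # 1) Average the vectors of the context words (the CBOW-like step).
    v_ctx = global_vecs[context_ids].mean(axis=0)

    # 2) Select the sense whose cluster center is most similar to the context
    #    (dot product here as a simplification of the paper's similarity).
    s = int(np.argmax(cluster_ctrs[word] @ v_ctx))

    # 3) Skip-gram with negative sampling, driven by the selected sense vector.
    v_s = sense_vecs[word, s].copy()
    pairs = [(c, 1.0) for c in context_ids] + [(n, 0.0) for n in neg_ids]
    for ctx, label in pairs:
        u = global_vecs[ctx].copy()
        g = lr * (sigmoid(v_s @ u) - label)   # gradient of the logistic loss
        global_vecs[ctx] -= g * v_s
        v_s              -= g * u
    sense_vecs[word, s] = v_s

    # 4) Update the selected cluster center as a running mean of context vectors.
    cluster_cnts[word, s] += 1.0
    cluster_ctrs[word, s] += (v_ctx - cluster_ctrs[word, s]) / cluster_cnts[word, s]

# Hypothetical usage: word 5 seen with context words [1, 2, 3] and negatives [7, 9].
mssg_step(5, np.array([1, 2, 3]), np.array([7, 9]))
```

Keeping a running mean per cluster is what lets the clustering and the embedding learning happen jointly in a single pass over the corpus, rather than as a separate pre-processing step.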
3.2 Non-Parametric MSSG
a new sense vector is created whenever the similarity between the context vector and every existing sense cluster of the word falls below a threshold λ, so the number of senses grows with the data (see the sketch below).
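A sketch of the non-parametric sense-creation rule, assuming cosine similarity between the context vector and the cluster centers; `select_or_create_sense`, `MAX_SENSES`, and the particular λ value are illustrative assumptions, not the authors' code.

```python
import numpy as np

LAMBDA = -0.5    # threshold λ; the value here is illustrative
MAX_SENSES = 10  # an assumed cap just to keep the sketch bounded

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

def select_or_create_sense(centers, counts, v_ctx):
    """Pick a sense for context vector v_ctx, creating one if all clusters are too far.

    centers: list of np.ndarray cluster centers for one word
    counts:  list of ints, number of contexts assigned to each cluster
    Returns the selected (possibly new) sense index.
    """
    sims = [cosine(c, v_ctx) for c in centers]
    # Spawn a new sense when every existing cluster is less similar than λ.
    if (not sims or max(sims) < LAMBDA) and len(centers) < MAX_SENSES:
        centers.append(v_ctx.copy())
        counts.append(1)
        return len(centers) - 1
    s = int(np.argmax(sims))
    counts[s] += 1
    centers[s] += (v_ctx - centers[s]) / counts[s]  # running-mean update
    return s

# Hypothetical usage for a word with no senses yet: the first context creates sense 0.
centers, counts = [], []
s = select_or_create_sense(centers, counts, np.ones(50))
```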
4. How did they evaluate it?
4.1 Training Speed
Their models (MSSG, NP-MSSG) train faster than the previous multi-sense baseline (#202).
4.2 Nearest Neighbors
Nearest neighbors of the learned sense vectors show that their models capture word ambiguity better than Skip-Gram and the strong baseline (#202).
4.3 Word Similarity in Context (#202)
5. Is there a discussion?
6. Which paper should we read next?