a1da4 / paper-survey

Summary of machine learning papers

Reading: Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space #207

Open a1da4 opened 3 years ago

a1da4 commented 3 years ago

0. Paper

@inproceedings{neelakantan-etal-2014-efficient,
  title = "Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space",
  author = "Neelakantan, Arvind and Shankar, Jeevan and Passos, Alexandre and McCallum, Andrew",
  booktitle = "Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing ({EMNLP})",
  month = oct,
  year = "2014",
  address = "Doha, Qatar",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/D14-1113",
  doi = "10.3115/v1/D14-1113",
  pages = "1059--1069",
}

1. What is it?

They proposed the Multi-Sense Skip-Gram (MSSG), an extension of Skip-Gram that learns a fixed number of sense vectors per word, and a non-parametric variant (NP-MSSG) that learns the number of senses per word automatically.

2. What is amazing compared to previous works?

3. Where is the key to technologies and techniques?

3.1 Multi-Sense Skip-Gram (MSSG)

[Screenshot 2021-09-24 0 51 32]
  1. Sense selection (CBOW-like context representation)
    • obtain a context vector by averaging the vectors of the context words
    • select the sense cluster of the target word nearest to this context vector
  2. Skip-Gram with Negative Sampling
    • compute the probability of each positive sample using the selected sense vector
    • compute the probability of each negative sample using the selected sense vector
  3. Clustering
    • update the selected sense cluster's center with the context vector
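The steps above can be sketched in a few lines of NumPy. This is a minimal, illustrative implementation, not the paper's: the array sizes, learning rate, and the running-mean cluster representation are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes for illustration only (not the paper's hyperparameters).
V, D, K = 50, 8, 3                                  # vocabulary, embedding dim, senses per word
sense_vecs = rng.normal(scale=0.1, size=(V, K, D))  # one sense vector per (word, sense)
cluster_sum = np.zeros((V, K, D))                   # running sums of assigned context vectors
cluster_cnt = np.ones((V, K))                       # assignment counts (start at 1 to avoid /0)
ctx_vecs = rng.normal(scale=0.1, size=(V, D))       # global (output) context vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mssg_step(target, context, negatives, lr=0.025):
    """One MSSG training step for a (target word, context window) pair."""
    # 1. context representation: average of the context words' vectors (CBOW-style)
    v_ctx = ctx_vecs[context].mean(axis=0)
    # 2. sense selection: nearest cluster center (mean of assigned context vectors)
    centers = cluster_sum[target] / cluster_cnt[target][:, None]
    s = int(np.argmax(centers @ v_ctx))
    v_sense = sense_vecs[target, s].copy()
    # 3. skip-gram with negative sampling, using only the selected sense vector
    for w, label in [(c, 1.0) for c in context] + [(n, 0.0) for n in negatives]:
        g = lr * (label - sigmoid(v_sense @ ctx_vecs[w]))
        grad_sense = g * ctx_vecs[w].copy()
        ctx_vecs[w] += g * v_sense
        v_sense += grad_sense
    sense_vecs[target, s] = v_sense
    # 4. clustering: fold the context vector into the chosen sense cluster
    cluster_sum[target, s] += v_ctx
    cluster_cnt[target, s] += 1
    return s
```

Only the selected sense vector receives gradients, which is what keeps the per-step cost close to plain Skip-Gram.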

3.2 Non-Parametric MSSG

A new sense vector is added when the similarity between the context vector and every existing sense vector of the word falls below a threshold.
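The non-parametric sense choice can be sketched as below. The threshold value and cosine similarity are assumptions of this sketch; the paper's own threshold (λ) and cluster representation may differ.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def select_or_create_sense(senses, v_ctx, lam=-0.5):
    """NP-MSSG sense choice: reuse the sense cluster most similar to the
    context vector, or create a new sense when even the best match is
    below the threshold lam (value here is an assumption)."""
    if senses:
        sims = [cosine(c, v_ctx) for c in senses]
        best = int(np.argmax(sims))
        if sims[best] >= lam:
            return best
    senses.append(v_ctx.copy())  # new sense cluster seeded by this context
    return len(senses) - 1
```

Because senses are created on demand, frequent polysemous words can end up with more senses than rare monosemous ones, without fixing K in advance.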

[Screenshot 2021-09-24 1 04 19]

4. How did they evaluate it?

4.1 Training Speed

Their models (MSSG, NP-MSSG) are faster than a previous baseline (#202).

[Screenshot 2021-09-24 1 04 38]

4.2 Nearest Neighbors

Their models represent word ambiguity better than Skip-Gram and a strong multi-sense baseline (#202).

[Screenshot 2021-09-24 1 06 52] [Screenshot 2021-09-24 1 08 36]

4.3 Word Similarity in Context (#202)
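Word similarity in context (the SCWS-style setting) is commonly scored with avgSimC: a context-weighted average of pairwise sense similarities. A minimal sketch, assuming the per-sense context probabilities are already computed elsewhere:

```python
import numpy as np

def avg_sim_c(senses_w, senses_v, p_w, p_v):
    """avgSimC: weight the cosine similarity of every sense pair by the
    probability of each sense given its word's observed context.
    senses_*: lists of sense vectors; p_*: context-given sense probabilities
    (assumed precomputed, e.g. from the sense-selection step)."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return sum(p_w[i] * p_v[j] * cos(si, sj)
               for i, si in enumerate(senses_w)
               for j, sj in enumerate(senses_v))
```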

スクリーンショット 2021-09-24 1 09 28

5. Is there a discussion?

6. Which paper should we read next?

a1da4 commented 3 years ago

#208

Gaussian Mixture × Skip-Gram

a1da4 commented 3 years ago

#209

Gaussian Mixture × fasttext