0. Paper
@inproceedings{neelakantan-etal-2014-efficient,
title = "Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space",
author = "Neelakantan, Arvind and
Shankar, Jeevan and
Passos, Alexandre and
McCallum, Andrew",
booktitle = "Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing ({EMNLP})",
month = oct,
year = "2014",
address = "Doha, Qatar",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/D14-1113",
doi = "10.3115/v1/D14-1113",
pages = "1059--1069",
}
1. What is it?
They proposed the Multi-Sense Skip-Gram (MSSG) model, which learns multiple embeddings (one per word sense) in vector space, together with a non-parametric variant (NP-MSSG) that discovers the number of senses per word.
2. What is amazing compared to previous works?
Unlike the previous multi-prototype approach (#202), which clusters contexts in a separate pre-processing step, MSSG learns the sense vectors and their clusters jointly during training, which makes it much faster (see 4.1); NP-MSSG additionally learns the number of senses per word instead of fixing it in advance.
3. Where is the key to technologies and techniques?
3.1 Multi-Sense Skip-Gram (MSSG)
Context clustering (CBOW-like):
obtain a context vector by averaging the vectors of the surrounding context words
select the sense of the target word whose cluster center is most similar to this context vector
Skip-Gram with negative sampling:
compute the probability of each positive (observed context) word using the selected sense vector
compute the probability of each negative (randomly sampled) word using the selected sense vector
Clustering:
update the selected sense's cluster center with the context vector (a toy sketch of one full training step follows)
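A minimal NumPy sketch of one MSSG training step under simplifying assumptions: plain dot-product similarity for cluster selection, toy vocabulary/dimension/sense sizes, and names such as `mssg_step` that are mine rather than from the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, chosen for illustration only (not the paper's settings).
V, D, K = 1000, 50, 3            # vocabulary size, embedding dim, senses per word

global_vecs  = rng.normal(scale=0.1, size=(V, D))     # one "global" vector per word
sense_vecs   = rng.normal(scale=0.1, size=(V, K, D))  # K sense vectors per word
cluster_ctrs = np.zeros((V, K, D))                    # running cluster centers
cluster_cnts = np.ones((V, K))                        # counts for the running mean

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mssg_step(word, context_ids, neg_ids, lr=0.025):
    """One MSSG update for a (target word, context window) pair."""
    # 1) Average the vectors of the context words (the CBOW-like step).
    v_ctx = global_vecs[context_ids].mean(axis=0)

    # 2) Select the sense whose cluster center is most similar to the context
    #    (dot product here as a simplification of the paper's similarity).
    s = int(np.argmax(cluster_ctrs[word] @ v_ctx))

    # 3) Skip-gram with negative sampling, driven by the selected sense vector.
    v_s = sense_vecs[word, s].copy()
    pairs = [(c, 1.0) for c in context_ids] + [(n, 0.0) for n in neg_ids]
    for ctx, label in pairs:
        u = global_vecs[ctx].copy()
        g = lr * (sigmoid(v_s @ u) - label)   # gradient of the logistic loss
        global_vecs[ctx] -= g * v_s
        v_s              -= g * u
    sense_vecs[word, s] = v_s

    # 4) Update the selected cluster center as a running mean of context vectors.
    cluster_cnts[word, s] += 1.0
    cluster_ctrs[word, s] += (v_ctx - cluster_ctrs[word, s]) / cluster_cnts[word, s]

# Hypothetical usage: word 5 seen with context words [1, 2, 3] and negatives [7, 9].
mssg_step(5, np.array([1, 2, 3]), np.array([7, 9]))
```

Keeping a running mean per cluster is what lets the clustering and the embedding learning happen jointly in a single pass over the corpus, rather than as a separate pre-processing step.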
3.2 Non-Parametric MSSG
a new sense vector is created whenever the similarity between the context vector and every existing sense cluster of the word falls below a threshold λ, so the number of senses grows with the data (see the sketch below).
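A sketch of the non-parametric sense-creation rule, assuming cosine similarity between the context vector and the cluster centers; `select_or_create_sense`, `MAX_SENSES`, and the particular λ value are illustrative assumptions, not the authors' code.

```python
import numpy as np

LAMBDA = -0.5    # threshold λ; the value here is illustrative
MAX_SENSES = 10  # an assumed cap just to keep the sketch bounded

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

def select_or_create_sense(centers, counts, v_ctx):
    """Pick a sense for context vector v_ctx, creating one if all clusters are too far.

    centers: list of np.ndarray cluster centers for one word
    counts:  list of ints, number of contexts assigned to each cluster
    Returns the selected (possibly new) sense index.
    """
    sims = [cosine(c, v_ctx) for c in centers]
    # Spawn a new sense when every existing cluster is less similar than λ.
    if (not sims or max(sims) < LAMBDA) and len(centers) < MAX_SENSES:
        centers.append(v_ctx.copy())
        counts.append(1)
        return len(centers) - 1
    s = int(np.argmax(sims))
    counts[s] += 1
    centers[s] += (v_ctx - centers[s]) / counts[s]  # running-mean update
    return s

# Hypothetical usage for a word with no senses yet: the first context creates sense 0.
centers, counts = [], []
s = select_or_create_sense(centers, counts, np.ones(50))
```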
4. How did they evaluate it?
4.1 Training Speed
Their models (MSSG, NP-MSSG) train faster than the previous multi-sense baseline (#202).
4.2 Nearest Neighbors
Nearest neighbors of the learned sense vectors show that their models capture word ambiguity better than Skip-Gram and the strong baseline (#202).
4.3 Word Similarity in Context (#202)
5. Is there a discussion?
6. Which paper should we read next?