a1da4 / paper-survey

Summary of machine learning papers

Reading: Multimodal Word Distributions #208

Open a1da4 opened 3 years ago

a1da4 commented 3 years ago

0. Paper

@inproceedings{athiwaratkun-wilson-2017-multimodal, title = "Multimodal Word Distributions", author = "Athiwaratkun, Ben and Wilson, Andrew", booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)", month = jul, year = "2017", address = "Vancouver, Canada", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/P17-1151", doi = "10.18653/v1/P17-1151", pages = "1645--1656", abstract = "Word embeddings provide point representations of words containing useful semantic information. We introduce multimodal word distributions formed from Gaussian mixtures, for multiple word meanings, entailment, and rich uncertainty information. To learn these distributions, we propose an energy-based max-margin objective. We show that the resulting approach captures uniquely expressive semantic information, and outperforms alternatives, such as word2vec skip-grams, and Gaussian embeddings, on benchmark datasets such as word similarity and entailment.", }

1. What is it?

They propose a Gaussian mixture model for multi-prototype word embeddings, where each mixture component can capture a different meaning of the word.

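As a rough illustration of the multi-prototype idea, each word is represented by K Gaussian components (means, covariances, mixture weights) rather than a single point vector. This is a minimal NumPy sketch; the parameter names and shapes are my own, not the authors' code.

```python
import numpy as np

# Minimal sketch of a multi-prototype (Gaussian mixture) word representation.
# K components per word; each component has a mean vector, a diagonal
# covariance, and a mixture weight. All names and shapes are illustrative only.
K, dim = 2, 50
rng = np.random.default_rng(0)

word_embedding = {
    "means": rng.normal(size=(K, dim)),   # one mean vector per sense
    "log_vars": np.zeros((K, dim)),       # diagonal log-covariances
    "weights": np.full(K, 1.0 / K),       # mixture probabilities p_{w,i}
}

def density(x, emb):
    """Evaluate the mixture density f_w(x) = sum_i p_i * N(x; mu_i, Sigma_i)."""
    total = 0.0
    for mu, log_var, p in zip(emb["means"], emb["log_vars"], emb["weights"]):
        var = np.exp(log_var)
        norm = np.prod(1.0 / np.sqrt(2.0 * np.pi * var))
        total += p * norm * np.exp(-0.5 * np.sum((x - mu) ** 2 / var))
    return total
```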

2. What is amazing compared to previous works?

Previous methods have two problems:

- Point embeddings (e.g., word2vec skip-gram) represent each word as a single vector, so they cannot capture multiple meanings or uncertainty.
- Single Gaussian embeddings model each word with one unimodal distribution, so a polysemous word gets an overly broad, high-variance distribution placed between its meanings.

In this paper, they solve the above problems with a Gaussian mixture.

3. Where is the key to technologies and techniques?

They define the distribution of a target word w as follows:
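Written out (as I understand the paper's formulation, not a verbatim copy of the original equation), the distribution is a mixture of K Gaussians with learned weights, means, and covariances:

```latex
% Gaussian mixture distribution of word w with K components:
% p_{w,i} are mixture weights, \mu_{w,i} component means, \Sigma_{w,i} covariances.
f_w(\vec{x}) = \sum_{i=1}^{K} p_{w,i} \, \mathcal{N}\bigl(\vec{x};\, \vec{\mu}_{w,i},\, \Sigma_{w,i}\bigr),
\qquad \sum_{i=1}^{K} p_{w,i} = 1
```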

Given a word w, a context word c, and a negative sample c', the energy-based max-margin objective is as follows:
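As I understand the paper, the energy is the log of the expected likelihood kernel between the two word mixtures, and the objective pushes a true (w, c) pair above a negative (w, c') pair by a margin m. A reconstruction of that formulation:

```latex
% Max-margin ranking objective for word w, context c, negative sample c':
L_\theta(w, c, c') = \max\bigl(0,\; m - \log E_\theta(w, c) + \log E_\theta(w, c')\bigr)

% Energy = expected likelihood kernel (inner product of the two mixture densities f and g,
% with mixture weights p_i and q_j), which has a closed form for Gaussian mixtures:
E_\theta(f, g) = \int f(x)\, g(x)\, dx
             = \sum_{i=1}^{K} \sum_{j=1}^{K} p_i\, q_j\,
               \mathcal{N}\bigl(0;\, \vec{\mu}_{f,i} - \vec{\mu}_{g,j},\, \Sigma_{f,i} + \Sigma_{g,j}\bigr)
```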

4. How did they evaluate it?

4.1 Nearest neighbors

From Table 1, their Gaussian mixture model (top) is able to express each meaning more effectively than the single Gaussian embedding (bottom).
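Table 1 lists nearest neighbors per mixture component. A simple way to produce that kind of per-sense neighbor list is to rank entries by cosine similarity between component mean vectors; this is an illustrative sketch under that assumption, not the authors' code.

```python
import numpy as np

def nearest_neighbors(query_mean, vocab_means, vocab_labels, top_n=5):
    """Rank (word, sense) entries by cosine similarity to one query component mean.

    vocab_means: array of shape (num_entries, dim), one row per (word, sense).
    vocab_labels: list of (word, sense_index) tuples aligned with vocab_means.
    """
    q = query_mean / np.linalg.norm(query_mean)
    m = vocab_means / np.linalg.norm(vocab_means, axis=1, keepdims=True)
    scores = m @ q
    order = np.argsort(-scores)[:top_n]
    return [(vocab_labels[i], float(scores[i])) for i in order]
```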

4.2 Word similarity task

From Table 3, their model achieves higher performance than previous models.
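Word similarity benchmarks like these are typically scored by Spearman's rank correlation between model scores and human ratings. A minimal sketch of that protocol, where `model_similarity` is a placeholder for whatever pairwise score the model provides (e.g., cosine between component means or the expected likelihood kernel):

```python
from scipy.stats import spearmanr

def evaluate_word_similarity(pairs, human_scores, model_similarity):
    """pairs: list of (word1, word2); human_scores: gold ratings aligned with pairs;
    model_similarity: callable returning a similarity score for two words."""
    model_scores = [model_similarity(w1, w2) for w1, w2 in pairs]
    return spearmanr(model_scores, human_scores).correlation
```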

4.3 Word similarity in context (#202)

From Table 4, their model achieves results comparable to the previous WordNet-based method (#204).

5. Is there a discussion?

5.1 Gaussian embeddings vs. Gaussian mixture

They show that their Gaussian mixture model outperforms the single Gaussian embedding. They argue that modeling a polysemous word with only one distribution inflates its variance in the single Gaussian embedding.
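One way to see that claim (my own reasoning via the law of total variance, not an equation from the paper): the covariance of the best single-Gaussian fit to a mixture is the average within-sense covariance plus the spread of the sense means, so the more separated a word's senses are, the more its single-Gaussian variance is inflated.

```latex
% Law of total variance for a mixture with weights p_i, means \mu_i, covariances \Sigma_i:
% a single Gaussian fit must absorb the between-sense spread of the means.
\Sigma_{\mathrm{single}}
  = \underbrace{\sum_i p_i \Sigma_i}_{\text{within-sense}}
  + \underbrace{\sum_i p_i (\vec{\mu}_i - \bar{\vec{\mu}})(\vec{\mu}_i - \bar{\vec{\mu}})^{\top}}_{\text{between-sense}},
\qquad \bar{\vec{\mu}} = \sum_i p_i \vec{\mu}_i
```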

6. Which paper should we read next?

a1da4 commented 3 years ago

#209

Update: Skip-Gram → fastText