a1da4 / paper-survey

Summary of machine learning papers

Reading: Effective Approaches to Attention-based Neural Machine Translation #58


a1da4 commented 4 years ago

0. Paper

@inproceedings{luong-etal-2015-effective,
    title = "Effective Approaches to Attention-based Neural Machine Translation",
    author = "Luong, Thang and Pham, Hieu and Manning, Christopher D.",
    booktitle = "Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing",
    month = sep,
    year = "2015",
    address = "Lisbon, Portugal",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D15-1166",
    doi = "10.18653/v1/D15-1166",
    pages = "1412--1421",
}

1. What is it?

They propose an improved sequence-to-sequence (seq2seq) model that uses a simple attention mechanism.

2. What is amazing compared to previous studies?

Seq2seq with attention had already been proposed, but the authors propose simpler and more effective attention architectures.

3. Where is the key to technologies and techniques?

[Figure: the attention-based seq2seq model from the paper]

The previous seq2seq model uses only one hidden state, the LSTM state at the last source word. This encodes both a long sentence with many words and a short sentence with few words into vectors of the same fixed size. Instead, the authors use an attention mechanism, which uses the LSTM hidden states from all source positions.

The attentional hidden state is:

h̃_t = tanh(W_c [c_t; h_t])

where c_t is a context vector computed as a weighted average of the source hidden states, with the alignment weights (attention weights) a_t. a_t is computed from all source hidden states h̄_s and the current target hidden state h_t:

a_t(s) = exp(score(h_t, h̄_s)) / Σ_s' exp(score(h_t, h̄_s')),   c_t = Σ_s a_t(s) h̄_s
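
To make the computation concrete, here is a minimal NumPy sketch of one decoding step of global attention with the dot scoring function, following the equations above; the function and variable names (global_attention, src_hiddens, W_c) are my own choices for illustration, not from the paper's code.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def global_attention(src_hiddens, tgt_hidden, W_c):
    """src_hiddens: (S, d) source hidden states h_bar_s
       tgt_hidden:  (d,)   current target hidden state h_t
       W_c:         (d, 2d) output projection
       returns the attentional hidden state h_tilde_t and the weights a_t."""
    # score(h_t, h_bar_s) = h_t . h_bar_s  (the "dot" scoring function)
    scores = src_hiddens @ tgt_hidden              # (S,)
    a_t = softmax(scores)                          # alignment weights
    c_t = a_t @ src_hiddens                        # context vector, (d,)
    h_tilde = np.tanh(W_c @ np.concatenate([c_t, tgt_hidden]))
    return h_tilde, a_t

# toy usage
d, S = 4, 3
rng = np.random.default_rng(0)
h_tilde, a_t = global_attention(rng.normal(size=(S, d)),
                                rng.normal(size=d),
                                rng.normal(size=(d, 2 * d)))
print(a_t.sum())  # alignment weights sum to 1
```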

So this attention mechanism works like a dictionary lookup: Attention = f(Query, Key) · Value, where the target hidden state is the query and the source hidden states serve as both keys and values.
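
As a toy illustration of that dictionary analogy (the names here are hypothetical, and keys and values are the same source states): a Python dict returns exactly one value for a matching key, while attention returns a softmax-weighted mix of all values.

```python
import numpy as np

def soft_lookup(query, keys, values):
    # soft "dictionary" access: weight every value by how well its key matches the query
    weights = np.exp(keys @ query)
    weights /= weights.sum()
    return weights @ values

keys = values = np.eye(3)                # three source hidden states act as keys and values
query = np.array([5.0, 0.0, 0.0])        # target hidden state, closest to key 0
print(soft_lookup(query, keys, values))  # mostly value 0, with a little mass on the others
```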

4. How did they validate it?

They evaluated on the WMT'14 English-German dataset. [Table: BLEU scores from the paper]

5. Is there a discussion?

6. Which paper should be read next?

Transformer

a1da4 commented 4 years ago

#59 Transformer