a1da4 / paper-survey

Summary of machine learning papers

Reading: Effective Approaches to Attention-based Neural Machine Translation #58


a1da4 commented 4 years ago

0. Paper

@inproceedings{luong-etal-2015-effective,
    title = "Effective Approaches to Attention-based Neural Machine Translation",
    author = "Luong, Thang and Pham, Hieu and Manning, Christopher D.",
    booktitle = "Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing",
    month = sep,
    year = "2015",
    address = "Lisbon, Portugal",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D15-1166",
    doi = "10.18653/v1/D15-1166",
    pages = "1412--1421",
}

1. What is it?

They propose an improved sequence-to-sequence (seq2seq) model that uses a simple attention mechanism.

2. What is amazing compared to previous studies?

Seq2seq with attention had already been proposed, but the authors propose simpler and more effective attention architectures.

3. Where is the key to technologies and techniques?

[Figure: the attention-based seq2seq model from the paper]

The previous seq2seq model uses only one hidden state, the LSTM state at the last source word. This encodes both a long sentence with many words and a short sentence with few words into vectors of the same fixed size. Instead, the authors use an attention mechanism, which uses the LSTM hidden states from all source positions.

The attentional hidden state is:

h̃_t = tanh(W_c [c_t; h_t])

where c_t is a context vector computed as a weighted average of the source hidden states, with the alignment weights (attention weights) a_t. a_t is computed from all source hidden states h̄_s and the current target hidden state h_t:

a_t(s) = exp(score(h_t, h̄_s)) / Σ_s' exp(score(h_t, h̄_s')),   c_t = Σ_s a_t(s) h̄_s
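
To make the computation concrete, here is a minimal NumPy sketch of one decoding step of global attention with the dot scoring function, following the equations above; the function and variable names (global_attention, src_hiddens, W_c) are my own choices for illustration, not from the paper's code.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def global_attention(src_hiddens, tgt_hidden, W_c):
    """src_hiddens: (S, d) source hidden states h_bar_s
       tgt_hidden:  (d,)   current target hidden state h_t
       W_c:         (d, 2d) output projection
       returns the attentional hidden state h_tilde_t and the weights a_t."""
    # score(h_t, h_bar_s) = h_t . h_bar_s  (the "dot" scoring function)
    scores = src_hiddens @ tgt_hidden              # (S,)
    a_t = softmax(scores)                          # alignment weights
    c_t = a_t @ src_hiddens                        # context vector, (d,)
    h_tilde = np.tanh(W_c @ np.concatenate([c_t, tgt_hidden]))
    return h_tilde, a_t

# toy usage
d, S = 4, 3
rng = np.random.default_rng(0)
h_tilde, a_t = global_attention(rng.normal(size=(S, d)),
                                rng.normal(size=d),
                                rng.normal(size=(d, 2 * d)))
print(a_t.sum())  # alignment weights sum to 1
```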

So this attention mechanism works like a dictionary lookup: Attention = f(Query, Key) · Value, where the target hidden state is the query and the source hidden states serve as both keys and values.
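
As a toy illustration of that dictionary analogy (the names here are hypothetical, and keys and values are the same source states): a Python dict returns exactly one value for a matching key, while attention returns a softmax-weighted mix of all values.

```python
import numpy as np

def soft_lookup(query, keys, values):
    # soft "dictionary" access: weight every value by how well its key matches the query
    weights = np.exp(keys @ query)
    weights /= weights.sum()
    return weights @ values

keys = values = np.eye(3)                # three source hidden states act as keys and values
query = np.array([5.0, 0.0, 0.0])        # target hidden state, closest to key 0
print(soft_lookup(query, keys, values))  # mostly value 0, with a little mass on the others
```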

4. How did they validate it?

They evaluated on the WMT'14 English-German dataset. [Table: BLEU scores from the paper]

5. Is there a discussion?

6. Which paper should be read next?

Transformer

a1da4 commented 4 years ago

#59 Transformer