Name of The Authors
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin
Year of Publication
2017
Summary
This paper is the first to build a sequence-to-sequence model entirely on attention, dispensing with recurrence and convolution. It sets new state-of-the-art BLEU scores on the WMT 2014 English-to-German (by over 2 BLEU) and English-to-French (by 0.7 BLEU) translation tasks. Because it removes computation that is inherently sequential (proportional to sequence length in RNNs, and to the number of layers needed to connect distant positions in CNNs), the model can be trained much faster through parallelization.
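The core operation is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. Below is a minimal NumPy sketch (function and variable names are mine, not from the paper); note that every position attends to every other in a single matrix product, which is exactly where the parallelism over the sequence comes from.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, Eq. (1) of the paper."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)   # (..., seq_q, seq_k)
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # (..., seq_q, d_v)

# Toy self-attention: 4 tokens, model dimension 8
x = np.random.randn(4, 8)
out = scaled_dot_product_attention(x, x, x)  # all tokens attend to all tokens at once
print(out.shape)  # (4, 8)
```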
Contributions of The Paper
Sets new SOTA BLEU scores for WMT 2014 English-to-German and English-to-French translation
The proposed model is far more parallelizable, and therefore much faster to train
The proposed model handles long-range dependencies far better than RNNs or CNNs, since any two positions are connected by a constant number of operations
They performed extensive ablations over model hyperparameters to show that —
using 8 attention heads is a sweet spot; both more and fewer heads hurt translation quality (a minimal multi-head sketch follows below)
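To illustrate what the h = 8 heads look like, here is a hedged sketch of multi-head attention built on the `scaled_dot_product_attention` function above: d_model is split into h heads of dimension d_k = d_model / h (512 / 8 = 64 in the base model), each head attends independently, and the results are concatenated and projected. The weight shapes are my own simplification — the paper uses separate per-head projection matrices, which is equivalent to the fused (d_model, d_model) matrices used here.

```python
import numpy as np

def multi_head_attention(x, W_q, W_k, W_v, W_o, h=8):
    """Multi-head attention sketch. Weight matrices are (d_model, d_model),
    fusing the paper's per-head projections into one matrix each."""
    seq_len, d_model = x.shape
    d_k = d_model // h                                   # 512 // 8 = 64 in the base model
    def split_heads(W):
        # Project, then reshape to (h, seq_len, d_k) so heads attend independently
        return (x @ W).reshape(seq_len, h, d_k).transpose(1, 0, 2)
    Q, K, V = split_heads(W_q), split_heads(W_k), split_heads(W_v)
    heads = scaled_dot_product_attention(Q, K, V)        # batched over the head axis
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ W_o                                  # final output projection

# Toy usage
d_model, seq_len = 512, 10
rng = np.random.default_rng(0)
W_q, W_k, W_v, W_o = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                      for _ in range(4))
x = rng.standard_normal((seq_len, d_model))
print(multi_head_attention(x, W_q, W_k, W_v, W_o).shape)  # (10, 512)
```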
Publisher
Advances in Neural Information Processing Systems 30 (NIPS 2017)
Link to The Paper
https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html
Comments
It would feel disrespectful to add any comment to a paper of this stature.