TL;DR
A study that makes self-attention more efficient by combining three types of attention: random attention, local (sliding-window) attention over neighboring tokens, and global attention in which only a few tokens attend to, and are attended by, the full sequence. The authors show SOTA results on many NLP tasks and prove theoretically that the model is a universal approximator of sequence-to-sequence functions and is Turing complete.
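As a rough illustration of how the three patterns combine into one sparse attention mask, here is a minimal NumPy sketch. This is not the authors' implementation; the sequence length, window size, and the number of random and global tokens are arbitrary assumptions for demonstration only.

```python
import numpy as np

def bigbird_style_mask(seq_len=16, window=3, n_random=2, n_global=2, seed=0):
    """Build a boolean mask combining window, random, and global attention."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((seq_len, seq_len), dtype=bool)

    # 1) Local (sliding-window) attention: each token attends to its neighbors.
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True

    # 2) Random attention: each token attends to a few randomly chosen keys.
    for i in range(seq_len):
        mask[i, rng.choice(seq_len, size=n_random, replace=False)] = True

    # 3) Global attention: a few tokens attend to everything,
    #    and every token attends to them.
    mask[:n_global, :] = True
    mask[:, :n_global] = True
    return mask

if __name__ == "__main__":
    m = bigbird_style_mask()
    print(m.astype(int))          # visualize the sparse pattern
    print("density:", m.mean())   # fraction of attended pairs vs. full attention
```

The printed density stays well below 1.0, which is the point: attention cost grows roughly linearly with sequence length instead of quadratically, while the global tokens keep the graph well connected.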
Why it matters:
Paper URL
https://arxiv.org/abs/2007.14062
Submission Dates (yyyy/mm/dd)
Authors and institutions
Methods
Results
Comments