TL;DR
A study that makes self-attention more efficient by combining three types of attention: random attention, local (sliding-window) attention over neighboring tokens, and global attention in which only a few tokens attend to, and are attended by, the full sequence. The authors show SOTA results on many NLP tasks and prove theoretically that the model is a universal approximator of sequence-to-sequence functions and is Turing complete.
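As a rough illustration of how the three patterns combine into one sparse attention mask, here is a minimal NumPy sketch. This is not the authors' implementation; the sequence length, window size, and the number of random and global tokens are arbitrary assumptions for demonstration only.

```python
import numpy as np

def bigbird_style_mask(seq_len=16, window=3, n_random=2, n_global=2, seed=0):
    """Build a boolean mask combining window, random, and global attention."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((seq_len, seq_len), dtype=bool)

    # 1) Local (sliding-window) attention: each token attends to its neighbors.
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True

    # 2) Random attention: each token attends to a few randomly chosen keys.
    for i in range(seq_len):
        mask[i, rng.choice(seq_len, size=n_random, replace=False)] = True

    # 3) Global attention: a few tokens attend to everything,
    #    and every token attends to them.
    mask[:n_global, :] = True
    mask[:, :n_global] = True
    return mask

if __name__ == "__main__":
    m = bigbird_style_mask()
    print(m.astype(int))          # visualize the sparse pattern
    print("density:", m.mean())   # fraction of attended pairs vs. full attention
```

The printed density stays well below 1.0, which is the point: attention cost grows roughly linearly with sequence length instead of quadratically, while the global tokens keep the graph well connected.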
Why it matters:
Paper URL
https://arxiv.org/abs/2007.14062
Submission Dates (yyyy/mm/dd)
Authors and institutions
Methods
Results
Comments