
Big Bird: Transformers for Longer Sequences #83


AkiraTOSEI commented 4 years ago

TL;DR

A study that makes Self-Attention more efficient for long sequences by combining three types of attention: random attention, local attention over neighboring tokens, and global attention in which a small set of tokens attends to (and is attended by) the full sequence. The authors show SOTA results on many NLP tasks and prove that the model is theoretically a universal approximator of sequence-to-sequence functions and is Turing complete.
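To make the sparsity pattern concrete, here is a minimal NumPy sketch (not the authors' implementation) of how the three patterns can be combined into a single boolean attention mask. The function name `bigbird_style_mask` and the parameters `window`, `n_random`, and `n_global` are illustrative assumptions, not names from the paper.

```python
import numpy as np

def bigbird_style_mask(n, window=3, n_random=2, n_global=2, seed=0):
    """Boolean mask combining the three sparse patterns described in the paper:
    sliding-window (local), random, and global attention.
    mask[i, j] = True means query i may attend to key j."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((n, n), dtype=bool)

    # Local: each token attends to a window of neighbors around itself.
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        mask[i, lo:hi] = True

    # Random: each token additionally attends to a few random keys.
    for i in range(n):
        mask[i, rng.choice(n, size=n_random, replace=False)] = True

    # Global: a few designated tokens (e.g. [CLS]-like tokens) attend to
    # everything and are attended to by everything.
    mask[:n_global, :] = True
    mask[:, :n_global] = True
    return mask

if __name__ == "__main__":
    m = bigbird_style_mask(16)
    print(m.astype(int))         # visualize the sparsity pattern
    print("density:", m.mean())  # fraction of attended pairs vs. full attention
```

On a toy length of 16 the mask density is well below 1.0, which is the point: the number of attended pairs grows roughly linearly in sequence length instead of quadratically as in full Self-Attention.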

Why it matters:

Paper URL

https://arxiv.org/abs/2007.14062

Submission Dates (yyyy/mm/dd)

Authors and institutions

Methods

Results

Comments