TL;DR
A lightweight and efficient transformer, DeLighT (Deep and Light-weight Transformer), is proposed. Its key components are block-wise scaling, which allocates more parameters to deeper blocks (shallower blocks near the input, deeper blocks near the output), and DExTra, an improved version of DeFINE that expands and then reduces the channel dimension using group linear transformations. Higher accuracy can be achieved with lower computational cost.
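As a minimal sketch of the block-wise scaling idea, the per-block depth can be interpolated linearly between two hyperparameters (here called `n_min` and `n_max`; the exact rounding and parameter names are assumptions for illustration):

```python
def blockwise_depths(num_blocks: int, n_min: int = 4, n_max: int = 8) -> list:
    """Assign a depth to each block, growing linearly from n_min near the
    input to n_max near the output (block-wise scaling sketch)."""
    if num_blocks == 1:
        return [n_max]
    return [
        round(n_min + (n_max - n_min) * b / (num_blocks - 1))
        for b in range(num_blocks)
    ]

# Shallower blocks near the input, deeper blocks near the output.
print(blockwise_depths(5))  # → [4, 5, 6, 7, 8]
```

This contrasts with standard transformers, which use the same depth and width in every layer.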
Why it matters:
Paper URL
https://arxiv.org/abs/2008.00623
Submission Date (yyyy/mm/dd)
2020/08/03
Authors and institutions
Sachin Mehta, Marjan Ghazvininejad, Srinivasan Iyer, Luke Zettlemoyer, Hannaneh Hajishirzi
Methods
Results
Comments