TL;DR
A lightweight and efficient transformer, DeLighT (Deep and Light-weight Transformer), is proposed. Its key components are block-wise scaling, which allocates more parameters to deeper blocks (shallower blocks near the input, deeper blocks near the output), and DExTra, an improved version of DeFINE that expands and then reduces the channel dimension using group linear transformations. Higher accuracy can be achieved with lower computational cost.
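As a minimal sketch of the block-wise scaling idea, the per-block depth can be interpolated linearly between two hyperparameters (here called `n_min` and `n_max`; the exact rounding and parameter names are assumptions for illustration):

```python
def blockwise_depths(num_blocks: int, n_min: int = 4, n_max: int = 8) -> list:
    """Assign a depth to each block, growing linearly from n_min near the
    input to n_max near the output (block-wise scaling sketch)."""
    if num_blocks == 1:
        return [n_max]
    return [
        round(n_min + (n_max - n_min) * b / (num_blocks - 1))
        for b in range(num_blocks)
    ]

# Shallower blocks near the input, deeper blocks near the output.
print(blockwise_depths(5))  # → [4, 5, 6, 7, 8]
```

This contrasts with standard transformers, which use the same depth and width in every layer.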
Why it matters:
Paper URL
https://arxiv.org/abs/2008.00623
Submission Date (yyyy/mm/dd)
2020/08/03
Authors and institutions
Sachin Mehta, Marjan Ghazvininejad, Srinivasan Iyer, Luke Zettlemoyer, Hannaneh Hajishirzi
Methods
Results
Comments