AkiraTOSEI / ML_papers

ML paper summaries (in Japanese)

SYNTHESIZER: Rethinking Self-Attention in Transformer Models #89

Open AkiraTOSEI opened 3 years ago


TL;DR

This study revisits self-attention in the Transformer. Standard self-attention computes attention weights from dot-product interactions between tokens. Instead of dot products, the authors synthesize each token's attention weights from that token alone, without any token-to-token interaction (Dense Synthesizer), or treat the attention weights as learned parameters that ignore the input entirely (Random Synthesizer), and show that performance remains competitive with dot-product attention.
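
To make the contrast concrete, below is a minimal PyTorch sketch (not the authors' code) of the two variants described above. The module names, the two-layer MLP for the Dense variant, and the `max_len` truncation are simplifying assumptions; the paper's factorized and mixture variants are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DenseSynthesizerAttention(nn.Module):
    """Dense Synthesizer: each token's attention logits are produced from
    that token alone by a small MLP, with no token-token dot products."""

    def __init__(self, d_model: int, max_len: int):
        super().__init__()
        # Maps each d_model-dim token to a row of max_len attention logits.
        self.synthesize = nn.Sequential(
            nn.Linear(d_model, d_model),
            nn.ReLU(),
            nn.Linear(d_model, max_len),
        )
        self.value = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model), with seq_len <= max_len
        seq_len = x.size(1)
        logits = self.synthesize(x)[:, :, :seq_len]   # (batch, seq_len, seq_len)
        weights = F.softmax(logits, dim=-1)
        return weights @ self.value(x)                # (batch, seq_len, d_model)


class RandomSynthesizerAttention(nn.Module):
    """Random Synthesizer: the attention logits are a learned parameter,
    shared across all inputs and independent of the tokens themselves."""

    def __init__(self, d_model: int, max_len: int):
        super().__init__()
        self.logits = nn.Parameter(torch.randn(max_len, max_len))
        self.value = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model), with seq_len <= max_len
        seq_len = x.size(1)
        weights = F.softmax(self.logits[:seq_len, :seq_len], dim=-1)
        return weights @ self.value(x)                # same weights for every example


# Usage: both modules take the place of a dot-product self-attention layer.
x = torch.randn(2, 10, 64)                            # (batch, seq_len, d_model)
print(DenseSynthesizerAttention(64, 128)(x).shape)    # torch.Size([2, 10, 64])
print(RandomSynthesizerAttention(64, 128)(x).shape)   # torch.Size([2, 10, 64])
```

The paper also reports that the Random variant can be kept fixed rather than trained, and that these synthesizers can be mixed with standard dot-product attention; the point is that competitive results hold even without computing pairwise token interactions.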


Why it matters:

Paper URL

https://arxiv.org/abs/2005.00743

Submission Dates (yyyy/mm/dd)

2020/05 (per the arXiv ID)

Authors and institutions

Methods

Results

Comments