Open flrngel opened 6 years ago
Convolutional Sequence to Sequence Learning
aka Fairseq
https://arxiv.org/pdf/1705.03122.pdf
3. A Convolutional Architecture
3.1. Position Embeddings
p for the learned absolute position vector
e for the input embedding (word embedding + p)
See also
"Positional encoding" from Attention is all you need
3.2. Convolutional Block Structure
(image from https://norman3.github.io/papers/docs/fairseq.html)
In the image above, the kernel width is 3 and the convolution block stack size is 1.
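A minimal sketch of one encoder-side block with GLU and a residual connection, assuming PyTorch; dimensions are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, kernel_width = 256, 3

# The convolution emits 2*dim channels so the GLU can gate them back to dim.
conv = nn.Conv1d(dim, 2 * dim, kernel_width, padding=kernel_width // 2)

x = torch.randn(2, 20, dim)                  # (batch, seq_len, dim)
h = conv(x.transpose(1, 2)).transpose(1, 2)  # Conv1d expects (batch, dim, seq_len)
h = F.glu(h, dim=-1)                         # gated linear unit: A * sigmoid(B)

# Residual connection around the block (the sqrt(0.5) scaling is covered in 3.4).
y = x + h
```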
3.3. Multi-step Attention
Uses a residual connection from the target embedding g_i: the decoder state summary is d_i = W_d h_i + b_d + g_i.
Attention weights are the softmax over dot products of d_i and the encoder outputs z_j.
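A minimal sketch of one decoder attention step, assuming PyTorch; `proj` stands in for the per-layer W_d, b_d, and all shapes are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

dim = 256
proj = nn.Linear(dim, dim)  # W_d, b_d

h = torch.randn(2, 10, dim)  # decoder conv output for this layer
g = torch.randn(2, 10, dim)  # target embeddings (residual connection)
z = torch.randn(2, 20, dim)  # encoder outputs z_j
e = torch.randn(2, 20, dim)  # encoder input embeddings

# d_i = W_d h_i + b_d + g_i  (residual connection from g_i)
d = proj(h) + g

# Attention: softmax over dot products d_i . z_j.
scores = torch.bmm(d, z.transpose(1, 2))  # (batch, tgt_len, src_len)
attn = F.softmax(scores, dim=-1)

# Conditional input c_i = sum_j a_ij (z_j + e_j), added back to the decoder state.
c = torch.bmm(attn, z + e)
out = h + c
```

This runs once per decoder layer, which is what makes the attention "multi-step".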
3.4. Normalization Strategy
Sums of a block's input and output are scaled by √0.5 to halve the variance of the sum; this helps stabilize learning.
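A minimal sketch of that scaling, assuming PyTorch and that the two summands have roughly equal variance; shapes are illustrative:

```python
import math
import torch

x = torch.randn(2, 20, 256)          # residual input
block_out = torch.randn(2, 20, 256)  # block output (e.g. after GLU)

# Scaling the sum by sqrt(0.5) halves its variance, keeping activation
# magnitudes roughly constant through a deep stack of blocks.
y = (x + block_out) * math.sqrt(0.5)
```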