Open AlexisTercero55 opened 8 months ago
Since the inputs are word-vector embeddings of a text corpus sequence,
$$L = \{X_i\in\mathbb{R}^{\text{dim}} : i = 1,\ldots,n\},$$
and the positional encoding of each token is a vector of the same dimension,
$$P = \{Y_i\in\mathbb{R}^{\text{dim}} : Y_i = PE(X_i) \land i = 1,\ldots,n\},$$
the two need to be summed in order to generate the positional embeddings of the initial corpus $L$:
$$L_{PE} = \{Z_i\in\mathbb{R}^{\text{dim}} : Z_i = X_i+Y_i \land i = 1,\ldots,n\}.$$
"To this end, we add 'positional encodings' to the input embeddings at the bottoms of the encoder and decoder stacks. The positional encodings have the same dimension $d_{\text{model}}$ as the embeddings, so that the two can be summed." (https://arxiv.org/abs/1706.03762)
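A minimal NumPy sketch of the sinusoidal encoding defined in Section 3.5 of the paper and of the sum $Z_i = X_i + Y_i$ above; the helper name `positional_encoding`, the random placeholder embeddings `X`, and the chosen sizes are illustrative assumptions, not the final implementation:

```python
import numpy as np

def positional_encoding(n_positions: int, dim: int) -> np.ndarray:
    """Sinusoidal positional encodings (Attention Is All You Need, Sec. 3.5).

    Returns an (n_positions, dim) matrix Y with
    Y[pos, 2i]   = sin(pos / 10000**(2i / dim))
    Y[pos, 2i+1] = cos(pos / 10000**(2i / dim)).
    Assumes dim is even.
    """
    positions = np.arange(n_positions)[:, np.newaxis]          # shape (n, 1)
    div_terms = np.power(10000.0, np.arange(0, dim, 2) / dim)  # shape (dim/2,)
    Y = np.zeros((n_positions, dim))
    Y[:, 0::2] = np.sin(positions / div_terms)  # even dimensions
    Y[:, 1::2] = np.cos(positions / div_terms)  # odd dimensions
    return Y

# Z_i = X_i + Y_i: add the encodings to the word embeddings.
n, dim = 50, 512                      # sequence length and embedding dimension (assumed)
X = np.random.randn(n, dim)           # placeholder word embeddings X_i
Z = X + positional_encoding(n, dim)   # positionally encoded embeddings L_PE
```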
Positional encoding
From the paper Attention Is All You Need, this feature is required in order to contribute to the Transformers milestone.