AlexisTercero55 / AI-Research

AI Data Science, research and development
MIT License

Research | Positional encoding #10

Open AlexisTercero55 opened 8 months ago

AlexisTercero55 commented 8 months ago

Positional encoding

This feature, from the paper *Attention Is All You Need*, must be implemented in order to contribute to the Transformers milestone.

References

Related work items

AlexisTercero55 commented 8 months ago

Place of positional encoding in the Transformer architecture


Output of positional encoding

Since the inputs are the word embedding vectors of a text corpus sequence

$$L = \{X_i\in\mathbb{R}^{\text{dim}} : i = 1,\ldots,n\}$$

and each positional encoding is a vector of the same dimension,

$$P = \{Y_i\in\mathbb{R}^{\text{dim}} : Y_i = PE(i) \land i = 1,\ldots,n\},$$

the two are summed elementwise to generate the new positional embeddings of the initial corpus $L$:

$$L_{PE} = \{Z_i\in\mathbb{R}^{\text{dim}} : Z_i = X_i+Y_i \land i = 1,\ldots,n\}$$

> To this end, we add "positional encodings" to the input embeddings at the bottoms of the encoder and decoder stacks. The positional encodings have the same dimension $d_{\text{model}}$ as the embeddings, so that the two can be summed. (https://arxiv.org/abs/1706.03762)
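A minimal NumPy sketch of this step, assuming the sinusoidal encoding from Section 3.5 of the paper and an even `dim`; the function name `positional_encoding` and the toy shapes are illustrative, not part of this repo's API:

```python
import numpy as np

def positional_encoding(n: int, dim: int) -> np.ndarray:
    """Sinusoidal positional encodings Y_i = PE(i) from "Attention Is All You Need":
    PE(pos, 2k)   = sin(pos / 10000^(2k/dim))
    PE(pos, 2k+1) = cos(pos / 10000^(2k/dim))
    Assumes dim is even. Returns an (n, dim) array, one row per position."""
    positions = np.arange(n)[:, np.newaxis]                     # (n, 1)
    div_terms = np.power(10000.0, np.arange(0, dim, 2) / dim)   # (dim/2,)
    pe = np.zeros((n, dim))
    pe[:, 0::2] = np.sin(positions / div_terms)                 # even dimensions
    pe[:, 1::2] = np.cos(positions / div_terms)                 # odd dimensions
    return pe

# Toy example matching L_PE = {Z_i = X_i + Y_i}: random embeddings stand in for X_i.
n, dim = 6, 8
X = np.random.randn(n, dim)        # word embeddings X_i
Y = positional_encoding(n, dim)    # Y_i = PE(i), same dimension as X_i
Z = X + Y                          # Z_i = X_i + Y_i
print(Z.shape)                     # (6, 8)
```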