Open AlexisTercero55 opened 8 months ago
Since the inputs are word-vector embeddings of a text corpus sequence,
$$L = \{X_i\in\mathbb{R}^{\text{dim}} : i = 1,\ldots,n\},$$
and the positional encoding of each token is a vector of the same dimension,
$$P = \{Y_i\in\mathbb{R}^{\text{dim}} : Y_i = PE(X_i) \land i = 1,\ldots,n\},$$
the two need to be summed in order to generate the positional embeddings of the initial corpus $L$:
$$L_{PE} = \{Z_i\in\mathbb{R}^{\text{dim}} : Z_i = X_i+Y_i \land i = 1,\ldots,n\}.$$
"To this end, we add 'positional encodings' to the input embeddings at the bottoms of the encoder and decoder stacks. The positional encodings have the same dimension $d_{\text{model}}$ as the embeddings, so that the two can be summed." (https://arxiv.org/abs/1706.03762)
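A minimal NumPy sketch of the sinusoidal encoding defined in Section 3.5 of the paper and of the sum $Z_i = X_i + Y_i$ above; the helper name `positional_encoding`, the random placeholder embeddings `X`, and the chosen sizes are illustrative assumptions, not the final implementation:

```python
import numpy as np

def positional_encoding(n_positions: int, dim: int) -> np.ndarray:
    """Sinusoidal positional encodings (Attention Is All You Need, Sec. 3.5).

    Returns an (n_positions, dim) matrix Y with
    Y[pos, 2i]   = sin(pos / 10000**(2i / dim))
    Y[pos, 2i+1] = cos(pos / 10000**(2i / dim)).
    Assumes dim is even.
    """
    positions = np.arange(n_positions)[:, np.newaxis]          # shape (n, 1)
    div_terms = np.power(10000.0, np.arange(0, dim, 2) / dim)  # shape (dim/2,)
    Y = np.zeros((n_positions, dim))
    Y[:, 0::2] = np.sin(positions / div_terms)  # even dimensions
    Y[:, 1::2] = np.cos(positions / div_terms)  # odd dimensions
    return Y

# Z_i = X_i + Y_i: add the encodings to the word embeddings.
n, dim = 50, 512                      # sequence length and embedding dimension (assumed)
X = np.random.randn(n, dim)           # placeholder word embeddings X_i
Z = X + positional_encoding(n, dim)   # positionally encoded embeddings L_PE
```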
Positional encoding
From the paper Attention Is All You Need, this feature is required in order to contribute to the Transformers milestone.