Figure 1: Contextual Position Encoding (CoPE). Standard position encoding methods such as Relative PE are based on token positions. In contrast, CoPE computes gate values conditioned on the context first, then uses that to assign positions to tokens using a cumulative sum. This allows positions to be contextualized, and represent the count of different units like words, verbs or sentences. CoPE operates on each attention head and so can attend to different position types on each.
In this example, attending to the last sentence using Relative PE is challenging, and the best it can do is a decaying attention (“recency bias”). CoPE can count the sentence endings and simply attend to position “0”.
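For reference when checking the figure's numbers, the counting described in the caption can be sketched as follows. This is a minimal illustration, not the paper's implementation: the token list and the helper `contextual_positions` are hypothetical, and the gates are hard-coded to 1 on sentence-ending tokens, whereas the caption describes gates computed from the context.

```python
def contextual_positions(gates):
    """Return, for each token j, the sum of gates[j:] (inclusive of j).

    This is the position of token j as seen from the last token:
    a cumulative sum of gate values running backwards from the end.
    """
    positions = []
    running = 0.0
    for g in reversed(gates):
        running += g
        positions.append(running)
    positions.reverse()
    return positions


# Hypothetical example input; the last sentence is still unfinished.
tokens = ["Alice", "was", "tired", ".", "She", "slept", ".", "Then", "she", "woke"]

# Assumption: the gate fires (1.0) only on sentence endings.
gates = [1.0 if t == "." else 0.0 for t in tokens]

positions = contextual_positions(gates)

# Tokens at position 0 are exactly those in the last (current) sentence.
last_sentence = [t for t, p in zip(tokens, positions) if p == 0.0]

print(positions)
print(last_sentence)
```

With these gates, every token of the current sentence shares position 0, which matches the caption's point that CoPE can attend to the last sentence by attending to a single position.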
Description
The numbers in this image are wrong compared with the PDF version. The image delivers misinformation.
(Optional:) Please add any files, screenshots, or other information here.
No response
(Required) What is this issue most closely related to? Select one.
Choose One
Internal issue ID
41724592-b5f9-40d8-934f-358ad355fe36
Paper URL
https://arxiv.org/html/2405.18719v1
Browser
Chrome/125.0.0.0
Device Type
Desktop