The current implementation follows the original paper and uses complex arithmetic, which has known stability and precision issues. It has been suggested that xPos is a more stable and efficient way to achieve the same result, because it expresses the rotations in real arithmetic (sines and cosines, via Euler's formula) instead of complex numbers.
It would be nice if all constructors had an additional option to do the arithmetic in real numbers using xPos (an extension of rotary positional embeddings), as described in this paper: https://arxiv.org/abs/2212.10554 and implemented here: https://github.com/microsoft/torchscale/blob/main/torchscale/component/xpos_relative_position.py
This may solve current issues with memory stability.
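
To make the request concrete, here is a minimal sketch of the idea in plain Python. It is not this project's code or the torchscale implementation, and it leaves out the extra per-position scaling that xPos adds on top of the rotation. The function names (`rotate_complex`, `rotate_real`, `rotary_real`) and the `base=10000.0` default are assumptions made for illustration. It only shows that the rotation done with a complex multiplication can be done equally well with real sines and cosines.

```python
import cmath
import math


def rotate_complex(pair, theta):
    """Rotate the pair (x1, x2) by theta using a complex multiplication."""
    z = complex(pair[0], pair[1]) * cmath.exp(1j * theta)
    return (z.real, z.imag)


def rotate_real(pair, theta):
    """Same rotation, using only real sines and cosines."""
    x1, x2 = pair
    c, s = math.cos(theta), math.sin(theta)
    return (x1 * c - x2 * s, x1 * s + x2 * c)


def rotary_real(vec, position, base=10000.0):
    """Rotate consecutive pairs of an even-length vector.

    Each pair gets its own frequency, so the angle is
    position * base ** (-i / dim) for the pair starting at index i.
    (Hypothetical helper; `base` is an assumed default.)
    """
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)
        out.extend(rotate_real((vec[i], vec[i + 1]), theta))
    return out
```

Both paths give the same result up to floating-point rounding, and the dot product of two rotated vectors depends only on the difference of their positions, which is the property a real-arithmetic option would need to preserve.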