Jamie-Stirling / RetNet

An implementation of "Retentive Network: A Successor to Transformer for Large Language Models"
MIT License
1.14k stars 99 forks

Real-valued implementation using xPos #5

Closed. Jamie-Stirling closed this issue 11 months ago

Jamie-Stirling commented 11 months ago

The current implementation follows the original paper by using complex arithmetic, which has known issues with stability and precision. It's been suggested that xPos is a more stable and efficient way to achieve the same result: by Euler's formula, e^(iθ) = cos θ + i sin θ, so each complex rotation can be computed with real-valued sines and cosines instead.
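To make the equivalence concrete, here is a minimal sketch in plain Python (standard library only, not code from this repo). The function names `rotate_complex` and `rotate_real` are hypothetical and only illustrate that multiplying by e^(iθ) matches a real 2D rotation of the (real, imaginary) pair.

```python
import cmath
import math
from typing import Tuple


def rotate_complex(x: complex, theta: float) -> complex:
    """Rotate x by theta using complex multiplication with e^(i*theta)."""
    return x * cmath.exp(1j * theta)


def rotate_real(a: float, b: float, theta: float) -> Tuple[float, float]:
    """Rotate the pair (a, b) by theta using only real arithmetic.

    (a + ib)(cos t + i sin t) = (a cos t - b sin t) + i (a sin t + b cos t)
    """
    c, s = math.cos(theta), math.sin(theta)
    return (a * c - b * s, a * s + b * c)


if __name__ == "__main__":
    z = rotate_complex(complex(3.0, 4.0), 0.7)
    a, b = rotate_real(3.0, 4.0, 0.7)
    # The two results should agree up to floating-point rounding.
    print(z.real - a, z.imag - b)
```

A real-valued version along these lines would treat each complex feature as a pair of real features, so no complex dtype is needed.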

It would be nice if all constructors had an additional option to do the arithmetic in real numbers using xPos (an extension of rotary positional embeddings), as described in this paper: https://arxiv.org/abs/2212.10554 and implemented here: https://github.com/microsoft/torchscale/blob/main/torchscale/component/xpos_relative_position.py
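As a rough illustration of the idea (not the torchscale code, and not this repo's API), the sketch below applies an xPos-style transform with real arithmetic only. Each consecutive pair of features is rotated by an angle proportional to the position and scaled by a decay factor, with the inverse scale applied on the key side. The names `xpos_rotate` and `score`, the pairing of adjacent features, and the example `theta` and `zeta` values are all assumptions made for illustration.

```python
import math
from typing import List


def xpos_rotate(vec: List[float], pos: int, theta: List[float],
                zeta: List[float], downscale: bool = False) -> List[float]:
    """Rotate each pair (vec[2j], vec[2j+1]) by pos * theta[j] and scale it.

    The scale is zeta[j] ** pos, or zeta[j] ** (-pos) when downscale=True
    (used here for keys so that the scales combine to zeta ** (n - m)).
    """
    out: List[float] = []
    for i in range(0, len(vec), 2):
        j = i // 2
        angle = pos * theta[j]
        c, s = math.cos(angle), math.sin(angle)
        scale = zeta[j] ** (-pos if downscale else pos)
        a, b = vec[i], vec[i + 1]
        out.append(scale * (a * c - b * s))
        out.append(scale * (a * s + b * c))
    return out


def score(q: List[float], k: List[float], n: int, m: int,
          theta: List[float], zeta: List[float]) -> float:
    """Dot product of a query at position n with a key at position m."""
    qn = xpos_rotate(q, n, theta, zeta)
    km = xpos_rotate(k, m, theta, zeta, downscale=True)
    return sum(x * y for x, y in zip(qn, km))


if __name__ == "__main__":
    q = [0.5, -1.0, 2.0, 0.3]
    k = [1.5, 0.2, -0.7, 1.1]
    theta = [1.0, 0.01]   # hypothetical per-pair rotation frequencies
    zeta = [0.98, 0.99]   # hypothetical per-pair decay factors in (0, 1)
    # Both pairs of positions have the same offset n - m = 4,
    # so the two scores should match up to rounding.
    print(score(q, k, 7, 3, theta, zeta), score(q, k, 12, 8, theta, zeta))
```

The point of the sketch is that the score depends only on the relative offset n - m, which is the property the complex formulation provides, while every intermediate value stays real.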

This may solve current issues with memory stability.