Jamie-Stirling / RetNet

An implementation of "Retentive Network: A Successor to Transformer for Large Language Models"
MIT License

Real-valued implementation using xPos #6

Closed Jamie-Stirling closed 11 months ago

Jamie-Stirling commented 11 months ago

The first implementation attempt uses complex values to encode position, which causes value instability as well as memory instability.

This PR provides an additional implementation in src/real/ using Microsoft's xPos (extrapolable positional encodings): https://github.com/syncdoth/RetNet/blob/main/xpos_relative_position.py

This should solve problems with instability while also providing more options for precision.
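To illustrate why a real-valued formulation can stand in for the complex one, here is a minimal sketch (not the code in src/real/) of the rotary part of an xPos-style encoding. It treats consecutive feature pairs as the real and imaginary parts of a complex number and rotates them with cos/sin only. The function names, the `base` of 10000 and the sample vectors are assumptions chosen for illustration, and the per-dimension exponential scaling that xPos also applies is omitted here.

```python
import cmath
import math


def rotate_real(x, pos, base=10000.0):
    """Rotate each pair (x[2i], x[2i+1]) by angle pos * theta_i using real arithmetic only."""
    d = len(x)
    out = []
    for i in range(d // 2):
        theta = base ** (-2.0 * i / d)  # assumed frequency schedule
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        a, b = x[2 * i], x[2 * i + 1]
        # (a + bi) * (c + si) = (ac - bs) + (as + bc)i, written without complex numbers
        out.extend([a * c - b * s, a * s + b * c])
    return out


def rotate_complex(x, pos, base=10000.0):
    """Same rotation written with complex multiplication, for comparison only."""
    d = len(x)
    out = []
    for i in range(d // 2):
        theta = base ** (-2.0 * i / d)
        z = complex(x[2 * i], x[2 * i + 1]) * cmath.exp(1j * pos * theta)
        out.extend([z.real, z.imag])
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical query/key vectors with an even number of features.
q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]

# Both position pairs have the same offset (2), so the scores should agree.
score_a = dot(rotate_real(q, 5), rotate_real(k, 3))
score_b = dot(rotate_real(q, 12), rotate_real(k, 10))
```

In this sketch the two rotations agree up to floating-point error, and the query-key score depends only on the relative offset between positions, which is the property the complex encoding was providing. Because everything stays in real floats, the same computation could in principle be run in whatever real dtype is available.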

New files: