Closed Jamie-Stirling closed 11 months ago
The first attempt at implementation uses complex values to encode position, which causes value instability as well as memory instability.
This PR provides an additional implementation in src/real/ using Microsoft's xPos (extrapolable positional encodings): https://github.com/syncdoth/RetNet/blob/main/xpos_relative_position.py
This should solve problems with instability while also providing more options for precision.
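For reviewers unfamiliar with the idea, here is a minimal pure-Python sketch of a real-valued, xPos-style encoding. It is not the code in src/real/ or the linked file: the names `rotate_pairs` and `score` are hypothetical, and the single scalar decay `zeta` is a simplifying assumption (xPos itself uses a per-dimension scale). It only illustrates that rotating feature pairs with real sines and cosines gives a query-key score that depends on the relative offset alone, with no complex arithmetic.

```python
import math

def rotate_pairs(x, pos, thetas, zeta=1.0, sign=1):
    """Rotate each consecutive feature pair of x by pos * theta.

    The result is also scaled by zeta ** (sign * pos); zeta=1.0 disables
    the decay. A scalar zeta is an assumption made for this sketch.
    """
    scale = zeta ** (sign * pos)
    out = []
    for i, theta in enumerate(thetas):
        a, b = x[2 * i], x[2 * i + 1]
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        # Real 2x2 rotation, equivalent to multiplying (a + ib) by exp(i * pos * theta).
        out.append(scale * (a * c - b * s))
        out.append(scale * (a * s + b * c))
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def score(q, k, m, n, thetas, zeta=1.0):
    """Score between a query at position m and a key at position n."""
    q_rot = rotate_pairs(q, m, thetas, zeta, sign=1)   # scaled by zeta ** m
    k_rot = rotate_pairs(k, n, thetas, zeta, sign=-1)  # scaled by zeta ** -n
    return dot(q_rot, k_rot)

# Example inputs (arbitrary values chosen for illustration).
q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.25]
thetas = [1.0, 0.01]

# Both pairs of positions have the same offset m - n = 2.
near = score(q, k, 5, 3, thetas, zeta=0.98)
far = score(q, k, 12, 10, thetas, zeta=0.98)
print(near, far)
```

The two printed scores should agree up to floating-point error, because the rotations combine into a single rotation by `(n - m) * theta` and the scales combine into `zeta ** (m - n)`.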
New files: