Rectified Rotary Position Embeddings (ReRoPE)

Using ReRoPE, we can more effectively extend the context length of LLM without the need for fine-tuning.

Blog

https://kexue.fm/archives/9706 (Chinese)
https://kexue.fm/archives/9708 (Chinese)
https://normxu.github.io/Rethinking-Rotary-Position-Embedding-2/ (English by @NormXU)
https://normxu.github.io/Rethinking-Rotary-Position-Embedding-3/ (English by @NormXU)

Idea

Results

Calculated the loss on llama2-13b with samples_15k.jsonl:

Method	loss
RoPE-4k(original llama2-13b)	1.4967
RoPE-8k(original llama2-13b)	8.8615
NTK-RoPE-4k(not dynamic)	1.6081
NTK-RoPE-8k(not dynamic)	1.5417
NTK-RoPE-16k(not dynamic)	1.5163
ReRoPE-w1024-4k	1.4996
ReRoPE-w1024-8k	1.4267
ReRoPE-w1024-16k	1.4001

ReRoPE's performance at training length (4k) has hardly decreased, and it possesses the ideal property of "longer context, lower loss".

Usage

Dependency: transformers 4.31.0

Run python test.py to test chatting or run python eval_loss.py to calculate loss with llama2.

From here and here, we can see what modifications ReRoPE/Leaky ReRoPE has made compared to the original llama implementation.

Other

Triton Implementation of ReRoPE: https://gist.github.com/chu-tianxiang/4307937fd94b49c75b61a6967716bae9

Cite

@misc{rerope2023,
  title={Rectified Rotary Position Embeddings},
  author={Jianlin Su},
  year={2023},
  howpublished={\url{https://github.com/bojone/rerope}},
}

Communication

QQ discussion group: 67729435, for WeChat group, please add the robot WeChat ID spaces_ac_cn

bojone / rerope

readme