datamllab / LongLM

[ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
https://arxiv.org/pdf/2401.01325.pdf
MIT License

Cohere command r #42

Open flaviusburca opened 3 weeks ago

flaviusburca commented 3 weeks ago

Is it possible to adapt this to the Cohere command-r models?

Mooler0410 commented 3 weeks ago

Hi! If the model mentioned is CohereForAI/c4ai-command-r-v01, we believe it's possible. It uses typical RoPE. We quickly checked its implementation in Hugging Face's Transformers library. It looks pretty similar to Llama. You can refer to our Llama implementation to modify Cohere's code.
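The change needed is the same position remapping used for Llama: keys inside a neighbor window keep exact relative positions, while farther keys fall back to grouped (floor-divided) positions. A minimal sketch of that mapping (function name and the example values are illustrative, not taken from the repo's code):

```python
def self_extend_rel_pos(distance: int, group_size: int, neighbor_window: int) -> int:
    """Map a query-key distance to the relative position fed to RoPE.

    Distances inside the neighbor window are kept exact; beyond it,
    one grouped position covers `group_size` tokens, shifted so the
    two regions join without a gap.
    """
    if distance < neighbor_window:
        return distance
    # Grouped region: positions grow group_size times more slowly.
    return neighbor_window + (distance - neighbor_window) // group_size

# Example: with group_size=8 and neighbor_window=1024, a token 4096
# positions away is remapped to 1024 + (4096 - 1024) // 8 = 1408,
# well inside a typical pretrained position range.
print(self_extend_rel_pos(4096, 8, 1024))
```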

One thing that could matter is that CohereForAI/c4ai-command-r-v01 uses a very large RoPE theta—8,000,000.0, which is much larger than that of other models. This may cause the empirical rule for selecting good hyperparameters (group size, neighbor window) to fail. You may need to try several combinations to find a better one.
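Independently of the theta issue, a quick feasibility check can narrow the search: the largest remapped position must stay within the pretrained context, which bounds the extended length by roughly `neighbor_window + (pretrained_len - neighbor_window) * group_size`. This bound follows from how the grouped positions are counted and is a rough sanity check, not a rule from this repo:

```python
def max_extended_context(pretrained_len: int, group_size: int, neighbor_window: int) -> int:
    """Rough upper bound on usable context for a (group_size, window) pair.

    The neighbor window uses exact positions; each remaining pretrained
    position covers `group_size` distant tokens once grouped.
    """
    return neighbor_window + (pretrained_len - neighbor_window) * group_size

# Example: a 4096-token pretrained window, group size 8, neighbor window 1024
# gives 1024 + (4096 - 1024) * 8 = 25600 tokens of reachable context.
print(max_extended_context(4096, 8, 1024))
```

Combinations whose bound falls short of the target context length can be discarded before any empirical tuning.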