OrionStarAI / Orion

Orion-14B is a family of models comprising a 14B-parameter multilingual foundation LLM and a series of derived models: a chat model, a long-context model, a quantized model, a RAG fine-tuned model, and an Agent fine-tuned model.
Apache License 2.0

What is the technique used to extend the context size to 200,000 tokens? #2

Open aburkov opened 9 months ago

No description provided.

shihanmax commented 9 months ago

+1. What was the maximum context length used during pre-training/SFT, and what extrapolation method was used?

chenxingphh commented 9 months ago


Thanks for your attention. We used a longer context for pre-training as well as some existing extrapolation methods.
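
For readers wondering what such an extrapolation method might look like: one widely used option is linear RoPE position interpolation (Chen et al., 2023), which compresses position indices so that sequences longer than the pre-training window map back into the position range the model saw during training. The sketch below is illustrative only; the maintainers do not say which method Orion-14B actually uses, and the function name, scale factor, and pre-training window length here are hypothetical.

```python
# Minimal sketch of linear RoPE position interpolation.
# NOTE: this is NOT Orion-14B's confirmed method; it illustrates one common
# family of context-extension techniques. All names/values are hypothetical.
import torch


def rope_angles(head_dim: int, max_positions: int, base: float = 10000.0,
                scale: float = 1.0) -> torch.Tensor:
    """Return the RoPE angle table of shape (max_positions, head_dim // 2).

    With scale > 1, position indices are divided by `scale`, so a sequence
    `scale` times longer than the pre-training window is squeezed back into
    the position range seen during training.
    """
    # Standard RoPE inverse frequencies for each pair of dimensions.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    # Linear position interpolation: compress positions by 1/scale.
    positions = torch.arange(max_positions).float() / scale
    return torch.outer(positions, inv_freq)


# Hypothetical example: a model pre-trained with a 4,096-token window,
# extended to 200,000 tokens, needs scale ~= 200_000 / 4_096 ~= 48.8.
angles = rope_angles(head_dim=128, max_positions=200_000, scale=200_000 / 4_096)
print(angles.shape)  # torch.Size([200000, 64])
```

Hugging Face transformers exposes a similar knob for some architectures via the `rope_scaling` field in the model config (e.g., `{"type": "linear", "factor": 8.0}`), though whether Orion-14B-LongChat relies on this mechanism is not stated in this thread.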