NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0
8.57k stars 972 forks

[Feature Request] support YaRN request #792

Open kkr37 opened 10 months ago

kkr37 commented 10 months ago

Feature request Nous Research and EleutherAI have released YaRN-extended models in two versions, with context sizes of 64k and 128k tokens. These models use rotary (RoFormer-style) position embeddings, distinguishing them from GPT-NeoX- and GPT-J-style embeddings. They are built on LLaMA 2, so they are largely compatible with existing LLaMA support, with some minor adjustments required for full support.

Motivation The YaRN models' longer context length (up to 128k tokens) is highly valuable for tasks involving extensive context, compared to the 4096-token context of the LLaMA 2 base model.

Other YaRN paper: "YaRN: Efficient Context Window Extension of Large Language Models". YaRN code: YaRN GitHub repository.
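For context on what supporting YaRN would involve: the method rescales the rotary (RoPE) inverse frequencies using an "NTK-by-parts" blend (high-frequency dimensions are kept as-is, low-frequency ones are divided by the context-extension scale factor, with a linear ramp in between) plus a temperature on the attention logits. Below is a minimal NumPy sketch of that computation, assuming the defaults from the paper (`beta_fast=32`, `beta_slow=1`, original context 4096); the function names are illustrative, not TensorRT-LLM API.

```python
import math
import numpy as np

def yarn_scaled_inv_freq(dim, base=10000.0, scale=16.0,
                         orig_ctx=4096, beta_fast=32.0, beta_slow=1.0):
    """Sketch of YaRN 'NTK-by-parts' RoPE frequency blending."""
    # Standard RoPE inverse frequencies (one per pair of head dims).
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    # Number of full rotations each frequency completes over the
    # original training context window.
    rotations = orig_ctx * inv_freq / (2.0 * math.pi)
    # Linear ramp: 1 where rotations >= beta_fast (keep the original
    # frequency), 0 where rotations <= beta_slow (interpolate by 1/scale).
    ramp = np.clip((rotations - beta_slow) / (beta_fast - beta_slow), 0.0, 1.0)
    # Blend interpolated and original frequencies.
    return inv_freq / scale * (1.0 - ramp) + inv_freq * ramp

def yarn_attention_factor(scale):
    """Temperature applied to attention logits for a given context scale."""
    return 0.1 * math.log(scale) + 1.0
```

In practice the scaled `inv_freq` replaces the standard RoPE table when building the engine, and the attention factor is typically folded into the query/key scaling; the sketch above only shows the math, not how it would be wired into TRT-LLM's attention plugin.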

jesonxiang commented 3 weeks ago

Is this not supported yet?

AdamzNV commented 5 days ago

As more and more new models enter the market, we have prepared comprehensive instructions for TRT-LLM developers on adding support for new models of interest. We encourage our community developers to expand the range of supported models, fostering an open ecosystem with rapid iteration.

Please try following these instructions and let us know if you encounter any issues during the adaptation process. We greatly appreciate your dedication.