NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

[Feature request] Cohere Family of Models (Command-R, Command-R-Plus, Aya23-8B, Aya23-35B, Aya101) #1657

Open user-0a opened 4 months ago

user-0a commented 4 months ago

Hello,

I am opening this issue to request support for the Cohere family of models:

Command-R: https://huggingface.co/CohereForAI/c4ai-command-r-v01
Command-R-Plus: https://huggingface.co/CohereForAI/c4ai-command-r-plus
Aya23-8B: https://huggingface.co/CohereForAI/aya-23-8B
Aya23-35B: https://huggingface.co/CohereForAI/aya-23-35B
Aya101: https://huggingface.co/CohereForAI/aya-101

Thank you

imnoahcook commented 4 months ago

I would also like to request support for Command-R and Command-R+; they are currently the best open-source models.

aikitoria commented 3 months ago

Yes please. Command-R+ support is needed!

here4dadata commented 3 months ago

+1

syuoni commented 1 month ago

Hi all, we've started investigating and implementing the Cohere models. Support is planned for the 0.13 release.

aikitoria commented 1 month ago

Nice!

aikitoria commented 1 month ago

@syuoni Please also support these! https://huggingface.co/CohereForAI/c4ai-command-r-08-2024 https://huggingface.co/CohereForAI/c4ai-command-r-plus-08-2024

syuoni commented 1 month ago

> @syuoni Please also support these! https://huggingface.co/CohereForAI/c4ai-command-r-08-2024 https://huggingface.co/CohereForAI/c4ai-command-r-plus-08-2024

These models share the same architecture, CohereForCausalLM, so it's very likely they will be supported automatically once command-r is ready.
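
For readers who want to check the shared-architecture point themselves, here is a minimal sketch, assuming each checkpoint's Hugging Face `config.json` lists its model class under an `architectures` key. The helper names and the trimmed-down config strings are hypothetical and written for illustration; they are not copied from the real repositories, and nothing here is a TensorRT-LLM API.

```python
import json


def declared_architectures(config_text: str) -> list:
    """Return the `architectures` list from the text of a config.json file."""
    config = json.loads(config_text)
    return list(config.get("architectures", []))


def same_architecture(config_a: str, config_b: str) -> bool:
    """True if the two configs declare at least one architecture in common."""
    a = set(declared_architectures(config_a))
    b = set(declared_architectures(config_b))
    return bool(a & b)


# Illustrative stand-ins for the two checkpoints' config.json contents
# (assumption: both declare CohereForCausalLM, as stated in this thread).
command_r = '{"architectures": ["CohereForCausalLM"]}'
command_r_08_2024 = '{"architectures": ["CohereForCausalLM"]}'

print(same_architecture(command_r, command_r_08_2024))
```

A matching architecture name only suggests that the same model definition can load both checkpoints; configuration differences between releases could still require extra work.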

user-0a commented 3 weeks ago

Thank you!

salaki commented 1 day ago

Does anyone know how similar the Cohere family of models is to the other OSS models supported in TensorRT-LLM?

salaki commented 1 day ago

@syuoni Is the feature delayed? If I wanted to write a converter from the Hugging Face model definition to a TensorRT-LLM checkpoint myself, which documentation should I check?

aikitoria commented 1 day ago

It doesn't look like it is out: version 0.13 was just released, but none of the recent commits mention it...

syuoni commented 16 hours ago

Hi all,

Yes. Cohere is postponed; it's not available in 0.13. Cohere has some structures quite different from LLaMA, e.g., qk_layernorm, so it took some extra time to align the accuracy.

The MR is ready and under review in our internal repo. I think it will be released soon. Thanks!
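
To illustrate the kind of structure `qk_layernorm` refers to, here is a minimal sketch of layer normalization applied to the query and key vectors of one attention head before the dot product. It is a generic, pure-Python illustration and makes several assumptions: the function names are hypothetical, the learned scale weights that real implementations typically carry are omitted, and it is not the Cohere or TensorRT-LLM implementation.

```python
import math


def layer_norm(vec, eps=1e-5):
    """Normalize one vector to zero mean and unit variance (no learned scale or bias)."""
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    return [(x - mean) / math.sqrt(var + eps) for x in vec]


def attention_scores(queries, keys, qk_layernorm=False):
    """Scaled dot-product scores for one head; queries and keys are lists of head_dim vectors."""
    if qk_layernorm:
        # The extra step: normalize each query and key vector before the dot product.
        queries = [layer_norm(q) for q in queries]
        keys = [layer_norm(k) for k in keys]
    scale = 1.0 / math.sqrt(len(queries[0]))
    return [[scale * sum(a * b for a, b in zip(q, k)) for k in keys] for q in queries]


# Two queries that differ only in magnitude (the second is 10x the first).
q = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
k = [[4.0, 3.0, 2.0, 1.0], [1.0, 2.0, 3.0, 4.0]]

plain = attention_scores(q, k)
normed = attention_scores(q, k, qk_layernorm=True)
```

Without the normalization, the second query's scores are ten times larger than the first's; with it, the two queries produce nearly identical scores. Because this step changes the attention logits directly, a converter or runtime that skips or misplaces it would produce different outputs, which is consistent with the extra accuracy-alignment work described above.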