NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Support for Cohere Command-R #1360

Closed · tombolano closed this 3 weeks ago

tombolano commented 7 months ago

Cohere released the model "Command-R", a multilingual model optimized for long context tasks such as retrieval augmented generation (RAG) and using external APIs and tools.

Release note: https://txt.cohere.com/command-r/
Weights: https://huggingface.co/CohereForAI/c4ai-command-r-v01

The evaluation results published by Cohere are very good: it beats Mixtral, Llama 2 70B, and ChatGPT 3.5 on RAG and tool-use tasks.

In the llama.cpp repository there is a pull request discussion (https://github.com/ggerganov/llama.cpp/pull/6033) with some useful comments about implementing this model.

EwoutH commented 7 months ago

They now also released a larger, 104B parameter model: C4AI Command R+

zhang001122 commented 6 months ago

Yeah, c4ai-command-r-plus really needs TensorRT-LLM support. No need for more Llama 2 work; LLM development moves very fast. Guys, please hurry up!

user-0a commented 5 months ago

Would also like support for this! Thank you for all of the hard work @ncomly-nvidia

imnoahcook commented 5 months ago

I would also like to request support for Command-R and Command-R+; they are currently the best open-source models.

aikitoria commented 5 months ago

Yes please. Command-R+ support is needed!

here4dadata commented 5 months ago

+1 for even more visibility

syuoni commented 3 weeks ago

Hi all,

The Command-R and Aya models are now supported on the main branch. See: https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/commandr
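For anyone trying the linked example, TensorRT-LLM examples generally follow a convert-then-build workflow. The sketch below assumes the commandr example follows that same pattern (a `convert_checkpoint.py` script plus the `trtllm-build` CLI); the exact flags and paths are illustrative, so check the example's README for the authoritative commands.

```shell
# Sketch of the usual TensorRT-LLM example workflow (paths/flags illustrative).
# 1. Download the Hugging Face checkpoint (requires git-lfs).
git clone https://huggingface.co/CohereForAI/c4ai-command-r-v01 commandr_hf

# 2. Convert the HF checkpoint to TensorRT-LLM format.
cd TensorRT-LLM/examples/commandr
python convert_checkpoint.py \
    --model_dir ../../../commandr_hf \
    --output_dir ./commandr_ckpt \
    --dtype float16

# 3. Build the TensorRT engine from the converted checkpoint.
trtllm-build \
    --checkpoint_dir ./commandr_ckpt \
    --output_dir ./commandr_engine \
    --gemm_plugin float16

# 4. Run inference with the shared example runner.
python ../run.py \
    --engine_dir ./commandr_engine \
    --tokenizer_dir ../../../commandr_hf \
    --input_text "Hello, how are you?" \
    --max_output_len 64
```

Building the engine requires an NVIDIA GPU with sufficient memory for the model; the 104B Command-R+ variant will additionally need tensor parallelism across multiple GPUs.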

I'm closing this issue. Thanks!