NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

https://nvidia.github.io/TensorRT-LLM

Apache License 2.0

Update TensorRT-LLM #2460

Closed by kaiyux 3 days ago

kaiyux commented 3 days ago

Model Support
- Added support for EAGLE. Refer to examples/eagle/README.md.
- Added support for Qwen2-VL. Refer to the “Qwen2-VL” section of examples/multimodal/README.md.
- Added multimodal evaluation examples. Refer to examples/multimodal.
API
- Added the enable_chunked_prefill flag to the LlmArgs of the LLM API.
- Integrated BERT and RoBERTa models into the trtllm-build command.
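
As a rough illustration of the enable_chunked_prefill item above, the sketch below shows one way the flag might be passed through the LLM API. It assumes that keyword arguments given to the LLM constructor are forwarded to LlmArgs. The helper names build_llm_kwargs and create_llm, and the model path in the usage note, are hypothetical and not part of TensorRT-LLM. The import is deferred so the helper can be inspected on a machine without TensorRT-LLM installed.

```python
from typing import Any, Dict


def build_llm_kwargs(model: str, enable_chunked_prefill: bool = True) -> Dict[str, Any]:
    """Collect constructor arguments for the LLM API (hypothetical helper).

    Assumption: the LLM constructor forwards extra keyword arguments
    to LlmArgs, where enable_chunked_prefill is defined.
    """
    return {
        "model": model,
        "enable_chunked_prefill": enable_chunked_prefill,
    }


def create_llm(model: str, enable_chunked_prefill: bool = True):
    """Construct an LLM instance with chunked prefill toggled (hypothetical helper).

    The import is deferred because it needs a TensorRT-LLM installation
    and an NVIDIA GPU to succeed.
    """
    from tensorrt_llm import LLM

    return LLM(**build_llm_kwargs(model, enable_chunked_prefill))
```

With TensorRT-LLM installed, a call such as create_llm("/path/to/model") would request an engine with chunked prefill enabled. Whether the flag has any effect for a given model and build configuration is not stated in these notes.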