NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

support Qwen2-VL #2183

Open junwenZhang opened 2 weeks ago

junwenZhang commented 2 weeks ago

System Info

Qwen2-VL introduces a new M-RoPE (multimodal rotary position embedding) feature; please add support for it.
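For context, M-RoPE decomposes the position index into temporal, height, and width components and applies each to a different contiguous section of the rotary frequencies. The sketch below is an illustrative NumPy toy, not TensorRT-LLM or Qwen2-VL code; the function name, the `sections` split, and the base are assumptions chosen to show the idea.

```python
import numpy as np

def mrope_rotate(x, t, h, w, sections=(2, 1, 1), base=10000.0):
    # Toy M-RoPE sketch (assumed layout, not the real implementation):
    # keep the usual RoPE frequency ladder, but split it into contiguous
    # sections driven by different position indices: temporal t, height h,
    # width w. When t == h == w, this reduces to standard 1-D RoPE.
    d = x.shape[-1]
    half = d // 2
    inv_freq = 1.0 / (base ** (np.arange(half) / half))
    # Proportional split of the `half` frequencies into three sections.
    s0 = half * sections[0] // sum(sections)
    s1 = half * sections[1] // sum(sections)
    pos = np.concatenate([
        np.full(s0, t, dtype=float),               # temporal section
        np.full(s1, h, dtype=float),               # height section
        np.full(half - s0 - s1, w, dtype=float),   # width section
    ])
    angles = pos * inv_freq
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate each (x1[i], x2[i]) channel pair by its section's angle.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
```

Because each channel pair is rotated, the vector norm is preserved, and collapsing the three indices to one position recovers ordinary RoPE; the model's actual section sizes come from its config rather than the proportions assumed here.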

Who can help?

No response

Information

Tasks

Reproduction

Use the open-source Qwen2-VL model.

Expected behavior

TensorRT-LLM supports building and running Qwen2-VL.

actual behavior

TensorRT-LLM does not currently support Qwen2-VL.

additional notes

None.

sunnyqgg commented 2 weeks ago

Hi, I'll do it.

scdotbox commented 7 hours ago

Where can I find the PR with the Qwen2-VL support files for TensorRT-LLM?