NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

https://nvidia.github.io/TensorRT-LLM

Apache License 2.0

8.21k stars, 910 forks

support Qwen2-VL #2183

Open junwenZhang opened 2 weeks ago

junwenZhang commented 2 weeks ago

System Info

Qwen2-VL introduces a new feature, M-RoPE (Multimodal Rotary Position Embedding). Please add support for it.
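For context on what supporting M-RoPE involves: instead of one position index per token, each token gets three (temporal, height, width). Text tokens use the same value for all three, which reduces to ordinary 1D RoPE, while image tokens keep a constant temporal index and vary the height and width indices over the image grid. The sketch below is a simplified illustration of that indexing for text and single still images only. It is not TensorRT-LLM or Qwen2-VL code: the function name `mrope_position_ids` and the segment format are hypothetical, and the grid sizes are assumed to be counted in visual tokens as seen by the language model.

```python
from typing import List, Sequence, Tuple


def mrope_position_ids(
    segments: Sequence[tuple],
) -> Tuple[List[int], List[int], List[int]]:
    """Build (temporal, height, width) position ids for a mixed sequence.

    Hypothetical segment format (an assumption for this sketch):
      ("text", n_tokens)        -> n_tokens text tokens
      ("image", grid_h, grid_w) -> grid_h * grid_w visual tokens, row-major
    """
    t_ids: List[int] = []
    h_ids: List[int] = []
    w_ids: List[int] = []
    next_pos = 0  # first free position index for the next segment

    for seg in segments:
        kind = seg[0]
        if kind == "text":
            n_tokens = seg[1]
            # Text: all three components share one running index (1D RoPE).
            for i in range(n_tokens):
                pos = next_pos + i
                t_ids.append(pos)
                h_ids.append(pos)
                w_ids.append(pos)
            next_pos += n_tokens
        elif kind == "image":
            _, grid_h, grid_w = seg
            # Image: temporal index is constant; height/width follow the grid.
            for row in range(grid_h):
                for col in range(grid_w):
                    t_ids.append(next_pos)
                    h_ids.append(next_pos + row)
                    w_ids.append(next_pos + col)
            # The next segment starts after the largest index used so far.
            next_pos += max(grid_h, grid_w)
        else:
            raise ValueError(f"unknown segment kind: {kind!r}")

    return t_ids, h_ids, w_ids


if __name__ == "__main__":
    # Two text tokens, a 2x3 image grid, then two more text tokens.
    t, h, w = mrope_position_ids([("text", 2), ("image", 2, 3), ("text", 2)])
    print("temporal:", t)
    print("height:  ", h)
    print("width:   ", w)
```

The rotary embedding is then applied with each of the three index streams driving its own slice of the rotary dimensions, so an inference runtime needs to accept three position-id streams per token rather than one.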

Who can help?

No response

Information

[ ] The official example scripts
[ ] My own modified scripts

Tasks

[ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[ ] My own task or dataset (give details below)

Reproduction

Use the open-source Qwen2-VL model.

Expected behavior

TensorRT-LLM supports Qwen2-VL.

Actual behavior

TensorRT-LLM does not support Qwen2-VL.

Additional notes

sunnyqgg commented 2 weeks ago

Hi, I'll do it.

scdotbox commented 7 hours ago

Where can I find the PR files for Qwen2-VL support in TensorRT-LLM?