NetEase-Media / grps_trtllm

[grps + trtllm] A pure C++ high-performance OpenAI-compatible LLM service built on GRPS + TensorRT-LLM + Tokenizers.cpp, supporting chat and function-call modes, AI agents, distributed multi-GPU inference, multimodal input, and a Gradio chat UI.
Apache License 2.0

Qwen2-VL Model support #1

Open atlury opened 2 months ago

atlury commented 2 months ago

Hello

Will it be possible to include support for Qwen2-VL model? Thank you

zhaocc1106 commented 2 months ago

> Hello
>
> Will it be possible to include support for Qwen2-VL model? Thank you

It may be difficult for now because TensorRT-LLM does not support M-ROPE (https://github.com/NVIDIA/TensorRT-LLM/issues/2183). I will keep following up.
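For context, M-ROPE (multimodal rotary position embedding) is what makes Qwen2-VL different from plain-RoPE models: each token carries three position indices (temporal, height, width) instead of one. The sketch below illustrates one plausible layout of those 3-D position ids for a text / image / text sequence; the function name and exact offset scheme are my own illustration, not the TensorRT-LLM or Qwen2-VL API.

```python
import numpy as np

def mrope_position_ids(n_text_before: int, grid_t: int, grid_h: int,
                       grid_w: int, n_text_after: int) -> np.ndarray:
    """Illustrative 3-axis (temporal, height, width) position ids for a
    text / image / text sequence. Text tokens carry the same id on all
    three axes; visual tokens index their (t, h, w) grid independently,
    offset by the text that precedes them."""
    pos = [(i, i, i) for i in range(n_text_before)]
    base = n_text_before
    # Visual tokens: each axis tracks its own grid coordinate.
    for t in range(grid_t):
        for h in range(grid_h):
            for w in range(grid_w):
                pos.append((base + t, base + h, base + w))
    # Trailing text resumes after the largest id used by the image.
    nxt = base + max(grid_t, grid_h, grid_w)
    pos.extend((nxt + i, nxt + i, nxt + i) for i in range(n_text_after))
    return np.asarray(pos, dtype=np.int64)  # shape: (seq_len, 3)
```

Supporting this in an engine means threading a (seq_len, 3) position tensor through attention instead of the usual 1-D one, which is why plain-RoPE runtimes cannot serve Qwen2-VL unchanged.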

atlury commented 2 months ago

Thank you!

Dimensionzw commented 4 days ago

> Hello Will it be possible to include support for Qwen2-VL model? Thank you
>
> It maybe difficult now because trtllm do not support M-ROPE(NVIDIA/TensorRT-LLM#2183). I will follow up continuously.

Qwen2-VL and M-ROPE are now supported in the latest TensorRT-LLM master. Will grps consider supporting it? Refer to: https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/multimodal

zhaocc1106 commented 3 days ago

> Hello Will it be possible to include support for Qwen2-VL model? Thank you
>
> It maybe difficult now because trtllm do not support M-ROPE(NVIDIA/TensorRT-LLM#2183). I will follow up continuously.
>
> qwen2-vl and m-rope have been supported in the latest tensorrtllm master. Will grps consider supporting it? refer to: https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/multimodal

Implementing the Qwen2-VL processor in C++ is somewhat involved, but I'm working on supporting it.
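Part of what makes a C++ port fiddly is reproducing the Hugging Face image preprocessing exactly, since the vision encoder expects image sides snapped to the patch grid. As an illustration, here is a Python sketch of the "smart resize" step; the constants follow my reading of the Qwen2-VL processor defaults and should be treated as assumptions:

```python
import math

def smart_resize(height: int, width: int, factor: int = 28,
                 min_pixels: int = 56 * 56,
                 max_pixels: int = 14 * 14 * 4 * 1280) -> tuple[int, int]:
    """Resize (height, width) so both sides are multiples of `factor`
    while keeping the total pixel count within [min_pixels, max_pixels].
    Constants mirror my reading of the HF Qwen2-VL processor defaults."""
    h_bar = round(height / factor) * factor
    w_bar = round(width / factor) * factor
    if h_bar * w_bar > max_pixels:
        # Too many pixels: shrink, rounding sides down onto the grid.
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = math.floor(height / beta / factor) * factor
        w_bar = math.floor(width / beta / factor) * factor
    elif h_bar * w_bar < min_pixels:
        # Too few pixels: grow, rounding sides up onto the grid.
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar
```

Any floating-point or rounding mismatch between a C++ port and the Python reference changes the visual token grid, so this step has to be replicated bit-for-bit.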