Are there any successful cases or shared experience of splitting the Qwen2_VL model and deploying the parts separately in Triton?

QwenLM / Qwen2-VL

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Apache License 2.0

Are there any successful cases or shared experience of splitting the Qwen2_VL model and deploying the parts separately in Triton? #525

Open hello-carry opened 1 week ago

hello-carry commented 1 week ago

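My goal is to split Qwen2_VL into two independent models, Vit_model and LLM_model, and deploy each of them on a separate Triton server. Using Triton's Ensemble mode, I want to chain the two models so that together they provide the same functionality as the original Qwen2_VL model. At inference time, Vit_model first processes the image, the resulting visual features are passed to LLM_model, and LLM_model generates the final text output.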
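The hand-off between the two stages can be sketched in plain Python. This is only an illustration under assumptions, not the Qwen2-VL or Triton implementation: it assumes the prompt contains one placeholder token per visual feature vector, and that the LLM stage replaces the text embedding at each placeholder position with the corresponding feature from the vision stage. The names `merge_visual_features` and `IMAGE_TOKEN_ID`, and all the values, are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]

# Hypothetical placeholder id; a real deployment would read it from the model config.
IMAGE_TOKEN_ID = 99


def merge_visual_features(
    token_ids: Sequence[int],
    text_embeds: Sequence[Vector],
    visual_feats: Sequence[Vector],
    image_token_id: int = IMAGE_TOKEN_ID,
) -> List[Vector]:
    """Replace the embedding at every image placeholder with a visual feature."""
    if len(token_ids) != len(text_embeds):
        raise ValueError("token_ids and text_embeds must have the same length")
    n_slots = sum(1 for t in token_ids if t == image_token_id)
    if n_slots != len(visual_feats):
        raise ValueError(
            f"{n_slots} image placeholders but {len(visual_feats)} visual features"
        )
    feats = iter(visual_feats)
    # Visual features are consumed in order, one per placeholder position.
    return [
        list(next(feats)) if t == image_token_id else list(e)
        for t, e in zip(token_ids, text_embeds)
    ]


# Toy data: two text tokens around two image placeholders, embedding size 2.
token_ids = [1, IMAGE_TOKEN_ID, IMAGE_TOKEN_ID, 2]
text_embeds = [[0.1, 0.1], [0.0, 0.0], [0.0, 0.0], [0.2, 0.2]]
visual_feats = [[1.0, 1.0], [2.0, 2.0]]  # stand-in for the Vit_model output

merged = merge_visual_features(token_ids, text_embeds, visual_feats)
print(merged)
```

In a split deployment, `visual_feats` would be the output tensor of the vision stage and `merged` would be what the LLM stage consumes in place of ordinary token embeddings, so the two stages only need to agree on the feature dimension and the number of placeholder tokens.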

whyiug commented 1 week ago

I'm surprised someone else has the same requirement. I have done this kind of split deployment with vLLM.

goen-kkk commented 1 week ago

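My goal is to split Qwen2_VL into two independent models, Vit_model and LLM_model, and deploy each of them on a separate Triton server. Using Triton's Ensemble mode, I want to chain the two models so that together they provide the same functionality as the original Qwen2_VL model. At inference time, Vit_model first processes the image, the resulting visual features are passed to LLM_model, and LLM_model generates the final text output.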

Same here. I would like to know how to generate a complete sentence from Input_embeds.
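One way to think about generating from embeddings is a greedy decoding loop: feed the embedding sequence to the LLM, pick the most likely next token, embed that token, append it, and repeat until an end-of-sequence token appears. The sketch below is a minimal pure-Python illustration of that loop, not the Qwen2-VL implementation. The toy model, the vocabulary size, and the names `greedy_decode_from_embeds`, `toy_embed`, and `toy_logits` are all hypothetical stand-ins, and the loop recomputes the whole sequence each step instead of using a KV cache.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def greedy_decode_from_embeds(
    prompt_embeds: Sequence[Vector],
    next_token_logits: Callable[[Sequence[Vector]], Sequence[float]],
    embed_token: Callable[[int], Vector],
    eos_token_id: int,
    max_new_tokens: int = 16,
) -> List[int]:
    """Greedy decoding that starts from embeddings instead of token ids."""
    embeds = list(prompt_embeds)
    generated: List[int] = []
    for _ in range(max_new_tokens):
        logits = next_token_logits(embeds)  # scores for the next token
        next_id = max(range(len(logits)), key=logits.__getitem__)
        generated.append(next_id)
        if next_id == eos_token_id:
            break
        # Later steps use the ordinary token embedding table.
        embeds.append(embed_token(next_id))
    return generated


# Toy stand-ins (hypothetical): one-hot embeddings and a "model" that always
# predicts the id following the last token in the sequence.
VOCAB = 5


def toy_embed(token_id: int) -> Vector:
    return [1.0 if i == token_id else 0.0 for i in range(VOCAB)]


def toy_logits(embeds: Sequence[Vector]) -> List[float]:
    last = embeds[-1]
    k = max(range(VOCAB), key=last.__getitem__)
    return [1.0 if i == (k + 1) % VOCAB else 0.0 for i in range(VOCAB)]


tokens = greedy_decode_from_embeds(
    [toy_embed(0)], toy_logits, toy_embed, eos_token_id=4
)
print(tokens)  # [1, 2, 3, 4]
```

The point of the sketch is that only the first step needs the merged image and text embeddings; every later step embeds the newly generated token id as usual, so the LLM stage must also expose its token embedding table.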

DakeQQ commented 4 days ago

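Feel free to refer to the Native-LLM-for-Android repository. It splits the vision and text components into multiple ONNX models, which are ultimately deployed on Android devices. You can enable or disable the vision feature at runtime as needed.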