QwenLM / Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

how to deploy finetuned model with vLLM #758

Closed ronyadgar closed 3 months ago

ronyadgar commented 7 months ago

After I ran the finetune script, it saved the adapter weights. How can I run them with vLLM or TGI to serve the model efficiently and fast?

HelWireless commented 7 months ago

If you used LoRA, you can merge the adapter into the base model parameters and deploy the merged model. For QLoRA, I also want to know how to run it with vLLM; can anybody resolve this?
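
For reference, a minimal sketch of the merge step described above, using PEFT's `merge_and_unload()`; the directory names (`output_qwen`, `qwen-7b-chat-merged`) are placeholders for wherever the finetune script wrote the adapter and wherever the merged checkpoint should go:

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_dir = "output_qwen"         # placeholder: where the finetune script saved the adapter
merged_dir = "qwen-7b-chat-merged"  # placeholder: where to write the merged full model

# Loads the base model recorded in the adapter's config, then applies the adapter on top.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_dir, torch_dtype="auto", trust_remote_code=True
)
merged = model.merge_and_unload()   # fold the LoRA deltas into the base weights
merged.save_pretrained(merged_dir, safe_serialization=True)

# Copy the tokenizer next to the merged weights (load from the base model path
# instead if the finetune output does not include tokenizer files).
tokenizer = AutoTokenizer.from_pretrained(adapter_dir, trust_remote_code=True)
tokenizer.save_pretrained(merged_dir)
```

The merged directory can then be served like any ordinary checkpoint, e.g. `python -m vllm.entrypoints.openai.api_server --model qwen-7b-chat-merged --trust-remote-code`. Note that merging applies cleanly only to LoRA; QLoRA adapters were trained against a quantized base, so the common workaround is to reload the base model in full precision and merge the adapter into that.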

enymfc25173 commented 6 months ago

I have the same problem.

13661172102 commented 5 months ago

+1

jklj077 commented 3 months ago

The main branch of vLLM has incorporated LoRA support for the Qwen2 architecture (i.e., Qwen1.5 models). You can build from source now or wait for the upcoming release.

Please note that Qwen(1.0) will not be supported.
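
For reference, a minimal sketch of serving an adapter directly with vLLM's LoRA support as described above; the model name (`Qwen/Qwen1.5-7B-Chat`) and adapter path (`output_qwen`) are placeholders, and the exact `LoRARequest` signature may differ slightly across vLLM versions:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# enable_lora=True turns on vLLM's LoRA machinery for the base model.
llm = LLM(model="Qwen/Qwen1.5-7B-Chat", enable_lora=True)

prompts = ["Give me a short introduction to large language models."]
params = SamplingParams(temperature=0.7, max_tokens=256)

# LoRARequest takes a name, a unique integer id, and the local adapter path.
outputs = llm.generate(
    prompts,
    params,
    lora_request=LoRARequest("my_adapter", 1, "output_qwen"),
)
print(outputs[0].outputs[0].text)
```

The OpenAI-compatible server exposes the same feature via flags, e.g. `--enable-lora --lora-modules my_adapter=output_qwen`.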

13661172102 commented 3 months ago

Hello, this is Lu Ze. Your email has been received, thank you.