Closed. ronyadgar closed this 3 months ago
If you use LoRA, you can merge the adapter into the model parameters and deploy the merged model. For QLoRA, I would also like to know how to run it with vLLM. Can anybody help with that?
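For anyone looking for the merge step, here is a minimal sketch using PEFT's `merge_and_unload`. The function name `merge_lora_adapter` and all paths are hypothetical, not from the finetune script. It assumes the adapter was saved in PEFT format and that `torch`, `transformers`, and `peft` are installed. For a QLoRA adapter, loading the base model in half precision (rather than 4-bit) before merging is a common approach, but I have not verified it for this repo.

```python
def merge_lora_adapter(base_model_id, adapter_dir, output_dir, trust_remote_code=False):
    """Fold a PEFT LoRA adapter into its base model and save a standalone checkpoint.

    All arguments are placeholders: pass the base model name or path,
    the directory the finetune script wrote the adapter to, and a target directory.
    """
    # Heavy imports are kept inside the function so importing this file stays cheap.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the base weights in half precision (assumption: fp16 suits your hardware).
    base = AutoModelForCausalLM.from_pretrained(
        base_model_id,
        torch_dtype=torch.float16,
        trust_remote_code=trust_remote_code,
    )

    # Attach the adapter, then merge its low-rank updates into the base weights.
    model = PeftModel.from_pretrained(base, adapter_dir)
    merged = model.merge_and_unload()

    # Save the merged model plus the tokenizer so the directory is self-contained.
    merged.save_pretrained(output_dir)
    AutoTokenizer.from_pretrained(
        base_model_id, trust_remote_code=trust_remote_code
    ).save_pretrained(output_dir)
    return output_dir


# Example call (hypothetical paths):
# merge_lora_adapter("Qwen/Qwen1.5-7B-Chat", "output_qwen", "qwen-merged")
```

The resulting directory can then be passed to vLLM or TGI like any regular Hugging Face checkpoint.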
I have the same problem.
+1
The main branch of vLLM has incorporated LoRA support for Qwen2 architecture/Qwen1.5 models. You can build from source now or wait for the upcoming release.
Please note that Qwen(1.0) will not be supported.
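For reference, serving an unmerged adapter with vLLM's LoRA support might look roughly like the sketch below. It assumes a vLLM build that includes the LoRA support mentioned above and a Qwen1.5 base model. The function name `generate_with_lora`, the adapter name `"my_adapter"`, and the paths are made up for illustration.

```python
def generate_with_lora(base_model, adapter_dir, prompts, max_tokens=128):
    """Generate completions with a LoRA adapter applied on top of a base model in vLLM.

    base_model and adapter_dir are placeholders for your own model name and adapter path.
    """
    # Imported lazily so this file can be imported without vLLM installed.
    from vllm import LLM, SamplingParams
    from vllm.lora.request import LoRARequest

    # enable_lora=True tells the engine to accept per-request adapters.
    llm = LLM(model=base_model, enable_lora=True)
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)

    # Arguments: a human-readable adapter name, a unique integer id, and the adapter path.
    lora = LoRARequest("my_adapter", 1, adapter_dir)

    outputs = llm.generate(prompts, params, lora_request=lora)
    return [out.outputs[0].text for out in outputs]


# Example call (hypothetical paths):
# generate_with_lora("Qwen/Qwen1.5-7B-Chat", "output_qwen", ["Hello, who are you?"])
```

This keeps the base weights untouched, so several adapters could share one loaded model.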
Hello, this is Lu Ze. Your email has been received, thank you.
After I ran the finetune script, it saved the adapter weights. How can I run them with vLLM or TGI so that inference is efficient and fast?