想问下Lmdeploy支持base model加多lora的部署方式么

InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

https://lmdeploy.readthedocs.io/en/latest/

Apache License 2.0

4.74k stars 432 forks source link

想问下Lmdeploy支持base model加多lora的部署方式么 #2057

Closed will-wiki closed 3 months ago

will-wiki commented 4 months ago

https://docs.vllm.ai/en/latest/models/lora.html 这是vllm框架多lora部署方案，想问下lmdeploy有类似的方法么

lvhan028 commented 4 months ago

pytorch engine 支持多 lora 在文档 https://lmdeploy.readthedocs.io/en/latest/inference/pipeline.html 中，有 "An example for slora" 的例子

will-wiki commented 4 months ago

@lvhan028 感谢回复，想问下PytorchEngine跟TurbomindEngine有啥区别么，主要是性能上的区别吗，是否可以切换呢，这个好像是在offline推理侧，Serving LLM 支持么？另外想问下，现在的MLLM模型，像InternLM-XComposer2.5这种可以支持PytorchEngine Serving服务部署不

lvhan028 commented 4 months ago

区别：1）性能上 turbomind 更快些；2）支持的模型上 pytorch engine 更多；关于切换：

如果不指定推理引擎，优先选择turbomind engine，如果它支持了输入的模型的话。否则使用pytorch engine
如果指定了turbomind engine，但是它又不支持输入的模型，那么会使用pytorch engine
如果指定了pytorch engine，就使用pytorch engine

如果你指的 serving LLM 是想要serve lora，是支持的。有 --adapters 选项

InternLM-XComposer2.5 是在 turbomind engine中支持的。这里有文档：https://github.com/InternLM/lmdeploy/blob/main/docs/en/multi_modal/xcomposer2d5.md

will-wiki commented 4 months ago

@lvhan028 现在打算用serve lora部署InternLM-XComposer2.5 ，应该怎么部署呢，看了下只有pytorch engine才支持这个--adapters

https://github.com/InternLM/lmdeploy/blob/main/docs/en/multi_modal/xcomposer2d5.md 这里好像提到是用turbomind engine支持的，那就是说退化到用 pytorch engine部署，InternLM-XComposer2.5 就可以支持多lora部署了吧？也就是--backend {pytorch,turbomind}选择pytorch，牺牲性能满足要求

lvhan028 commented 4 months ago

也不是。PyTorchEngine并没有支持InternLM-XComposer2.5 所以，目前是无法满足你的需求的。

will-wiki commented 4 months ago

@lvhan028 好吧，那这个有在你们后续的计划内吗，比如turbomind engine支持多lora或者InternLM-XComposer2.5模型支持PyTorchEngine什么的

zhongtao93 commented 4 months ago

会有计划支持 internvl2 系列吗

zhongtao93 commented 4 months ago

我直接用

pytorch engine 支持多 lora 在文档 https://lmdeploy.readthedocs.io/en/latest/inference/pipeline.html 中，有 "An example for slora" 的例子

我用这个方法，将模型改成了internvl 2，没有奔溃，但输出结果为空

RunningLeon commented 4 months ago

我直接用

pytorch engine 支持多 lora 在文档 https://lmdeploy.readthedocs.io/en/latest/inference/pipeline.html 中，有 "An example for slora" 的例子

我用这个方法，将模型改成了internvl 2，没有奔溃，但输出结果为空

@zhongtao93 输入超长了，可以改大session_len ，参考 https://lmdeploy.readthedocs.io/en/latest/inference/vl_pipeline.html#set-context-window-size

github-actions[bot] commented 3 months ago

This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.

github-actions[bot] commented 3 months ago

This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.