InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

Does LMDeploy support deploying a base model with multiple LoRAs? #2057

Closed will-wiki closed 3 months ago

will-wiki commented 4 months ago

https://docs.vllm.ai/en/latest/models/lora.html describes vLLM's multi-LoRA deployment approach. Does lmdeploy have something similar?

lvhan028 commented 4 months ago

The pytorch engine supports multiple LoRAs. See the "An example for slora" section in the documentation: https://lmdeploy.readthedocs.io/en/latest/inference/pipeline.html
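
For reference, the S-LoRA example there looks roughly like the sketch below; the model id, adapter names, and paths are placeholders rather than the values from the docs.

```python
from lmdeploy import pipeline, PytorchEngineConfig

# Register LoRA adapters with the pytorch engine under chosen names.
# Model id and adapter paths are placeholders.
backend_config = PytorchEngineConfig(
    session_len=2048,
    adapters=dict(lora_a='/path/to/lora_a', lora_b='/path/to/lora_b'))
pipe = pipeline('internlm/internlm2-chat-7b', backend_config=backend_config)

# Each request selects an adapter by its registered name.
response = pipe([[{'role': 'user', 'content': 'Hello'}]], adapter_name='lora_a')
print(response)
```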

will-wiki commented 4 months ago

@lvhan028 Thanks for the reply. What is the difference between PytorchEngine and TurbomindEngine? Is it mainly performance, and can I switch between them? Also, this seems to be on the offline-inference side; is it supported for serving LLMs as well? And can multimodal models like InternLM-XComposer2.5 be deployed as a PytorchEngine serving service?

lvhan028 commented 4 months ago

Differences: 1) turbomind is faster; 2) the pytorch engine supports more models. On switching (see the sketch after this list):

  1. If no engine is specified, the turbomind engine is preferred when it supports the input model; otherwise the pytorch engine is used.
  2. If the turbomind engine is specified but does not support the input model, the pytorch engine is used instead.
  3. If the pytorch engine is specified, it is used.
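
For concreteness, a minimal sketch of these selection rules via the pipeline API; the model id is a placeholder, and which engine the first case picks depends on the actual model.

```python
from lmdeploy import pipeline, PytorchEngineConfig, TurbomindEngineConfig

# 1) No engine specified: turbomind is preferred when it supports the
#    model; otherwise lmdeploy falls back to the pytorch engine.
pipe = pipeline('internlm/internlm2-chat-7b')

# 2)/3) Pin an engine explicitly by passing its config.
pipe_tm = pipeline('internlm/internlm2-chat-7b',
                   backend_config=TurbomindEngineConfig())
pipe_pt = pipeline('internlm/internlm2-chat-7b',
                   backend_config=PytorchEngineConfig())
```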

If by serving LLM you mean serving LoRA, that is supported via the --adapters option.
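
A hedged sketch of that serving command; the model id, adapter name, and adapter path are placeholders, and the name=path form mirrors the adapters dict used in the pipeline docs.

```shell
lmdeploy serve api_server internlm/internlm2-chat-7b \
    --backend pytorch \
    --adapters mylora=/path/to/lora/weights
```

Requests should then be able to select an adapter by its registered name, analogous to adapter_name in the pipeline API.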

InternLM-XComposer2.5 is supported by the turbomind engine. The documentation is here: https://github.com/InternLM/lmdeploy/blob/main/docs/en/multi_modal/xcomposer2d5.md

will-wiki commented 4 months ago

@lvhan028 I now plan to serve InternLM-XComposer2.5 with LoRA. How should I deploy it? From what I can see, only the pytorch engine supports --adapters.


https://github.com/InternLM/lmdeploy/blob/main/docs/en/multi_modal/xcomposer2d5.md seems to say the model is supported by the turbomind engine. So if I fall back to deploying with the pytorch engine, i.e. choose pytorch in --backend {pytorch,turbomind} and trade away some performance, InternLM-XComposer2.5 should support multi-LoRA deployment, right?

lvhan028 commented 4 months ago

Not quite. PyTorchEngine does not support InternLM-XComposer2.5, so at the moment your requirement cannot be met.

will-wiki commented 4 months ago

@lvhan028 I see. Is this in your future plans, e.g. multi-LoRA support in the turbomind engine, or PyTorchEngine support for InternLM-XComposer2.5?

zhongtao93 commented 4 months ago

Are there plans to support the internvl2 series?

zhongtao93 commented 4 months ago

I followed the method from the earlier reply:

The pytorch engine supports multiple LoRAs. See the "An example for slora" section in the documentation: https://lmdeploy.readthedocs.io/en/latest/inference/pipeline.html

I swapped the model for InternVL 2. It did not crash, but the output was empty.

RunningLeon commented 4 months ago


@zhongtao93 The input exceeded the context window. Increase session_len; see https://lmdeploy.readthedocs.io/en/latest/inference/vl_pipeline.html#set-context-window-size
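
A minimal sketch of that fix, reusing the S-LoRA setup from above with a larger context window; the model id, session_len value, and paths are illustrative, and it assumes adapter_name is forwarded the same way as in the text pipeline.

```python
from lmdeploy import pipeline, PytorchEngineConfig
from lmdeploy.vl import load_image

# A larger session_len leaves room for the image tokens plus the prompt.
# 8192 is an illustrative value, not a recommendation.
backend_config = PytorchEngineConfig(
    session_len=8192,
    adapters=dict(mylora='/path/to/internvl2-lora'))
pipe = pipeline('OpenGVLab/InternVL2-8B', backend_config=backend_config)

image = load_image('/path/to/image.jpg')
response = pipe(('describe this image', image), adapter_name='mylora')
print(response)
```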

github-actions[bot] commented 3 months ago

This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.

github-actions[bot] commented 3 months ago

This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.