InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Feature] Run inference with lmdeploy using pre-built inputs #1760

Closed. KooSung closed this issue 1 week ago.

KooSung commented 3 weeks ago

Motivation

How can I run inference with lmdeploy using inputs that have already been built? The background is that our multimodal inputs involve a lot of customization, so we would like to feed the pre-built inputs directly into lmdeploy to speed up inference. Our current `transformers`-style generation code looks like this:

        # `inputs` already holds the pre-built multimodal tensors
        # (input_ids, attention_mask, image features, ...) on the right device.
        with torch.inference_mode():
            outputs = self.llm.generate(
                **inputs,
                streamer=None,
                max_new_tokens=1024,
                do_sample=True,  # sampling rather than greedy decoding
                temperature=0.8,
                top_p=0.8,
                eos_token_id=eos_token_id,
            )

Related resources

No response

Additional context

No response

lvhan028 commented 3 weeks ago

LMDeploy does not support this feature. For offline inference, the input formats currently supported are described in this document: https://lmdeploy.readthedocs.io/en/latest/inference/vl_pipeline.html For online serving, the interface is aligned with GPT-4V: https://lmdeploy.readthedocs.io/en/latest/serving/api_server_vl.html
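For reference, the offline route in the linked document boils down to a sketch like the following; the model name and image URL below are placeholder examples, and the pipeline takes (prompt, image) pairs rather than pre-built tensors:

    from lmdeploy import pipeline
    from lmdeploy.vl import load_image

    # Build an offline VL pipeline; the model path is a placeholder example.
    pipe = pipeline('liuhaotian/llava-v1.6-vicuna-7b')

    # Inputs are (prompt, image) pairs; lmdeploy builds the tensors internally.
    image = load_image('https://example.com/demo.jpg')  # placeholder URL
    response = pipe(('describe this image', image))
    print(response)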

irexyc commented 3 weeks ago

If you use the turbomind interface, it can accept inputs in embedding form, which should cover the case you describe. However, it may require some secondary development on your side; for the specifics, you can refer to how the pipeline invokes the turbomind backend:

https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/turbomind/turbomind.py#L793-L804
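A rough sketch of what that might look like, assuming the `input_embeddings` / `input_embedding_ranges` parameters seen in the linked turbomind.py; these names may differ between lmdeploy versions, so verify them against your installation before building on this:

    import torch
    from lmdeploy.turbomind import TurboMind

    tm_model = TurboMind.from_pretrained('/path/to/your/model')  # placeholder path
    generator = tm_model.create_instance()

    # Hypothetical pre-built inputs: token ids whose slots [16, 272) are
    # placeholders to be overwritten by pre-computed image embeddings.
    input_ids = [1] * 300
    image_embeds = torch.zeros(256, 4096, dtype=torch.float16)

    for outputs in generator.stream_infer(
            session_id=0,
            input_ids=input_ids,
            input_embeddings=[image_embeds],        # pre-built embedding tensors
            input_embedding_ranges=[(16, 272)],     # token slots they replace
            sequence_start=True,
            sequence_end=True):
        print(outputs)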

KooSung commented 3 weeks ago

@irexyc Thanks 🙏, I'll look into it.

github-actions[bot] commented 1 week ago

This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.

github-actions[bot] commented 1 week ago

This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.