Can single-sample inference be done without stream_infer?

zhanghanweii commented 2 days ago

Checklist

[X] 1. I have searched related issues but cannot get the expected help.
[ ] 2. The bug has not been fixed in the latest version.

Describe the bug

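All of the example code uses stream_infer for inference. For single-sample inference I found a decode method, but its output is all ids. What code should I use to run prediction on a single sample?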

Reproduction

logits = self.generator.decode(input_ids)

Environment

The environment is fine.

Error traceback

No response

lvhan028 commented 2 days ago

You can use the pipeline API. See the documentation here: https://lmdeploy.readthedocs.io/en/latest/get_started.html#offline-batch-inference
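As a rough illustration of the pipeline route for a single prompt, here is a minimal sketch. It assumes the `pipeline` factory from `lmdeploy` returns a callable that takes a list of prompts and returns a list of response objects with a `.text` attribute. The helper names `generate_one` and `demo`, and the model name `internlm/internlm2-chat-7b`, are placeholders chosen for this example, not something taken from this thread.

```python
from typing import Any, Callable, List


def generate_one(pipe: Callable[[List[str]], List[Any]], prompt: str) -> str:
    """Send one prompt through a pipeline-like callable and return decoded text."""
    # A one-element batch goes in, a one-element list of responses comes out.
    responses = pipe([prompt])
    # Assumption: each response exposes the generated string as `.text`.
    return responses[0].text


def demo() -> None:
    """Hypothetical end-to-end run; needs lmdeploy, a GPU, and model weights."""
    # Imported lazily so generate_one stays usable without lmdeploy installed.
    from lmdeploy import pipeline

    # Assumption: swap in the model path you actually use.
    pipe = pipeline("internlm/internlm2-chat-7b")
    print(generate_one(pipe, "do you know who is jams harden?"))
```

Calling `demo()` on a machine with lmdeploy set up should print text rather than token ids, since the pipeline handles tokenization and detokenization, unlike the lower-level `decode` call in the reproduction.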

zhanghanweii commented 2 days ago

You can use the pipeline API. See the documentation here: https://lmdeploy.readthedocs.io/en/latest/get_started.html#offline-batch-inference

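Thanks, I tried it and it works. However, I ran into an interesting problem. With inputs such as:

1. Read this sentence aloud, this is input: Today is a sunny day.
2. ask this question, this is input: do you know who is jams harden?

inference is very fast, around 500 ms. But with the following input it is much slower:

1. do you know who is jams harden?

This takes about 2 s. I don't know the exact cause. vLLM shows a similar problem, and adding "this is input:" makes it faster.

Does the input format affect the acceleration, and how can I avoid this?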

lvhan028 commented 2 days ago

Check whether the number of generated tokens has increased.
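One way to check this is to time each prompt and record the generated token count side by side. The sketch below assumes a pipeline-like callable that takes a list of prompts and returns response objects carrying a `generate_token_len` attribute, as lmdeploy's responses are assumed to do here. `profile_prompt` is a hypothetical helper name, and the warmup and repeat counts are arbitrary defaults.

```python
import time
from typing import Any, Callable, Dict, List


def profile_prompt(
    pipe: Callable[[List[str]], List[Any]],
    prompt: str,
    warmup: int = 1,
    repeats: int = 3,
) -> Dict[str, Any]:
    """Time repeated single-prompt calls and collect generated token counts."""
    # Warmup runs are discarded so one-time setup cost does not skew the timings.
    for _ in range(warmup):
        pipe([prompt])

    latencies: List[float] = []
    token_counts: List[Any] = []
    for _ in range(repeats):
        start = time.perf_counter()
        response = pipe([prompt])[0]
        latencies.append(time.perf_counter() - start)
        # Assumption: the response reports its output length in this attribute;
        # None is recorded if the attribute is missing.
        token_counts.append(getattr(response, "generate_token_len", None))

    return {
        "prompt": prompt,
        "latencies_s": latencies,
        "generated_tokens": token_counts,
    }
```

Running this for both the fast and the slow prompt makes it easier to see whether the 2 s case really produces the same number of tokens, or whether the difference only shows up on the first call.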

zhanghanweii commented 2 days ago

Check whether the number of generated tokens has increased.

Both cases generate 5 to 6 tokens, yet the slow case takes even longer than running without acceleration. vLLM behaves the same way.

InternLM / lmdeploy