Open · zhanghanweii opened this issue 5 days ago
I printed the output; it looks like this: EngineOutput(status=<ResponseType.FINISH: 2>, token_ids=[10994, 29991, 1128, 508, 306, 1371, 366, 29973, 29966, 29872, 19807, 29958, 33002, 584, 508, 366, 5649, 825, 319, 29902, 338, 29973, 33004, 33003, 584, 18585, 29991, 319, 29902, 15028, 363, 3012, 928, 616, 3159, 28286, 29892, 607, 338, 263, 1746, 310, 6601, 10466, 393, 8569, 267, 373, 4969, 13052, 296, 14884, 393, 508, 1348, 322, 1044, 763, 25618, 29889, 319, 29902, 6757, 526, 2221, 304, 5110, 515, 1009, 5177, 29892, 18720, 15038, 29892, 322, 1207, 1602, 12112, 2729, 373, 278, 848, 896, 7150, 29889, 319, 29902, 508, 367, 1304, 304, 4505, 4280, 29902, 29973, 33004, 33003, 584, 450, 1900, 982, 304, 5110, 1048, 319, 29902, 338, 304, 1303, 8277, 29892, 2125, 21888, 29892, 322, 14333, 378, 10662, 29889, 19814, 29892, 727, 526, 1784, 7395, 7788, 3625, 29892, 1316, 408, 25410, 29892, 12618, 29879, 29892, 322, 363, 6762, 29889, 739, 338, 884, 4100, 304, 7952, 701, 304, 2635, 411, 278, 9281, 2693, 1860, 297, 319, 29902, 15483, 29889, 9788, 29892, 372, 338, 4100, 304, 6944, 322, 7639, 411, 319, 29902], num_token=513, logprobs=None)
`stream_infer` used to return a tuple; it now returns an `EngineOutput` struct. You can refer to https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/turbomind/chat.py#L127-L142
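A minimal sketch of the difference. The `EngineOutput` stand-in below is a hypothetical dataclass mirroring the fields visible in the printed output above (`status`, `token_ids`, `num_token`, `logprobs`), not the real lmdeploy class:

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical stand-in mirroring the fields printed above;
# the real class is defined inside lmdeploy and may differ.
@dataclass
class EngineOutput:
    status: int
    token_ids: List[int]
    num_token: int
    logprobs: Optional[list] = None

out = EngineOutput(status=2, token_ids=[10994, 29991], num_token=2)

# Old style: each yielded item was a tuple, unpacked positionally.
# New style: read the fields by attribute instead of unpacking.
token_ids = out.token_ids
num_token = out.num_token
print(token_ids, num_token)
```

The point is simply that positional unpacking such as `res, tokens = outputs[0]` no longer applies; the fields are accessed by name.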
Thanks. While testing, I noticed the speed does not seem to be faster than vllm. I added start = datetime.now() before and end = datetime.now() after the `for outputs in generator.stream_infer()` loop, and the final result is several times slower than vllm.
[TM][INFO] ------------------------- step = 580 -------------------------
[TM][INFO] ------------------------- step = 590 -------------------------
[TM][INFO] ------------------------- step = 600 -------------------------
[TM][INFO] [Interrupt] slot = 0, id = 0
[TM][INFO] [forward] Request completed for 0
Hello! How can I help you?
text generate cost 0:00:03.319546
Generation takes over 3 seconds here, while vllm takes only about 700 ms. I see that every run goes through the `[TM][INFO] ------------------------- step = *** -------------------------` stage, and just from watching it, this part alone takes several seconds.
How did you benchmark? Are you sure the number of generated tokens is the same in both cases? And what is lmdeploy's max_batch_size set to?
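To make the comparison fairer, it helps to report tokens per second rather than raw wall-clock time, since the two engines may generate different numbers of tokens. A sketch, where `fake_stream` stands in for `generator.stream_infer(...)` and the cumulative meaning of `num_token` is an assumption based on the printed output above:

```python
from datetime import datetime

def measure(stream):
    """Consume a stream_infer-style generator and return (num_token, seconds).

    Assumes each yielded item has a .num_token field holding the
    cumulative token count, as in the EngineOutput printed above.
    """
    start = datetime.now()
    num_token = 0
    for out in stream:
        num_token = out.num_token  # keep the count from the final output
    seconds = (datetime.now() - start).total_seconds()
    return num_token, seconds

# Hypothetical demo stream standing in for generator.stream_infer(...).
class FakeOut:
    def __init__(self, n):
        self.num_token = n

num_token, seconds = measure(FakeOut(n) for n in range(1, 101))
print(f"{num_token} tokens in {seconds:.3f}s")
```

Dividing `num_token` by `seconds` gives a throughput figure that is comparable across engines even when the generated lengths differ.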
Checklist
Describe the bug
Traceback (most recent call last):
  File "/mnt/data/data/user/zhanghanwei/LLM_model/SpeechGPT-main/speechgpt/src/infer/cli_infer-lmdeploy.py", line 269, in interact
    self.forward([prompt])
  File "/mnt/data/data/user/zhanghanwei/LLM_model/SpeechGPT-main/speechgpt/src/infer/cli_infer-lmdeploy.py", line 186, in forward
    for a, b in enumerate(outputs):
TypeError: 'EngineOutput' object is not iterable
Reproduction
Below is my code, adapted from https://xujinzh.github.io/2024/01/13/ai-internlm-lmdeploy/index.html:

for outputs in self.generator.stream_infer(
        session_id=0, input_ids=[input_ids]):
    res, tokens = outputs[0]
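Given the API change described earlier in the thread, the loop body needs to read fields from each `EngineOutput` by attribute rather than unpacking `outputs[0]`. A sketch, where `fake_stream_infer` is a hypothetical stand-in for `self.generator.stream_infer(session_id=0, input_ids=[input_ids])`:

```python
from types import SimpleNamespace

# Hypothetical stand-in for self.generator.stream_infer(...); each
# yielded item mimics an EngineOutput with the fields seen in this thread.
def fake_stream_infer():
    yield SimpleNamespace(status=0, token_ids=[10994], num_token=1)
    yield SimpleNamespace(status=2, token_ids=[10994, 29991], num_token=2)

# The old code did `res, tokens = outputs[0]`, which now raises
# TypeError because EngineOutput is not a tuple and is not iterable.
# Reading the attributes directly avoids the error.
for outputs in fake_stream_infer():
    token_ids = outputs.token_ids
    num_token = outputs.num_token
print(token_ids, num_token)
```

After the loop, `token_ids` and `num_token` hold the values from the final yielded output.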
Environment
Error traceback
No response