QwenLM / Qwen2-VL

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Apache License 2.0

Qwen2-VL-2B-Instruct inference under vLLM is slow; test results below. Is this normal? #507

Open MaoXianXin opened 2 weeks ago

MaoXianXin commented 2 weeks ago

Model: Qwen2-VL-2B-Instruct
GPU: NVIDIA GA103M GeForce RTX 3080 Ti Mobile
Inference backend: vLLM
Docker image: qwenllm/qwenvl:2-cu121

| Category | Metric | Value |
| --- | --- | --- |
| Inference Statistics | Total samples processed | 82 |
| | Average inference time | 0.78 seconds |
| | Min inference time | 0.75 seconds |
| | Max inference time | 0.82 seconds |
| Token Usage Statistics | Average completion tokens | 21.02 |
| | Min completion tokens | 21 |
| | Max completion tokens | 22 |
| | Average prompt tokens | 1379.98 |
| | Min prompt tokens | 1307 |
| | Max prompt tokens | 1483 |
| | Average total tokens | 1401.00 |
| | Min total tokens | 1328 |
| | Max total tokens | 1504 |
| Image Statistics | Average image width | 1230 pixels |
| | Min image width | 1230 pixels |
| | Max image width | 1230 pixels |
| | Average image height | 859.33 pixels |
| | Min image height | 821 pixels |
| | Max image height | 924 pixels |
| | Average file size | 0.09 MB |
| | Min file size | 0.08 MB |
| | Max file size | 0.13 MB |
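
The original benchmark script isn't included in the issue; for context, timings like those above can be collected roughly as follows. This is a minimal sketch that assumes a vLLM OpenAI-compatible server is already running (e.g. started with `vllm serve Qwen/Qwen2-VL-2B-Instruct`), and the image path and prompt text are placeholders, not the ones used in the reported test.

```python
import base64
import time
from pathlib import Path

from openai import OpenAI  # pip install openai

# Assumes a vLLM OpenAI-compatible server is running locally, e.g.:
#   vllm serve Qwen/Qwen2-VL-2B-Instruct --port 8000
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def encode_image(path: str) -> str:
    """Return the image as a base64 data URL accepted by the chat API."""
    data = base64.b64encode(Path(path).read_bytes()).decode()
    return f"data:image/jpeg;base64,{data}"

image_paths = ["sample_0001.jpg"]  # placeholder list of test images
latencies, prompt_tokens, completion_tokens = [], [], []

for path in image_paths:
    start = time.perf_counter()
    resp = client.chat.completions.create(
        model="Qwen/Qwen2-VL-2B-Instruct",
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": encode_image(path)}},
                {"type": "text", "text": "Describe this image."},  # placeholder prompt
            ],
        }],
        max_tokens=64,
    )
    # Record per-request latency and the token usage reported by the server.
    latencies.append(time.perf_counter() - start)
    prompt_tokens.append(resp.usage.prompt_tokens)
    completion_tokens.append(resp.usage.completion_tokens)

print(f"Average inference time: {sum(latencies) / len(latencies):.2f} s")
print(f"Average prompt tokens: {sum(prompt_tokens) / len(prompt_tokens):.2f}")
print(f"Average completion tokens: {sum(completion_tokens) / len(completion_tokens):.2f}")
```

Note that with this setup each request is measured end to end (image encoding, prefill of roughly 1400 prompt tokens, and decoding of about 21 completion tokens), so the prompt-processing cost dominates the reported latency.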

The multimodal inference speed of Qwen2-VL still looks quite slow to me. Is this normal?