QwenLM / Qwen2-VL

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Apache License 2.0

Qwen2-VL-2B-Instruct inference under vLLM is slow; test results below. Is this normal? #507

Open MaoXianXin opened 2 weeks ago

MaoXianXin commented 2 weeks ago

Model: Qwen2-VL-2B-Instruct
GPU: NVIDIA GA103M GeForce RTX 3080 Ti Mobile
Inference backend: vLLM
Docker image: qwenllm/qwenvl:2-cu121

| Category | Metric | Value |
| --- | --- | --- |
| Inference Statistics | Total samples processed | 82 |
| | Average inference time | 0.78 seconds |
| | Min inference time | 0.75 seconds |
| | Max inference time | 0.82 seconds |
| Token Usage Statistics | Average completion tokens | 21.02 |
| | Min completion tokens | 21 |
| | Max completion tokens | 22 |
| | Average prompt tokens | 1379.98 |
| | Min prompt tokens | 1307 |
| | Max prompt tokens | 1483 |
| | Average total tokens | 1401.00 |
| | Min total tokens | 1328 |
| | Max total tokens | 1504 |
| Image Statistics | Average image width | 1230 pixels |
| | Min image width | 1230 pixels |
| | Max image width | 1230 pixels |
| | Average image height | 859.33 pixels |
| | Min image height | 821 pixels |
| | Max image height | 924 pixels |
| | Average file size | 0.09 MB |
| | Min file size | 0.08 MB |
| | Max file size | 0.13 MB |
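
The original benchmark script isn't included in the issue; for context, timings like those above can be collected roughly as follows. This is a minimal sketch that assumes a vLLM OpenAI-compatible server is already running (e.g. started with `vllm serve Qwen/Qwen2-VL-2B-Instruct`), and the image path and prompt text are placeholders, not the ones used in the reported test.

```python
import base64
import time
from pathlib import Path

from openai import OpenAI  # pip install openai

# Assumes a vLLM OpenAI-compatible server is running locally, e.g.:
#   vllm serve Qwen/Qwen2-VL-2B-Instruct --port 8000
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def encode_image(path: str) -> str:
    """Return the image as a base64 data URL accepted by the chat API."""
    data = base64.b64encode(Path(path).read_bytes()).decode()
    return f"data:image/jpeg;base64,{data}"

image_paths = ["sample_0001.jpg"]  # placeholder list of test images
latencies, prompt_tokens, completion_tokens = [], [], []

for path in image_paths:
    start = time.perf_counter()
    resp = client.chat.completions.create(
        model="Qwen/Qwen2-VL-2B-Instruct",
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": encode_image(path)}},
                {"type": "text", "text": "Describe this image."},  # placeholder prompt
            ],
        }],
        max_tokens=64,
    )
    # Record per-request latency and the token usage reported by the server.
    latencies.append(time.perf_counter() - start)
    prompt_tokens.append(resp.usage.prompt_tokens)
    completion_tokens.append(resp.usage.completion_tokens)

print(f"Average inference time: {sum(latencies) / len(latencies):.2f} s")
print(f"Average prompt tokens: {sum(prompt_tokens) / len(prompt_tokens):.2f}")
print(f"Average completion tokens: {sum(completion_tokens) / len(completion_tokens):.2f}")
```

Note that with this setup each request is measured end to end (image encoding, prefill of roughly 1400 prompt tokens, and decoding of about 21 completion tokens), so the prompt-processing cost dominates the reported latency.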

The multimodal inference speed of Qwen2-VL still looks quite slow to me. Is this normal?