Closed: triumph closed this issue 3 months ago
Running web_demo.py needs about 10 GB of GPU memory. Why does running vLLM in API-server mode need 23 GB? python web_demo.py --device cuda --dtype bf16
The vLLM launch command is: /services/srv/MiniCPM-vllm/venv/bin/python -m vllm.entrypoints.openai.api_server --model /services/srv/MiniCPM-V/openbmb/MiniCPM-V-2/ --trust-remote-code
Hi, vLLM has its own GPU memory management. When the API server initializes the LLM, it defaults to gpu_memory_utilization=0.9, i.e. it pre-reserves 90% of the card's memory (mostly for the KV cache), which is why 22.5 GB is allocated. See: https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/llm.py#L93
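If you want the API server to reserve less memory, you can lower that fraction explicitly via the `--gpu-memory-utilization` flag (the value 0.5 below is just an illustrative choice, not a recommendation; pick a fraction that still leaves room for the model weights plus KV cache):

```shell
# Same launch command as above, but capping vLLM's pre-allocation
# at 50% of GPU memory instead of the default 90%.
/services/srv/MiniCPM-vllm/venv/bin/python -m vllm.entrypoints.openai.api_server \
    --model /services/srv/MiniCPM-V/openbmb/MiniCPM-V-2/ \
    --trust-remote-code \
    --gpu-memory-utilization 0.5
```

Note that setting this too low reduces the KV-cache size, which limits maximum batch size and context length the server can handle.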