OpenBMB / MiniCPM-V

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

web_demo.py runs in 10 GB of GPU memory. Why does running vLLM in API-server mode need 23 GB? #76

Closed. triumph closed this issue 3 months ago

triumph commented 4 months ago

web_demo.py only needs 10 GB of GPU memory, so why does launching vLLM in API-server mode use 23 GB? The web demo was started with: python web_demo.py --device cuda --dtype bf16

The vLLM launch command was: /services/srv/MiniCPM-vllm/venv/bin/python -m vllm.entrypoints.openai.api_server --model /services/srv/MiniCPM-V/openbmb/MiniCPM-V-2/ --trust-remote-code


iceflame89 commented 4 months ago

Hi, vLLM has its own GPU memory management mechanism. When the API server initializes the LLM, it defaults to gpu_memory_utilization=0.9, i.e. it pre-allocates 90% of the GPU's memory for model weights and the KV cache, which is why 22.5 GB is reserved. See the code: https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/llm.py#L93
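
If you want the server to reserve less memory, you can lower that fraction with vLLM's `--gpu-memory-utilization` flag. A minimal sketch based on the original command above (the paths are the user's; the 0.45 value is an illustrative assumption, not a recommendation):

```bash
# Ask vLLM to pre-allocate roughly 45% of GPU memory instead of the default 90%.
# The pre-allocated pool holds the model weights plus the paged KV cache.
/services/srv/MiniCPM-vllm/venv/bin/python -m vllm.entrypoints.openai.api_server \
    --model /services/srv/MiniCPM-V/openbmb/MiniCPM-V-2/ \
    --trust-remote-code \
    --gpu-memory-utilization 0.45
```

Note that setting the value too low leaves no room for the KV cache after loading the weights, in which case vLLM fails at startup rather than running with degraded throughput.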