Closed: triumph closed this issue 3 months ago
Running web_demo.py needs about 10 GB of GPU memory. Why does running vLLM in API-server mode need 23 GB? python web_demo.py --device cuda --dtype bf16
The vLLM launch command is: /services/srv/MiniCPM-vllm/venv/bin/python -m vllm.entrypoints.openai.api_server --model /services/srv/MiniCPM-V/openbmb/MiniCPM-V-2/ --trust-remote-code
Hi, vLLM has its own GPU memory management. When the API server initializes the LLM, it defaults to gpu_memory_utilization=0.9, i.e. it pre-reserves 90% of the card's memory (mostly for the KV cache), which is why 22.5 GB is allocated. See: https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/llm.py#L93
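If you want the API server to reserve less memory, you can lower that fraction explicitly via the `--gpu-memory-utilization` flag (the value 0.5 below is just an illustrative choice, not a recommendation; pick a fraction that still leaves room for the model weights plus KV cache):

```shell
# Same launch command as above, but capping vLLM's pre-allocation
# at 50% of GPU memory instead of the default 90%.
/services/srv/MiniCPM-vllm/venv/bin/python -m vllm.entrypoints.openai.api_server \
    --model /services/srv/MiniCPM-V/openbmb/MiniCPM-V-2/ \
    --trust-remote-code \
    --gpu-memory-utilization 0.5
```

Note that setting this too low reduces the KV-cache size, which limits maximum batch size and context length the server can handle.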