Open Muttermal opened 2 months ago
Hi, for vLLM this is expected: vLLM preallocates GPU memory up to the fraction given by the `--gpu-memory-utilization` parameter, which defaults to 0.9. For more information, please refer to the vLLM documentation.
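To make the preallocation behavior concrete, here is a minimal sketch of the arithmetic behind that default. It assumes only what the reply above states (a 0.9 default fraction of each GPU's memory); the function name is illustrative, not a vLLM API.

```python
# Illustrative only: estimates how much memory vLLM reserves per GPU
# up front under --gpu-memory-utilization (default 0.9).
def preallocated_gib(total_gib: float, gpu_memory_utilization: float = 0.9) -> float:
    """Approximate per-GPU memory vLLM claims at startup."""
    return total_gib * gpu_memory_utilization

# A 48 GiB card with the default setting reserves roughly 43.2 GiB,
# so the cards look nearly full even before any inference runs.
print(preallocated_gib(48.0))
```

This is why a 7B model can appear to "occupy" almost all of four 48 GB GPUs: the reservation is driven by the utilization fraction, not by the model's weight size.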
Hello, did you manage to solve this problem? I also ran into OOM when running inference on a 72B model with four A800 GPUs.
Hi, I encountered abnormally high GPU memory usage when deploying Qwen2-VL-7B-Instruct with vLLM. My specific configuration is as follows:
Env
transformers was installed as follows:
Script
GPU memory usage
Question
The service starts normally, but even without any inference requests, the 7B model occupies almost all of the memory on four 48 GB GPUs. Could you help me figure out where the issue is?
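Given the reply above, one way to reduce the startup reservation is to lower the `--gpu-memory-utilization` fraction when launching the server. The sketch below is a hypothetical launch command, not the reporter's actual script; the entrypoint, model name, and the 0.5 value are assumptions based on this issue's setup.

```shell
# Hypothetical launch sketch: lowers vLLM's default 0.9 preallocation.
# Entrypoint, model ID, and the 0.5 fraction are assumptions, not taken
# verbatim from this issue.
python -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen2-VL-7B-Instruct \
    --tensor-parallel-size 4 \
    --gpu-memory-utilization 0.5
```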