Closed by YSLIU627 2 days ago
Hi, what command are you running? Sometimes I find that when I hit OOM with vllm it's due to other processes taking up GPU memory. It's a bit hard to debug these things via a github issue, but if you post an example command I can try to help.
Hi, I want to use vLLM during evaluation, but when I pass --vllm it throws an OOM error. My GPU is an A6000 and the model under evaluation is 7B. I can evaluate my model on mt-benchmark with vLLM without issues. I would appreciate any help.
```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 446.00 MiB (GPU 0; 47.53 GiB total capacity; 31.11 GiB already allocated; 3.00 MiB free; 31.19 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
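Not a fix for the underlying memory pressure, but the traceback itself suggests one mitigation: capping the allocator's split size via `PYTORCH_CUDA_ALLOC_CONF` to reduce fragmentation. A minimal sketch (the eval entry point here is hypothetical; the env var must be set before CUDA is initialized):

```python
import os

# Must be set BEFORE torch initializes CUDA, so do it at the very top
# of the evaluation script, ahead of any `import torch`.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

# Hypothetical entry point -- replace with the actual eval command/script:
# import torch
# run_evaluation(model="my-7b-model", use_vllm=True)

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

Separately, since vLLM preallocates a fraction of GPU memory for its KV cache, lowering `gpu_memory_utilization` (e.g. from the default 0.9 to something smaller) when constructing the vLLM engine often resolves OOMs when other processes, or a second model, already occupy part of the card.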