torch.cuda.OutOfMemoryError

iamblue commented 1 year ago

I'm running the 13B model with the following command:

CUDA_VISIBLE_DEVICES=1 python generate.py --model_path "decapoda-research/llama-13b-hf" --lora_path "Chinese-Vicuna/Chinese-Vicuna-lora-13b-belle-and-guanaco" --use_local 1

It ends with this error:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 68.00 MiB (GPU 0; 10.75 GiB total capacity; 10.17 GiB already allocated; 47.94 MiB free; 10.17 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

The GPU is an RTX 2080 with 11 GB of VRAM.

I have tried setting

export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:32

but it made no difference.

I also set batch_size = 2 in generate.py, and that did not help either.

Any suggestions?
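
For reference, the figures in the error message can be compared directly. The hint about max_split_size_mb applies when reserved memory is much larger than allocated memory. The sketch below only parses the numbers quoted in the message above; the helper name `to_mib` is made up for illustration, and this is a rough reading of the message rather than a full diagnosis.

```python
import re

# Error text copied from the report above (only the part carrying the figures).
message = (
    "CUDA out of memory. Tried to allocate 68.00 MiB "
    "(GPU 0; 10.75 GiB total capacity; 10.17 GiB already allocated; "
    "47.94 MiB free; 10.17 GiB reserved in total by PyTorch)"
)

UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}


def to_mib(pattern: str) -> float:
    """Find '<number> <unit>' via a two-group regex and convert it to MiB."""
    match = re.search(pattern, message)
    if match is None:
        raise ValueError(f"pattern not found: {pattern}")
    value, unit = match.groups()
    return float(value) * UNIT_TO_MIB[unit]


allocated = to_mib(r"([\d.]+) (MiB|GiB) already allocated")
reserved = to_mib(r"([\d.]+) (MiB|GiB) reserved")
capacity = to_mib(r"([\d.]+) (MiB|GiB) total capacity")

# Memory held by the caching allocator but not used by live tensors.
slack = reserved - allocated

print(f"allocated: {allocated:.0f} MiB of {capacity:.0f} MiB")
print(f"reserved minus allocated: {slack:.0f} MiB")
```

Here reserved and allocated are both 10.17 GiB, so the gap is zero. That suggests the card is full of live tensors rather than fragmented, which would be consistent with max_split_size_mb making no difference.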

Facico commented 1 year ago

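@iamblue For inference on a 2080Ti we recommend the 7B model, for example as shown in this generate script. For the 13B model we recommend a GPU with more VRAM, or running inference on the CPU (if you have enough RAM).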

iamblue commented 1 year ago

@Facico OK. Testing on the CPU, it needs roughly 54 GB of memory. Is there any chance the memory footprint will be optimized later on?

Facico commented 1 year ago

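We also support GPTQ quantization (although the quantization process itself needs a fairly large amount of VRAM). Because the results with that quantization method are currently poor, we have not uploaded the quantized models. We will keep an eye on this going forward.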
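
The numbers in this thread can be sanity-checked with a back-of-the-envelope estimate of weight memory: parameter count times bytes per parameter. The sketch below rests on several assumptions. It takes "7B" and "13B" as nominal parameter counts and ignores activations, the KV cache, and the LoRA weights. The precisions listed are generic options, not a statement of what generate.py actually loads. 4-bit stands in for a typical GPTQ setting, since the thread does not give a bit width.

```python
# Nominal parameter counts (assumption: "7B" and "13B" taken at face value).
PARAMS = {"7B": 7e9, "13B": 13e9}

# Bytes needed to store one weight at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

GIB = 1024 ** 3

# Total capacity reported in the error message above.
GPU_CAPACITY_GIB = 10.75


def weight_gib(n_params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in GiB."""
    return n_params * bytes_per_param / GIB


def fits(model: str, precision: str) -> bool:
    """True if the weights alone are smaller than the GPU capacity."""
    size = weight_gib(PARAMS[model], BYTES_PER_PARAM[precision])
    return size < GPU_CAPACITY_GIB


for model in PARAMS:
    for precision in BYTES_PER_PARAM:
        size = weight_gib(PARAMS[model], BYTES_PER_PARAM[precision])
        verdict = "fits" if fits(model, precision) else "does not fit"
        print(f"{model} {precision}: {size:6.2f} GiB ({verdict} in {GPU_CAPACITY_GIB} GiB)")
```

Under these assumptions the 13B weights come to roughly 48 GiB (about 52 GB) in fp32, which is in the same range as the ~54 GB observed on the CPU. They need about 12 GiB even at 8 bits, which is more than the card holds. The 7B model at 8 bits is about 6.5 GiB, leaving headroom on an 11 GB card.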

Facico / Chinese-Vicuna

torch.cuda.OutOfMemoryError #29