LHRLAB / ChatKBQA

[ACL 2024] Official resources of "ChatKBQA: A Generate-then-Retrieve Framework for Knowledge Base Question Answering with Fine-tuned Large Language Models".
https://arxiv.org/abs/2310.08975
MIT License

torch.cuda.OutOfMemoryError: CUDA out of memory. #2


ganlinganlin commented 7 months ago

Hello, my friend. While training Llama-2-13B on an A30 GPU with 24GB of memory, I am hitting a GPU memory allocation error. Are there any feasible solutions or code modifications that would resolve this issue?

Error:

```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 23.50 GiB total capacity; 23.16 GiB already allocated; 2.81 MiB free; 23.16 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

Thanks!
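For reference, the allocator hint in the error text itself can be applied by setting `PYTORCH_CUDA_ALLOC_CONF` before launching training. This is only a sketch of that suggestion; the value `128` is an illustrative assumption, not a setting from the ChatKBQA repo, and it mitigates fragmentation rather than an absolute memory shortfall.

```shell
# Configure PyTorch's CUDA caching allocator to cap split block size,
# reducing fragmentation (the 128 MiB value is an illustrative choice).
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
# ...then rerun the usual training command in the same shell.
```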

LHRLAB commented 7 months ago

Our recommended setup is an A40 (48GB) GPU for training and inference. If CUDA memory is insufficient, you can switch to a smaller model such as Llama-2-7B, reduce the batch size, or make other similar adjustments.
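The batch-size reduction can be paired with gradient accumulation to keep the effective batch size unchanged. A minimal sketch of that trade-off (the specific numbers are illustrative assumptions, not values from the ChatKBQA training configs):

```python
# Trade per-device batch size for gradient accumulation steps: fewer samples
# are resident in GPU memory at once, while the optimizer still updates on
# the same effective batch. Values below are illustrative only.
per_device_train_batch_size = 1   # lowered from e.g. 4 to cut peak memory
gradient_accumulation_steps = 4   # raised from e.g. 1 to compensate
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # same effective batch as 4 x 1
```

Both settings map directly onto the corresponding HuggingFace `TrainingArguments` fields, so only the training config needs to change, not the code.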