Jack000 / glid-3-xl-stable

stable diffusion training
MIT License

how to avoid CUDA out of memory? #21

Open timotheecour4 opened 1 year ago

timotheecour4 commented 1 year ago

All of the training scripts specified in the README give errors like the following:

RuntimeError: CUDA out of memory. Tried to allocate 4.00 GiB (GPU 0; 15.78 GiB total capacity; 10.52 GiB already allocated; 3.86 GiB free; 10.77 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

CUDA Version: 11.7 using 8x Tesla V100-SXM2 (with 16GB memory)

Reducing `--batch_size` to 32 didn't help; passing `--microbatch 1` didn't help either.
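The error message itself points at one knob worth trying: setting `max_split_size_mb` via `PYTORCH_CUDA_ALLOC_CONF` to reduce allocator fragmentation. A minimal sketch (the value 128 is illustrative, not a tuned setting):

```python
import os

# PyTorch's caching allocator reads PYTORCH_CUDA_ALLOC_CONF at first CUDA
# use, so this must be set before importing torch / launching training.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```

Equivalently, export the variable in the shell before launching the training script. Note this only helps when reserved memory far exceeds allocated memory, as in the traceback above; it won't shrink the model's actual footprint.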

chavinlo commented 1 year ago

Lowest I managed to get was 32GB with batch size 1... but that was ages ago

timotheecour4 commented 1 year ago

Same, I was hoping there would be some flag configuration that would allow using a GPU (or multiple GPUs) with less memory

chavinlo commented 1 year ago

> Same, I was hoping there would be some flag configuration that would allow using a GPU (or multiple GPUs) with less memory

Since it's based on PyTorch, you might be able to implement an optimization like HF's Accelerate or DeepSpeed for RAM offloading.
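For the DeepSpeed route, offloading optimizer state to CPU RAM is configured through a JSON config passed to the launcher. A minimal sketch, assuming ZeRO stage 2 with optimizer offload (values illustrative, not tuned for this repo):

```json
{
  "train_micro_batch_size_per_gpu": 1,
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    }
  }
}
```

Whether this drops in cleanly depends on how this repo's training loop is structured; Accelerate can wrap an existing PyTorch loop with fewer code changes and generate a DeepSpeed config for you via `accelerate config`.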