I have a similar issue to #10: I'm hitting OOM errors when trying to kick off training. I'm using a 2 x A6000 GPU setup, which I thought would be adequate because it has around 96 GB of VRAM in total. Any idea how I can fix this? I'm getting the following error right now:
RuntimeError: CUDA out of memory. Tried to allocate 388.00 MiB (GPU 0; 47.54 GiB total capacity; 44.65 GiB already allocated; 388.88 MiB free; 45.01 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
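For reference, here is a quick sanity check on the numbers in that message. It is only a minimal sketch: the values are copied from the error above, and the variable names are my own, purely for illustration.

```python
# Values copied from the error message above, in GiB.
total_capacity_gib = 47.54  # capacity of GPU 0 alone, not both cards combined
allocated_gib = 44.65       # memory held by live tensors
reserved_gib = 45.01        # memory held by PyTorch's caching allocator

# Memory the allocator has reserved but not handed out to tensors.
# A large value here would point to fragmentation.
cached_unused_gib = reserved_gib - allocated_gib

# Fraction of GPU 0 already occupied by live tensors.
allocated_fraction = allocated_gib / total_capacity_gib

print(f"reserved - allocated: {cached_unused_gib:.2f} GiB")
print(f"GPU 0 allocated: {allocated_fraction:.1%} of capacity")
```

If I'm reading this right, the gap between reserved and allocated memory is only about 0.36 GiB, so fragmentation (and therefore max_split_size_mb) doesn't look like the main problem. The limit being hit seems to be GPU 0's own ~47.5 GiB rather than the combined ~96 GB.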
Thank you!