derrian-distro / LoRA_Easy_Training_Scripts

A UI made in Pyside6 to make training LoRA/LoCon and other LoRA type models in sd-scripts easy
GNU General Public License v3.0

PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True (help) #178

Closed: DarqueLilly closed this issue 5 months ago

DarqueLilly commented 5 months ago

I get this message: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 90.00 MiB. GPU 0 has a total capacity of 8.00 GiB of which 0 bytes is free. Of the allocated memory 7.13 GiB is allocated by PyTorch, and 127.09 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

I don't want to use Gradient Checkpointing or Gradient Accumulation. How do I try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation? I can't find any documentation on this other than (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables), which is useless to me.

derrian-distro commented 5 months ago

According to the documentation, this will not help you in any way, shape, or form: you will still need more VRAM than your card has. You must use gradient checkpointing and gradient accumulation to stretch your VRAM, or take a hit in quality by training the UNet only, using full_fp16 or full_bf16, caching latents, and caching the text encoder outputs (the last is only possible while also training the UNet only).
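
For reference, PYTORCH_CUDA_ALLOC_CONF is just an environment variable read by PyTorch's CUDA caching allocator, so it has to be set before training starts. A minimal sketch of one way to do that in Python (exporting it in the shell before launching the training script works the same way):

```python
# Minimal sketch: PYTORCH_CUDA_ALLOC_CONF must be set before PyTorch makes its
# first CUDA allocation, so define it in the environment before importing torch
# (or export it in the shell before launching the training script).
import os

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # imported after the variable is set so the allocator picks it up

# Note: this setting only reduces fragmentation of memory PyTorch has already
# reserved; it does not create extra VRAM, which is why the memory-saving
# options above are still needed on an 8 GiB card.
```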

DarqueLilly commented 5 months ago

Thank you for the swift reply!