carat-keeeehun opened 3 months ago
Are you running locally or in the cloud? Apparently there is already something stored in your GPU's memory (a quick check is sketched below).
BTW, the L40S is a very good GPU for training, better than the RTX 6000 Ada or the regular L40. It's the next best thing after the H100.
PS: you might want to set your alpha to 4, 8, or even 16.
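If you want to confirm whether something is already holding VRAM before kohya_ss even starts, a quick look at the memory stats with PyTorch (or `nvidia-smi`) will show it. This is just a minimal standalone sketch, not part of kohya_ss:

```python
# Minimal sketch: report free/used VRAM on GPU 0 before launching training.
# Run it in a separate Python session; it only reads memory statistics.
import torch

free_bytes, total_bytes = torch.cuda.mem_get_info(0)  # (free, total) in bytes
used_gib = (total_bytes - free_bytes) / 1024**3
total_gib = total_bytes / 1024**3
print(f"GPU 0: {used_gib:.2f} GiB already in use out of {total_gib:.2f} GiB")
# If a large amount is already in use here, another (possibly stale) process
# is holding VRAM; `nvidia-smi` will show which PID owns it.
```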
LoRA type needs to be FLUX1
This one is not too obvious; I missed it too at the beginning.
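If you save your settings from the GUI, you can double-check that field in the exported config. A hedged sketch, assuming the GUI's saved-config JSON uses keys like "LoRA_type" and "network_alpha" (field names are assumptions, not verified against the GUI source):

```python
# Hedged sketch: the key names ("LoRA_type", "network_alpha") and the file name
# are assumptions about the kohya_ss GUI's saved-config JSON, used only to
# illustrate what to look for.
import json

with open("my_flux_lora_config.json") as f:   # hypothetical saved GUI config
    cfg = json.load(f)

print(cfg.get("LoRA_type"))      # should read "Flux1" for FLUX.1 LoRA training
print(cfg.get("network_alpha"))  # e.g. 4, 8, or 16 as suggested above
```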
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 18.00 MiB. GPU 0 has a total capacty of 44.53 GiB of which 15.25 MiB is free. Including non-PyTorch memory, this process has 44.51 GiB memory in use. Of the allocated memory 42.17 GiB is allocated by PyTorch, and 1.83 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Whenever I start FLUX.1 LoRA training, I hit the OutOfMemory error above. I found in the Kohya-ss GitHub Discussions that FLUX LoRA training with the FLUX Dev model requires an NVIDIA RTX 3000 or 4000 series GPU with at least 24GB of VRAM. But I have an NVIDIA L40S GPU with 48GB of VRAM (in practice, about 46GB).
And I set the LoRA configuration as below,
Is there any other essential config I need to know to train a FLUX LoRA on the sd3.flux branch of kohya_ss? I don't understand why I'm hitting these OOM issues on my GPU server.
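For reference, the error text itself suggests setting max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF. A minimal sketch of how that can be set before PyTorch initializes CUDA; the value 128 is only an example, and this mitigates fragmentation rather than adding any VRAM:

```python
# Minimal sketch: configure the CUDA caching allocator before CUDA is initialized.
# "max_split_size_mb:128" is an example value; the same thing can be done in the
# shell before launching training:
#   export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
import os

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # import (and CUDA init) must happen after the env var is set

print(torch.cuda.get_device_name(0))
```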