carat-keeeehun opened 3 months ago
Are you running locally or in the cloud? Apparently there is already something stored in your GPU's memory (a quick check is sketched below).
BTW, the L40S is a very good GPU for training, better than the RTX 6000 Ada or the regular L40. It's the next best thing after the H100.
PS: you might want to set your alpha to 4, 8, or even 16.
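If you want to confirm whether something is already holding VRAM before kohya_ss even starts, a quick look at the memory stats with PyTorch (or `nvidia-smi`) will show it. This is just a minimal standalone sketch, not part of kohya_ss:

```python
# Minimal sketch: report free/used VRAM on GPU 0 before launching training.
# Run it in a separate Python session; it only reads memory statistics.
import torch

free_bytes, total_bytes = torch.cuda.mem_get_info(0)  # (free, total) in bytes
used_gib = (total_bytes - free_bytes) / 1024**3
total_gib = total_bytes / 1024**3
print(f"GPU 0: {used_gib:.2f} GiB already in use out of {total_gib:.2f} GiB")
# If a large amount is already in use here, another (possibly stale) process
# is holding VRAM; `nvidia-smi` will show which PID owns it.
```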
LoRA type needs to be FLUX1
This one is not too obvious; I missed it too at the beginning.
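If you save your settings from the GUI, you can double-check that field in the exported config. A hedged sketch, assuming the GUI's saved-config JSON uses keys like "LoRA_type" and "network_alpha" (field names are assumptions, not verified against the GUI source):

```python
# Hedged sketch: the key names ("LoRA_type", "network_alpha") and the file name
# are assumptions about the kohya_ss GUI's saved-config JSON, used only to
# illustrate what to look for.
import json

with open("my_flux_lora_config.json") as f:   # hypothetical saved GUI config
    cfg = json.load(f)

print(cfg.get("LoRA_type"))      # should read "Flux1" for FLUX.1 LoRA training
print(cfg.get("network_alpha"))  # e.g. 4, 8, or 16 as suggested above
```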
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 18.00 MiB. GPU 0 has a total capacty of 44.53 GiB of which 15.25 MiB is free. Including non-PyTorch memory, this process has 44.51 GiB memory in use. Of the allocated memory 42.17 GiB is allocated by PyTorch, and 1.83 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Whenever I start FLUX.1 LoRA training, I hit the OutOfMemory error above. I found in the Kohya-ss GitHub Discussions that FLUX LoRA training with the FLUX Dev model requires an NVIDIA RTX 3000 or 4000 series GPU with at least 24GB of VRAM. But I have an NVIDIA L40S GPU with 48GB of VRAM (in practice, about 46GB).
And I set the LoRA configuration as below,
Is there any other essential config I need to know to train a FLUX LoRA on the sd3.flux branch of kohya_ss? I don't understand why I'm hitting these OOM issues on my GPU server.
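For reference, the error text itself suggests setting max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF. A minimal sketch of how that can be set before PyTorch initializes CUDA; the value 128 is only an example, and this mitigates fragmentation rather than adding any VRAM:

```python
# Minimal sketch: configure the CUDA caching allocator before CUDA is initialized.
# "max_split_size_mb:128" is an example value; the same thing can be done in the
# shell before launching training:
#   export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
import os

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # import (and CUDA init) must happen after the env var is set

print(torch.cuda.get_device_name(0))
```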