Closed: vsakovskaya closed this issue 2 months ago
You can try #785 (work in progress, results not guaranteed yet) to reduce memory use further, but ultimately, without that patch, this likely requires fp8-quanto.
@bghira thank you! Unfortunately I still get the same error, but now it occurs when moving the text encoder to the device:

```
File "SimpleTuner/train.py", line 369, in main
    text_encoder_2.to(accelerator.device, dtype=weight_dtype)
```
I'll need more context; the text encoders only consume 9 GB of VRAM.
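For reference, a quick way to gather that context is to log PyTorch's memory counters around each `.to(device)` call. This is a minimal sketch; `report_vram` is a hypothetical helper, not part of SimpleTuner:

```python
import torch

def report_vram(label: str) -> None:
    # memory_allocated() counts only tensors PyTorch currently holds on the GPU;
    # memory_reserved() includes the allocator's cached-but-unused blocks.
    allocated = torch.cuda.memory_allocated() / 1024**3
    reserved = torch.cuda.memory_reserved() / 1024**3
    print(f"{label}: {allocated:.2f} GiB allocated, {reserved:.2f} GiB reserved")
```

Calling this before and after each model move in `train.py` would show which component pushes past the 4090's 24 GB.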
Unfortunately the new optimisers didn't do much to fix the problem, so your best bet is to get quanto working.
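As a rough sketch of what fp8 quantisation with optimum-quanto looks like (here `t5-small` stands in for FLUX's multi-GB T5-XXL `text_encoder_2` so the snippet runs anywhere; SimpleTuner's actual quanto integration may differ):

```python
from transformers import T5EncoderModel
from optimum.quanto import quantize, freeze, qfloat8

# FLUX's text_encoder_2 is a T5 encoder; t5-small is a stand-in for the sketch.
text_encoder_2 = T5EncoderModel.from_pretrained("t5-small")

# Replace the weights with fp8 equivalents; freeze() makes the quantisation
# permanent, so the smaller weights are what actually get moved to the GPU.
quantize(text_encoder_2, weights=qfloat8)
freeze(text_encoder_2)

text_encoder_2.to("cuda")  # now a fraction of the original footprint
```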
I’m encountering a CUDA out-of-memory error while training a LoRA model using FLUX on my custom dataset. The issue occurs despite using an NVIDIA RTX 4090 with 24 GB of VRAM and 64 GB of system RAM.
Environment
Steps to reproduce
bash train.sh
Error log
CUDA out of memory. Tried to allocate 54.00 MiB. GPU 0 has a total capacity of 23.63 GiB of which 34.56 MiB is free. Process 1108671 has 558.00 MiB memory in use. Process 1114104 has 1.03 GiB memory in use. Including non-PyTorch memory, this process has 21.22 GiB memory in use. Of the allocated memory 20.73 GiB is allocated by PyTorch, and 35.86 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
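The allocator hint from the message above can be applied without editing train.sh by setting the variable before PyTorch initialises CUDA. A sketch, with the caveat that only ~36 MiB is reserved-but-unallocated here, so fragmentation is probably not the main culprit:

```python
import os

# Must be set before the first CUDA allocation (safest: before importing torch)
# for the caching allocator to pick it up.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # noqa: E402  (imported after the env var on purpose)
```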
Additional information
Please see the attached config/config.env and the full log.