XLabs-AI / x-flux


CUDA out of memory #24

Open dydxdt opened 1 month ago

dydxdt commented 1 month ago

Thanks for your work! I'm using 8×A100s to fine-tune with LoRA, but I still hit the 'CUDA out of memory' error. Is that normal? Do I really need more resources to train a LoRA?

The error is like this: [rank7]: torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 18.00 MiB. GPU 7 has a total capacity of 79.15 GiB of which 17.62 MiB is free. Including non-PyTorch memory, this process has 79.12 GiB memory in use. Of the allocated memory 77.43 GiB is allocated by PyTorch, and 219.12 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management

Looking forward to your reply!
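
The allocator hint in the traceback is the cheapest thing to try first. A minimal sketch, assuming a Python training entry point (exporting the variable in the launch shell works just as well):

```python
import os

# The allocator reads PYTORCH_CUDA_ALLOC_CONF on its first CUDA allocation,
# so this must run before torch touches the GPU.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # imported only after the env var is set
```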

neonsecret commented 1 month ago

See https://github.com/XLabs-AI/x-flux/issues/12. It worked for me on 2× 4090s with 48 GB of VRAM total; it's not a matter of VRAM but of using the correct config.
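
To tell fragmentation from genuine capacity exhaustion before changing the config, a small diagnostic sketch using stock `torch.cuda` memory counters (a large reserved-but-unallocated gap points to fragmentation):

```python
import torch

# Print per-device memory pressure; call this from inside the training
# loop or right after catching the OOM.
for i in range(torch.cuda.device_count()):
    alloc = torch.cuda.memory_allocated(i) / 2**30
    reserved = torch.cuda.memory_reserved(i) / 2**30
    print(f"GPU {i}: {alloc:.2f} GiB allocated, {reserved:.2f} GiB reserved")
```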