Thanks for your work!
I am fine-tuning with LoRA on 8× A100 GPUs, but I still hit a 'CUDA out of memory' error. Is this expected? Do I really need more resources to train with LoRA?
The error looks like this:
[rank7]: torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 18.00 MiB. GPU 7 has a total capacity of 79.15 GiB of which 17.62 MiB is free. Including non-PyTorch memory, this process has 79.12 GiB memory in use. Of the allocated memory 77.43 GiB is allocated by PyTorch, and 219.12 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management
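To check whether the fragmentation hint in that message applies here, this is a small sketch that pulls the sizes out of the error text and compares them. The helper name `parse_oom` and the regular expressions are my own and only assume the wording of the message above; other PyTorch versions may phrase it differently.

```python
import re

# The relevant part of the error message, copied from the log above.
MESSAGE = (
    "CUDA out of memory. Tried to allocate 18.00 MiB. "
    "GPU 7 has a total capacity of 79.15 GiB of which 17.62 MiB is free. "
    "Including non-PyTorch memory, this process has 79.12 GiB memory in use. "
    "Of the allocated memory 77.43 GiB is allocated by PyTorch, "
    "and 219.12 MiB is reserved by PyTorch but unallocated."
)

UNIT_TO_GIB = {"MiB": 1 / 1024, "GiB": 1.0}

# Hypothetical helper: the patterns match this message's wording only.
PATTERNS = {
    "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
    "total": r"total capacity of ([\d.]+) (MiB|GiB)",
    "free": r"of which ([\d.]+) (MiB|GiB) is free",
    "allocated": r"([\d.]+) (MiB|GiB) is allocated by PyTorch",
    "reserved_unallocated": r"([\d.]+) (MiB|GiB) is reserved by PyTorch but unallocated",
}


def parse_oom(message: str) -> dict:
    """Return the sizes found in the message, converted to GiB."""
    sizes = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, message)
        if match:
            sizes[name] = float(match.group(1)) * UNIT_TO_GIB[match.group(2)]
    return sizes


sizes = parse_oom(MESSAGE)
# Share of the GPU that is reserved by the allocator but not in use.
fragmentation_share = sizes["reserved_unallocated"] / sizes["total"]
print(f"allocated by PyTorch: {sizes['allocated']:.2f} GiB")
print(f"reserved but unallocated: {sizes['reserved_unallocated']:.2f} GiB")
print(f"share of total capacity: {fragmentation_share:.2%}")
```

With the numbers from my log, the reserved-but-unallocated part is about 0.21 GiB, well under 1% of the GPU, while 77.43 GiB is genuinely allocated. So, as far as I can tell, `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` is unlikely to help much on its own, and the memory really is being used by the training process.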
Looking forward to your reply!