How to train a medium or large model with limited GPU capacity?

facebookresearch / audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

MIT License

How to train a medium or large model with limited GPU capacity? #370

Open ElizavetaSedova opened 6 months ago

ElizavetaSedova commented 6 months ago

I have GPUs with 24 GB of memory each. When I try to train the medium model, I get torch.cuda.OutOfMemoryError. Is there a way to train a medium or large model on these cards? I would be glad of any advice!

Saltb0xApps commented 6 months ago

Same issue here. I believe you have to set fsdp.use=true and autocast=false.
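For a rough sense of why the medium model runs out of memory on a 24 GB card, and why sharding with FSDP might help, here is a back-of-the-envelope estimate. It is only a sketch under assumptions that do not come from this thread: fp32 weights and gradients, an Adam-style optimizer with two fp32 moment buffers, ideal even sharding across GPUs, and activations ignored entirely. The parameter counts are the published MusicGen sizes (300M, 1.5B, 3.3B); the function name `training_state_gib` is hypothetical.

```python
# Rough estimate of training-state memory per GPU.
# Assumptions (not from the thread): fp32 weights and gradients, an Adam-style
# optimizer with two fp32 moment buffers, activations ignored, ideal sharding.

BYTES_PER_PARAM = 4 + 4 + 8  # weights + gradients + two optimizer moments


def training_state_gib(num_params: float, num_shards: int = 1) -> float:
    """Return GiB of weights, gradients and optimizer state held per GPU."""
    if num_shards < 1:
        raise ValueError("num_shards must be >= 1")
    total_bytes = num_params * BYTES_PER_PARAM
    return total_bytes / num_shards / 2**30


# Published MusicGen model sizes (parameter counts).
MODEL_PARAMS = {"small": 300e6, "medium": 1.5e9, "large": 3.3e9}

for name, params in MODEL_PARAMS.items():
    for gpus in (1, 4, 8):
        gib = training_state_gib(params, gpus)
        print(f"{name:>6} on {gpus} GPU(s): {gib:6.1f} GiB per GPU")
```

Under these assumptions the medium model needs about 22 GiB for weights, gradients and optimizer state alone on a single GPU, leaving almost nothing for activations. Sharding can only reduce this when more than one GPU participates; with a single process there is nothing to shard across.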

ElizavetaSedova commented 6 months ago

@Saltb0xApps Unfortunately, this doesn't work for me when training the medium model. I tested it on the small model and saw that overall memory consumption did not change at all, with all other parameters kept the same. I used the smallest batch size.

astralmedia commented 6 months ago

Try lowering the batch_size and/or epochs
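A smaller batch_size lowers activation memory per step, whereas the number of epochs only changes how long training runs, not peak memory. If a smaller batch hurts training, one common workaround is gradient accumulation: average the gradients of several small micro-batches before each optimizer step. The sketch below illustrates the idea on a toy one-parameter model in plain Python. It is a generic illustration, not audiocraft code, and the helper names `mse_grad` and `accumulated_grad` are hypothetical.

```python
# Generic illustration of gradient accumulation on a one-parameter model
# y = w * x trained with mean squared error. Not audiocraft code.


def mse_grad(w, batch):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, data, micro_batch_size):
    """Average the gradients of equally sized micro-batches."""
    if len(data) % micro_batch_size != 0:
        raise ValueError("data length must be a multiple of micro_batch_size")
    steps = len(data) // micro_batch_size
    total = 0.0
    for i in range(steps):
        micro = data[i * micro_batch_size:(i + 1) * micro_batch_size]
        total += mse_grad(w, micro) / steps  # scale so the sum is a mean
    return total


data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
full = mse_grad(0.5, data)                               # one batch of 4
accum = accumulated_grad(0.5, data, micro_batch_size=1)  # 4 micro-batches of 1
print(abs(full - accum) < 1e-9)
```

The accumulated gradient matches the full-batch gradient up to floating-point rounding, while only one micro-batch has to be held in memory at a time.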