Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
MIT License
What compute resources are required to fine-tune MusicGen? #215
I just attempted to fine-tune a MusicGen model with a custom dataset using
dora run solver=musicgen/musicgen_base_32khz model/lm/model_scale=small continue_from=//pretrained/facebook/musicgen-small conditioner=text2music dset=audio/my-dataset
Unfortunately it ran into torch.cuda.OutOfMemoryError: CUDA out of memory on a single H100 instance. As I attempt to procure a larger cluster, it would be really helpful to know how much memory and compute time is typical for training the small, medium, and large MusicGen models. Thanks!
This was because the default solver config has a batch size tuned for 32 GPUs :) I changed that and am now running into other issues, but I'll close this.
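For anyone hitting the same error: the fix described above amounts to overriding the batch size when launching the run. A hedged sketch of such an override, using Audiocraft's Hydra-style dotted config syntax (the `dataset.batch_size` key and the value 8 are illustrative assumptions, not a tested single-GPU setting):

```shell
# Same command as above, with the global batch size lowered so the
# solver fits on one GPU instead of the 32-GPU default.
# dataset.batch_size=8 is an illustrative guess; tune it to your VRAM.
dora run solver=musicgen/musicgen_base_32khz \
    model/lm/model_scale=small \
    continue_from=//pretrained/facebook/musicgen-small \
    conditioner=text2music \
    dset=audio/my-dataset \
    dataset.batch_size=8
```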