Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
MIT License
What compute resources are required to fine-tune MusicGen? #215
I just attempted to fine-tune a MusicGen model with a custom dataset using
dora run solver=musicgen/musicgen_base_32khz model/lm/model_scale=small continue_from=//pretrained/facebook/musicgen-small conditioner=text2music dset=audio/my-dataset
Unfortunately it ran into torch.cuda.OutOfMemoryError: CUDA out of memory on a single H100 instance. As I attempt to procure a larger cluster, it would be really helpful to know how much memory and compute time is typical for training the small, medium, and large MusicGen models. Thanks!
This was because the default solver config has a batch size tuned for 32 GPUs :) I changed that and am now running into other issues, but I'll close this.
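For anyone hitting the same error: the fix described above amounts to overriding the batch size when launching the run. A hedged sketch of such an override, using Audiocraft's Hydra-style dotted config syntax (the `dataset.batch_size` key and the value 8 are illustrative assumptions, not a tested single-GPU setting):

```shell
# Same command as above, with the global batch size lowered so the
# solver fits on one GPU instead of the 32-GPU default.
# dataset.batch_size=8 is an illustrative guess; tune it to your VRAM.
dora run solver=musicgen/musicgen_base_32khz \
    model/lm/model_scale=small \
    continue_from=//pretrained/facebook/musicgen-small \
    conditioner=text2music \
    dset=audio/my-dataset \
    dataset.batch_size=8
```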