facebookresearch / audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
MIT License

AudioGen - Model generation speed? #421

Open jcrystal opened 4 months ago

jcrystal commented 4 months ago

Hi all,

All good getting AudioCraft up and rolling (Ubuntu 22.04.3 LTS). Pretty amazing stuff.

Just running this example, which generates three basic sound effects, took my computer almost an hour. Is that fairly typical for something like this?

If I stop mid-process, my traceback indicates it's getting held up at: model = AudioGen.get_pretrained('facebook/audiogen-medium')

I don't have a dedicated GPU (i.e. nothing CUDA-enabled). Could that be my issue?

All good with the answer "magic takes time", but want to make sure I'm experiencing something fairly normal, and that my best bet is probably a hardware upgrade. The PC I'm running this on isn't particularly current.

Thanks!

yukara-ikemiya commented 3 months ago

The get_pretrained method basically just downloads the model weights from the Hugging Face Hub (online) and loads them. So if execution is stuck on this line, model loading hasn't finished and audio generation hasn't started yet.

Taking an hour to load the model is far too long and not typical, so I'd check your network speed first.
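
To see which phase is actually slow, one option is to time the model load and the generation separately. The following is a minimal sketch rather than anything from the AudioCraft docs: `timed` and `profile_audiogen` are hypothetical helper names, the 5-second duration is an arbitrary choice, and it assumes `audiocraft` and `torch` are installed.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock seconds spent inside the `with` block under results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def profile_audiogen(descriptions, duration=5):
    """Hypothetical helper: time weight loading and generation separately.

    `duration` (seconds of audio per prompt) is an assumed value, chosen
    only to keep the test run short.
    """
    # Imported here so the timing helper above works without audiocraft installed.
    import torch
    from audiocraft.models import AudioGen

    timings = {}

    # Phase 1: download (on first run) and load the pretrained weights.
    with timed("load", timings):
        model = AudioGen.get_pretrained('facebook/audiogen-medium')

    model.set_generation_params(duration=duration)

    # Phase 2: the actual audio generation.
    with timed("generate", timings):
        model.generate(descriptions)

    # Note whether a CUDA GPU was visible; without one, everything runs on the CPU.
    timings["cuda_available"] = torch.cuda.is_available()
    return timings
```

Calling `profile_audiogen` with a list of text prompts returns a dict with `load` and `generate` times in seconds. If `load` dominates, the download or disk is the likely bottleneck. If `generate` dominates and `cuda_available` is `False`, CPU-only inference is the more likely cause.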