Code for Multi Band Diffusion - Speech Part

facebookresearch / audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

MIT License

Code for Multi Band Diffusion - Speech Part #205

Closed C00reNUT closed 1 year ago

C00reNUT commented 1 year ago

Hello,

Thank you for providing this research to the public.

Do you plan to provide code for the Speech section from https://ai.honu.io/papers/mbd ?

I searched for it in the demos but couldn't find it.

robinsrm commented 1 year ago

Hello,

The code for MultiBandDiffusion is the same for all modalities; only the pre-trained models differ. You can load the pre-trained models for compression (used for the speech and music samples at https://ai.honu.io/papers/mbd) with:

from audiocraft.models import MultiBandDiffusion

mbd = MultiBandDiffusion.get_mbd_24khz(bw=3.0)  # bw can also be 1.5 or 6.0

For the pre-trained model compatible with MusicGen:

mbd = MultiBandDiffusion.get_mbd_musicgen()
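For the speech case, a minimal sketch of how the compression model above might be applied to an existing recording. The helper names `check_bandwidth` and `regenerate_speech` are hypothetical and not part of audiocraft. The sketch assumes `wav` is a float tensor shaped [batch, channels, time] and that `MultiBandDiffusion.regenerate(wav, sample_rate)` encodes the audio with the codec and decodes it with the diffusion model; please check both against the version of the library you have installed.

```python
# Bandwidth values mentioned in the reply above.
SUPPORTED_BANDWIDTHS = (1.5, 3.0, 6.0)


def check_bandwidth(bw):
    """Hypothetical helper: reject bandwidths not listed in this thread."""
    if bw not in SUPPORTED_BANDWIDTHS:
        raise ValueError(f"bw must be one of {SUPPORTED_BANDWIDTHS}, got {bw}")
    return float(bw)


def regenerate_speech(wav, sample_rate, bw=3.0):
    """Hypothetical helper: compress and re-synthesize a waveform.

    Assumes `wav` is a torch tensor shaped [batch, channels, time].
    """
    bw = check_bandwidth(bw)
    # Imported lazily so the bandwidth check works without audiocraft installed.
    from audiocraft.models import MultiBandDiffusion

    mbd = MultiBandDiffusion.get_mbd_24khz(bw=bw)  # downloads pre-trained weights
    return mbd.regenerate(wav, sample_rate=sample_rate)
```

With a speech clip loaded as `wav` at sample rate `sr` (for example via `torchaudio.load`, with a batch dimension added), the call would be `out = regenerate_speech(wav, sr, bw=6.0)`.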

C00reNUT commented 1 year ago

Thank you.