facebookresearch / audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
MIT License
20.15k stars 2.01k forks

Is there a bug in the multi-band diffusion training code? Cannot generate normal 16 kHz audio after training #430

Open anonymoussky opened 3 months ago

anonymoussky commented 3 months ago

I trained a 16 kHz 4-band diffusion model on the whole of AudioSet, but the model cannot generate correct audio at inference time. It only produces white noise. Does anyone know what might cause this?

Two further checks: 1) I also trained a 16 kHz 1-band diffusion model using the same code, and it did not work either. 2) I tried the pretrained 24 kHz MBD, which generates correct audio.
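
For anyone trying to reproduce this, it may help to put a number on "white noise" rather than judging by ear. The snippet below is only a rough sketch and is not part of audiocraft: `lag1_autocorrelation` and `looks_like_white_noise` are hypothetical helper names, and the 0.1 threshold is an arbitrary assumption. It relies on the fact that white noise has almost no correlation between adjacent samples, whereas speech and music usually do. The heuristic is crude (a pure tone near a quarter of the sample rate would also score close to zero), so treat it as a sanity check only.

```python
import math
import random


def lag1_autocorrelation(samples):
    """Normalized lag-1 autocorrelation of a mono signal (sequence of floats)."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    energy = sum(c * c for c in centered)
    if energy == 0.0:
        # A constant (e.g. silent) signal is treated as perfectly correlated.
        return 1.0
    cross = sum(a * b for a, b in zip(centered, centered[1:]))
    return cross / energy


def looks_like_white_noise(samples, threshold=0.1):
    """Hypothetical heuristic: adjacent samples are nearly uncorrelated.

    The threshold is an assumption chosen for illustration, not a tuned value.
    """
    return abs(lag1_autocorrelation(samples)) < threshold


# Synthetic one-second signals at 16 kHz, used only to exercise the check.
sample_rate = 16_000
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]

print("noise flagged:", looks_like_white_noise(noise))
print("tone flagged:", looks_like_white_noise(tone))
```

To apply it to model output, pass one channel of the generated waveform as a list of floats in place of `noise`.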

eunkoh commented 2 months ago

Same issue here. Any updates?