Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
Is there a bug in the multi-band diffusion training code? Cannot generate normal 16 kHz audio after training #430
I trained a 16 kHz 4-band diffusion model on the whole of AudioSet, but the model cannot generate correct audio at inference time; it produces only white noise. Does anyone know a possible reason?
Meanwhile:
1) I also trained a 16 kHz 1-band diffusion model with the same code, and it did not work either.
2) I tried the pretrained 24 kHz MBD, which does generate correct audio.
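Not part of Audiocraft, just a small self-contained diagnostic: spectral flatness (geometric over arithmetic mean of the power spectrum) is close to ~0.5 for white noise and near zero for tonal or structured audio, which can help confirm whether the 16 kHz model's outputs are genuinely noise rather than, say, mis-scaled audio.

```python
import numpy as np

def spectral_flatness(x: np.ndarray) -> float:
    """Geometric mean / arithmetic mean of the power spectrum.
    ~0.5 for white noise, near 0 for tonal/structured audio."""
    power = np.abs(np.fft.rfft(x)) ** 2 + 1e-12  # epsilon avoids log(0)
    return float(np.exp(np.mean(np.log(power))) / np.mean(power))

rng = np.random.default_rng(0)
sr = 16000
noise = rng.standard_normal(sr)                      # 1 s of white noise
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # 1 s, 440 Hz sine

print(spectral_flatness(noise))  # high flatness -> noise-like
print(spectral_flatness(tone))   # near zero -> tonal
```

Running this on a batch of the model's generations (flattened to mono numpy arrays) should show flatness far above that of any real audio if the outputs really are white noise.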