facebookresearch / audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
MIT License
20.5k stars 2.06k forks

MultiBand Diffusion Noises. #236

Open startreker-shzy opened 1 year ago

startreker-shzy commented 1 year ago

Thanks for your work. I tried MBD following the examples in https://github.com/facebookresearch/audiocraft/blob/main/docs/MBD.md. The sound quality is good, but I get white noise along with the audio. Here is a sample: speech_test.zip. Are there any config options or parameters that should be changed, or is this a bug in MBD? How can I fix it?

cvillela commented 1 year ago

Experiencing the same thing!

robinsrm commented 1 year ago

Hey :) Thank you for the feedback. We noticed that the scaling of the audio can result in white noise in the output of the model. This is something I am currently working on. If you want a quick fix, you can simply rescale your input signal, which significantly improves the result (see speech_test 2.zip). Example code:

import soundfile as sf
import torch
import IPython.display as ipd
from audiocraft.models import MultiBandDiffusion

mbd = MultiBandDiffusion.get_mbd_24khz()
path = 'speech_test/test_speech.wav'
# always_2d=True returns shape (frames, channels), so mono files also work with mean(dim=1)
wav, sr = sf.read(path, always_2d=True)
# Downmix to mono and reshape to (batch, channels, time)
wav_torch = torch.from_numpy(wav).float().mean(dim=1).view(1, 1, -1).cuda()
# Rescale the input to a fixed standard deviation before regenerating
wav_torch = wav_torch / wav_torch.std() * 0.25  # arbitrary coeff
out_diffusion = mbd.regenerate(wav_torch, sample_rate=sr)
ipd.display(ipd.Audio(out_diffusion[0].cpu(), rate=sr))
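
The rescaling step above can also be wrapped in a small helper that guards against silent input. This is only a sketch: `rescale_to_std` and its `eps` parameter are hypothetical names, not part of audiocraft, and the 0.25 target is the arbitrary coefficient from the snippet above. It uses NumPy so it can be checked without a GPU. Note that `np.std` defaults to the population estimate while `torch.Tensor.std` defaults to the unbiased one, a difference that is negligible for audio-length signals.

```python
import numpy as np


def rescale_to_std(wav: np.ndarray, target_std: float = 0.25, eps: float = 1e-8) -> np.ndarray:
    """Scale `wav` so its standard deviation equals `target_std`.

    Hypothetical helper, not part of audiocraft. Near-silent input
    (std below `eps`) is returned unchanged to avoid dividing by zero.
    """
    wav = np.asarray(wav, dtype=np.float32)
    std = float(wav.std())
    if std < eps:
        return wav
    return wav / std * target_std


# Example: a loud 440 Hz tone sampled at 24 kHz.
sr = 24_000
t = np.arange(sr) / sr
loud = 0.9 * np.sin(2 * np.pi * 440.0 * t)
scaled = rescale_to_std(loud)
print(round(float(scaled.std()), 3))  # 0.25
```

The rescaled array could then be converted with `torch.from_numpy` and passed to `mbd.regenerate` as in the snippet above.
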

startreker-shzy commented 1 year ago

@Sparker17 Thanks for your reply. I will try this as soon as possible.

adagio715 commented 1 year ago

@Sparker17 Thanks for your reply. I also tried your suggestion and it works. I have another question: in my experiment (see attachment) and in the speech_test.zip provided by @startreker-shzy, the MBD outputs have energy up to ~6 kHz but only a very weak signal above 6 kHz, whereas in your results the spectrum of the MBD output is smooth all the way up to 12 kHz. Do you have any idea what causes this? Which configuration options or parameters can be tuned to fix it? music_test.zip
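
One way to put a number on the roll-off described above is to measure what fraction of the spectral energy lies at or above 6 kHz, for both the input and the MBD output. The following is a minimal sketch, assuming a mono signal held in a NumPy array. `band_energy_ratio` is a hypothetical helper, not an audiocraft function, and the 6 kHz default is taken from the observation above.

```python
import numpy as np


def band_energy_ratio(wav: np.ndarray, sr: int, cutoff_hz: float = 6000.0) -> float:
    """Return the fraction of spectral energy at or above `cutoff_hz`.

    Hypothetical diagnostic helper; `wav` is a 1-D mono signal.
    """
    spectrum = np.abs(np.fft.rfft(wav)) ** 2          # power per frequency bin
    freqs = np.fft.rfftfreq(len(wav), d=1.0 / sr)     # bin centre frequencies in Hz
    total = spectrum.sum()
    if total == 0.0:
        return 0.0
    return float(spectrum[freqs >= cutoff_hz].sum() / total)


# Synthetic check: a 1 kHz tone alone, then with an equal-amplitude 9 kHz tone.
sr = 24_000
t = np.arange(sr) / sr
low = np.sin(2 * np.pi * 1000.0 * t)
high = np.sin(2 * np.pi * 9000.0 * t)
print(band_energy_ratio(low, sr))         # close to 0
print(band_energy_ratio(low + high, sr))  # close to 0.5
```

Comparing this ratio for the original file and the regenerated output would show whether the high band is attenuated by the model or was already weak in the input.
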

cvillela commented 12 months ago

@Sparker17 On this topic, should samples used for training/finetuning also be normalized in this fashion?