Open startreker-shzy opened 1 year ago
Experiencing the same thing!
Hey :) Thank you for the feedback. We noticed that the scaling of the input audio can result in white noise in the output of the model. This is something I am currently working on. If you want a quick fix, you can simply rescale your input signal, which significantly improves the result (see speech_test 2.zip). Example code:
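import soundfile as sf
import torch
import IPython.display as ipd
from audiocraft.models import MultiBandDiffusion

# Load the 24 kHz MultiBandDiffusion model
mbd = MultiBandDiffusion.get_mbd_24khz()
path = 'speech_test/test_speech.wav'
# always_2d=True returns (frames, channels) even for mono files
wav, sr = sf.read(path, always_2d=True)
# Downmix to mono and reshape to (batch, channels, time)
wav_torch = torch.from_numpy(wav).float().mean(dim=1).view(1, 1, -1).cuda()
# Rescale the input to a fixed standard deviation before regeneration
wav_torch = wav_torch / wav_torch.std() * 0.25  # arbitrary coeff
out_diffusion = mbd.regenerate(wav_torch, sample_rate=sr)
ipd.display(ipd.Audio(out_diffusion[0].cpu(), rate=sr))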
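For anyone who wants to apply the same rescaling outside a notebook, here is a minimal NumPy-only sketch of the step above. The helper name `rescale_to_std` and the `eps` guard for silent input are assumptions for illustration and not part of audiocraft; 0.25 is the same arbitrary coefficient as in the snippet.

```python
import numpy as np


def rescale_to_std(wav, target_std=0.25, eps=1e-8):
    """Scale a waveform so that its standard deviation equals target_std.

    Hypothetical helper: the name and the eps guard are assumptions,
    and 0.25 is the arbitrary coefficient used in the snippet above.
    """
    wav = np.asarray(wav, dtype=np.float32)
    std = float(wav.std())
    if std < eps:
        # Silent or constant input: dividing would blow up, so return it unchanged.
        return wav
    return wav / std * target_std


# Usage on a synthetic "too loud" signal (1 second of noise at 24 kHz).
rng = np.random.default_rng(0)
loud = rng.standard_normal(24000).astype(np.float32) * 3.0
scaled = rescale_to_std(loud)
```

The rescaled array can then be converted with `torch.from_numpy` and reshaped to (batch, channels, time) as in the snippet above before being passed to `mbd.regenerate`.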
@Sparker17 Thanks for your reply. I will try this as soon as possible.
@Sparker17 Thanks for your reply. I also tried your suggestion and it works. I have another question: in my experiment (see attachment) and in the speech_test.zip provided by @startreker-shzy, the mbd outputs have energy up to ~6 kHz and the signal is very weak above 6 kHz, but in your results the spectrum of the mbd output is smooth all the way up to 12 kHz. Do you have any idea what causes this? Which configuration options or parameters can be tuned to fix it? music_test.zip
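To put a number on the spectral difference described above, one option is to compare the fraction of energy above 6 kHz in the input and in the mbd output. This is only a rough diagnostic sketch using a plain FFT on a mono signal; `high_band_energy_ratio` is a hypothetical helper, not an audiocraft function, and the 6 kHz cutoff is taken from the observation in this comment.

```python
import numpy as np


def high_band_energy_ratio(wav, sample_rate, cutoff_hz=6000.0):
    """Return the fraction of spectral energy at or above cutoff_hz.

    Hypothetical diagnostic helper; expects a 1-D mono waveform.
    """
    wav = np.asarray(wav, dtype=np.float64)
    power = np.abs(np.fft.rfft(wav)) ** 2
    freqs = np.fft.rfftfreq(wav.shape[-1], d=1.0 / sample_rate)
    total = power.sum()
    if total == 0.0:
        # All-zero input has no energy in any band.
        return 0.0
    return float(power[freqs >= cutoff_hz].sum() / total)


# Synthetic check: a 1 kHz tone alone, then mixed with an equal 9 kHz tone.
sr = 24000
t = np.arange(sr) / sr
low_tone = np.sin(2 * np.pi * 1000 * t)
high_tone = np.sin(2 * np.pi * 9000 * t)
ratio_low = high_band_energy_ratio(low_tone, sr)
ratio_mix = high_band_energy_ratio(low_tone + high_tone, sr)
```

Running the same function on the original file and on the regenerated output (both as mono arrays at the same sample rate) should show whether the output is losing high-frequency content relative to the input.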
@Sparker17 On this topic, should samples used for training/finetuning also be normalized in this fashion?
Thanks for your work. I have tried mbd following the examples in https://github.com/facebookresearch/audiocraft/blob/main/docs/MBD.md. The sound quality is good, but I get white noise along with the audio. Here is a sample: speech_test.zip. Are there any config options or parameters that should be changed? Or is there a bug in mbd? How can I fix it?