hdmjdp opened 6 years ago
Try increasing sigma during inference.
@rafaelvalle I have tried, but it does not remove the noise.
I also can't find a satisfactory sigma. A bigger sigma makes the periodic noise less obvious, but it harms speech quality. @rafaelvalle, do you think a smaller n_group value could solve the periodic noise problem? How does the n_group size affect speech quality in your experiments?
The periodic noise can be solved by decreasing the bias of the model, which can be achieved by sampling with higher sigma. To achieve similar speech quality with higher sigma, one needs a model that better fits the data.
@rafaelvalle I don't understand your answer, "The periodic noise can be solved by decreasing the bias of the model, which can be achieved by sampling with higher sigma." Could you explain in more detail? Do you mean training with a higher sigma?
@rafaelvalle Thanks for your reply. Do you think WaveGlow with a smaller n_group value could better fit the data? Have you experimented with different n_group values?
There's a hack here to remove the "line noise" https://github.com/NVIDIA/tacotron2/issues/142#issuecomment-466506044
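The linked hack works in the frequency domain. Below is a minimal NumPy/SciPy sketch of that kind of bias subtraction, not the code from the link. It assumes `bias_audio` is what the vocoder produces for an all-zeros (silent) mel spectrogram sampled with sigma=0.0; the function name `denoise_with_bias`, the STFT settings, and averaging the bias over time are illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft, istft


def denoise_with_bias(audio, bias_audio, strength=0.1, n_fft=1024, hop=256):
    """Subtract a scaled bias magnitude spectrum from `audio` (sketch).

    Assumption: `bias_audio` is the vocoder output for a silent mel input
    with sigma=0.0, so its spectrum approximates the model's fixed bias.
    """
    noverlap = n_fft - hop
    # Magnitude and phase of the generated audio.
    _, _, spec = stft(audio, nperseg=n_fft, noverlap=noverlap)
    mag, phase = np.abs(spec), np.angle(spec)

    # One bias magnitude per frequency bin, averaged over frames.
    _, _, bias_spec = stft(bias_audio, nperseg=n_fft, noverlap=noverlap)
    bias_mag = np.abs(bias_spec).mean(axis=1, keepdims=True)

    # Spectral subtraction, clamped at zero, then resynthesis with the original phase.
    mag_denoised = np.maximum(mag - strength * bias_mag, 0.0)
    _, out = istft(mag_denoised * np.exp(1j * phase), nperseg=n_fft, noverlap=noverlap)
    return out[: len(audio)]  # istft may return a few padded samples extra
```

Larger `strength` removes more of the lines, but it also removes more speech energy in the affected bins.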
@rafaelvalle Thanks for suggesting the frequency-domain line noise removal. I found another way to greatly attenuate the line noise in the spectrogram in my setup for narrowband speech generation: keep the same segment length and increase the number of groups from 8 (the default) to 16 or 32. It also seemed to help the model converge to good-quality speech faster. I don't have a good explanation for that, though. Any insight?
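To make "same segment length, more groups" concrete: as I understand WaveGlow's squeeze step, each training segment is folded so that every n_group consecutive samples become the channels of one time step. The sketch below uses a hypothetical `squeeze` helper in NumPy and assumes a 16000-sample segment, just to show the resulting shapes.

```python
import numpy as np


def squeeze(audio, n_group):
    """Fold a 1-D segment into shape (n_group, n_steps) (hypothetical helper).

    Column t holds samples t*n_group .. t*n_group + n_group - 1, so each
    block of n_group consecutive samples is processed jointly.
    """
    n_steps = len(audio) // n_group  # any trailing remainder is dropped
    return audio[: n_steps * n_group].reshape(n_steps, n_group).T


segment = np.arange(16000)  # assumed segment_length of 16000 samples
for n_group in (8, 16, 32):
    print(n_group, squeeze(segment, n_group).shape)  # (8, 2000), (16, 1000), (32, 500)
```

With a fixed segment length, going from 8 to 32 groups quadruples the number of samples handled jointly per step and cuts the number of steps by four.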
@triwoods Can you share samples with us using group sizes 8, 16, and 32?
@rafaelvalle I've implemented the denoiser, but the generated output is silent. Is there a way to fix this?
Just FYI: another quick fix I used to remove this line noise was to apply notch filters, with an appropriate Q value, at these line frequencies on the generated waveform.
@guanlongzhao would you mind sharing your code?
See this code snippet. For 16 kHz audio, you can filter the waveform sequentially by setting w0 to [2000, 4000, 6000]. I used Q = 50 in my experiments. In general, though, I think you should stick to the official denoiser, since IMHO the line noise and other background noise are caused by the nature of the method.
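The original snippet is not reproduced in this thread. Here is a minimal SciPy sketch of the same idea with the 16 kHz settings quoted above. It assumes `scipy.signal.iirnotch` for the filter design and zero-phase `filtfilt` for the filtering; both are my choices and not necessarily what the snippet used. The function name `remove_line_noise` is hypothetical.

```python
import numpy as np
from scipy.signal import iirnotch, filtfilt


def remove_line_noise(audio, fs=16000, freqs=(2000, 4000, 6000), q=50.0):
    """Apply one notch filter per line frequency, one after another (sketch)."""
    out = np.asarray(audio, dtype=np.float64)
    for w0 in freqs:
        # w0 is in Hz because fs is given; it must satisfy 0 < w0 < fs / 2.
        b, a = iirnotch(w0, q, fs=fs)
        # Forward-backward filtering, so the notches add no phase distortion.
        out = filtfilt(b, a, out)
    return out
```

A higher Q gives a narrower notch, which removes less of the surrounding speech but tolerates less drift in the line frequency.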
@guanlongzhao thanks my friend.
The noise peaks are detected at the 0th, 256th, 512th, and 768th bins (i.e., the quartiles of the 1024 frequency bins). The question is why these particular quartile positions. My initial thought was that the hop size being ¼ of the window size, or the power-of-2 dilations in WN, interact with the bias values in an unexpected way, but these are just unfounded suspicions for now. The figure below visualizes the denoiser's bias spectrogram.
This is because Waveglow generates 8 samples at a time.
Assume a time-invariant mel spectrogram input and sigma=0.0. In this case the model is deterministic, so the resulting audio is periodic with a period of 8 samples.
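This argument can be checked numerically with a toy stand-in for the deterministic model (not WaveGlow itself): repeat one fixed, random 8-sample block and see which FFT bins carry energy. A signal with a period of n_group samples can only have energy at multiples of fs / n_group, i.e., at FFT bins that are multiples of n_fft / n_group (128 for n_fft = 1024). At 16 kHz those frequencies are 2000, 4000, 6000, and 8000 Hz, which include the notch frequencies suggested earlier in the thread. The helper name `line_bins` is hypothetical.

```python
import numpy as np


def line_bins(block, n_fft=1024, rel_tol=1e-9):
    """Bins of an n_fft-point real FFT that carry energy when `block` repeats.

    Assumes n_fft is a multiple of len(block), so the FFT window holds an
    integer number of periods.
    """
    audio = np.tile(block, n_fft // len(block))
    spectrum = np.abs(np.fft.rfft(audio))
    return np.flatnonzero(spectrum > rel_tol * spectrum.max())


rng = np.random.default_rng(0)
block = rng.standard_normal(8)  # toy stand-in for one group of 8 output samples
bins = line_bins(block)
print(bins)                 # every entry is a multiple of 1024 // 8 = 128
print(bins * 16000 / 1024)  # bin index -> Hz at a 16 kHz sample rate
```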
To remove this effect, sigma must be sufficiently large and the model must handle cross-sample correlations appropriately. Perhaps because the model can handle long-distance cross-correlations more easily with a large n_group, as @triwoods pointed out, this periodic noise may be removed more quickly, though I don't have a formal proof of it.
@ishihara1989 That makes a lot of sense! Gave me an aha moment. Thank you very much for taking your time to ponder and answer my question.
@rafaelvalle First, thanks for your great work on the WaveGlow project. My question is: how can the line noise be removed automatically during inference? Thanks.
PS: In the inference stage, I use the --denoiser_strength option, which is meant to remove model bias. I started with 0.1 and adjusted the value, but the line noise still exists.
waveglow_eval_group_32.zip Here is my config.json, and we no longer hear any line noise. In my experiment, training took only 4 days on a single GPU to achieve these results. Hi @jaeseongyou, why were you confused by my evaluation? I know the results are not good enough yet, since training ran for fewer than 300 epochs. Please tell me what confused you.
Hi, I listened to your samples in "waveglow_eval_group_32.zip", and there is still obvious line noise.
waveglow_eval_group_32.zip Maybe my cheap, crappy headset doesn't reproduce such noise well. But I kept training, and it is now at 1k epochs. If there is still line noise, please let me know.
Hi @begeekmyfriend. As @MuyangDu pointed out, I think the audio files you posted still suffer from the same noise problem. Above is the spectrogram of your 1k result, which seems to have many more lines than the output of the original config. It might have something to do with changing the number of groups from 8 to 32.
@jaeseongyou But the final results have nothing to do with whether the spectrograms look good or not. In my headset I don't hear any line noise any more, right?
Nope, I can hear strong line noise in your samples. Maybe you can try to use another device or headset to test your samples.
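Since listening through different headsets gives conflicting verdicts, a rough objective check is to look for narrow peaks in the time-averaged spectrum. The sketch below is a heuristic of my own, not an established metric: the name `find_spectral_lines`, the 10 dB threshold, and the 31-bin median window are arbitrary assumptions. It uses SciPy's `stft` and `scipy.ndimage.median_filter`.

```python
import numpy as np
from scipy.ndimage import median_filter
from scipy.signal import stft


def find_spectral_lines(audio, fs, n_fft=1024, hop=256, threshold_db=10.0, width=31):
    """Return frequencies (Hz) whose time-averaged level stands out (heuristic).

    A bin is flagged when its mean magnitude, in dB, exceeds a running
    median over `width` neighbouring bins by more than `threshold_db`.
    """
    f, _, spec = stft(audio, fs=fs, nperseg=n_fft, noverlap=n_fft - hop)
    # Time-averaged magnitude per frequency bin, in dB.
    mean_db = 20.0 * np.log10(np.abs(spec).mean(axis=1) + 1e-12)
    # Local spectral floor estimated with a median across frequency.
    floor_db = median_filter(mean_db, size=width, mode="nearest")
    return f[mean_db - floor_db > threshold_db]
```

Running it on generated audio at its true sample rate, for example comparing n_group 8 and 32 checkpoints on the same mel input, gives numbers to compare instead of a listening impression.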
The line noise seems more obvious after training for more epochs. But the samples provided by NVIDIA are very clean, without this problem. I still don't know how to prevent the line noise.
This problem has bothered me for a long time. Can anyone remove it?