NVIDIA / waveglow

A Flow-based Generative Network for Speech Synthesis
BSD 3-Clause "New" or "Revised" License

The line noise: can anyone remove it? #55

Open hdmjdp opened 5 years ago

hdmjdp commented 5 years ago

This problem has bothered me for a long time. Can anyone remove it?

rafaelvalle commented 5 years ago

Try increasing sigma during inference.

hdmjdp commented 5 years ago

@rafaelvalle I have tried, but it does not remove it.

wenyong-h commented 5 years ago

I also can't find a satisfying sigma. A bigger sigma makes the periodic noise less obvious, but it harms speech quality. @rafaelvalle, do you think a smaller n_group value could solve the periodic noise problem? How does the n_group size affect speech quality in your experiments?

rafaelvalle commented 5 years ago

The periodic noise can be solved by decreasing the bias of the model, which can be achieved by sampling with higher sigma. To achieve similar speech quality with higher sigma, one needs a model that better fits the data.
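
A toy illustration of the masking side of this trade-off. This is not WaveGlow itself. In WaveGlow, sigma scales the latent z ~ N(0, σ²I) that is passed through the inverse flow. Here we simply add sigma-scaled Gaussian noise to a fixed, hypothetical period-8 "bias" waveform. The function `line_to_background_db` is made up for this sketch. It reports how far the periodic lines rise above the median spectral level.

```python
import numpy as np

def line_to_background_db(sigma, n_group=8, n=16384, seed=0):
    """Toy model, NOT WaveGlow: a fixed n_group-periodic "bias" plus sigma-scaled
    Gaussian noise. Returns how far the periodic lines rise above the median
    spectral level, in dB."""
    rng = np.random.default_rng(seed)
    pattern = 0.05 * rng.standard_normal(n_group)        # hypothetical bias pattern
    bias = np.tile(pattern, n // n_group)                 # repeats every n_group samples
    x = bias + sigma * rng.standard_normal(n)             # toy stand-in for latent noise
    mag = np.abs(np.fft.rfft(x))
    # A period of n_group samples puts energy only at multiples of n / n_group bins.
    line_bins = np.arange(1, n_group // 2 + 1) * (n // n_group)
    return 20 * np.log10(mag[line_bins].mean() / np.median(mag))

for sigma in (0.1, 0.3, 0.6, 1.0):
    print(f"sigma={sigma}: lines {line_to_background_db(sigma):.1f} dB above background")
```

In this toy, the lines sink toward the background as sigma grows. However, the added noise is itself what degrades quality, so keeping quality at a higher sigma needs a model that fits the data better.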

hdmjdp commented 5 years ago

@rafaelvalle I don't understand your answer, "The periodic noise can be solved by decreasing the bias of the model, which can be achieved by sampling with higher sigma." Can you explain in more detail? Do you mean training with a higher sigma?

wenyong-h commented 5 years ago

@rafaelvalle Thanks for your reply. Do you think WaveGlow with a smaller n_group value could fit the data better? Have you experimented with different n_group values?

rafaelvalle commented 5 years ago

There's a hack here to remove the "line noise" https://github.com/NVIDIA/tacotron2/issues/142#issuecomment-466506044
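
Below is a minimal sketch of spectral bias subtraction, the idea behind the repository's `--denoiser_strength` option as we understand it:

1. Estimate the model bias by running the vocoder on an all-zero mel with sigma=0.
2. Subtract a scaled copy of that bias's magnitude spectrum from the generated audio's magnitude spectrum, clamping at zero.
3. Resynthesize using the audio's original phase.

This uses `scipy.signal.stft`/`istft` rather than the repo's own STFT module. Taking the median over frames is our own choice. A synthetic period-8 signal stands in for the real zero-input output.

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_bias_subtract(audio, bias_audio, strength=0.1, nperseg=1024, noverlap=768):
    """Subtract strength * (bias magnitude spectrum) from audio's magnitude, keep audio's phase.

    bias_audio is assumed to be the vocoder's output for an all-zero mel input with sigma=0."""
    _, _, spec = stft(audio, nperseg=nperseg, noverlap=noverlap)
    _, _, bias_spec = stft(bias_audio, nperseg=nperseg, noverlap=noverlap)
    # One bias magnitude per frequency bin; the median over frames is our choice.
    bias_mag = np.median(np.abs(bias_spec), axis=1, keepdims=True)
    mag = np.clip(np.abs(spec) - strength * bias_mag, 0.0, None)   # no negative magnitudes
    _, denoised = istft(mag * np.exp(1j * np.angle(spec)), nperseg=nperseg, noverlap=noverlap)
    return denoised[: len(audio)]

# Synthetic check: a period-8 "bias" stands in for the model's zero-input output.
rng = np.random.default_rng(0)
n = 16384
bias_audio = np.tile(rng.standard_normal(8), n // 8)
audio = bias_audio + 0.05 * rng.standard_normal(n)
clean = spectral_bias_subtract(audio, bias_audio, strength=1.0)

def line_energy(x):
    """Sum of FFT magnitudes at the non-DC multiples of n / 8 bins."""
    mag = np.abs(np.fft.rfft(x))
    return mag[np.arange(1, 5) * (n // 8)].sum()

print(f"line energy after / before: {line_energy(clean) / line_energy(audio):.4f}")
```

If strength is set too high, the subtraction also eats into real speech energy, so it is usually tuned upward from a small value.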

triwoods commented 5 years ago

@rafaelvalle Thanks for suggesting the frequency-domain line noise removal. I found another way to greatly attenuate the line noise in the spectrogram in my setup for narrowband speech generation: keep the same segment length, and increase the number of groups from 8 (the default) to 16 or 32. I found it also helped the model converge to good-quality speech faster. I don't have a good explanation for that, though; any insight?
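
Keeping segment_length fixed while raising n_group trades time steps for channels. Each flow step sees n_group samples at once, but over fewer positions. Below is a NumPy sketch of that squeeze. It assumes the layout PyTorch's `audio.unfold(1, n_group, n_group).permute(0, 2, 1)` produces, which we believe is how the repo's glow.py groups audio. The helper name and the segment length of 16000 are illustrative.

```python
import numpy as np

def squeeze_to_groups(audio, n_group):
    """(batch, segment_length) -> (batch, n_group, segment_length // n_group).

    Channel j at time step t holds sample t * n_group + j, matching
    audio.unfold(1, n_group, n_group).permute(0, 2, 1) in PyTorch."""
    batch, length = audio.shape
    if length % n_group:
        raise ValueError("segment_length must be divisible by n_group")
    return audio.reshape(batch, length // n_group, n_group).transpose(0, 2, 1)

segment = np.arange(16000, dtype=np.float32)[None, :]   # hypothetical segment_length of 16000
for n_group in (8, 16, 32):
    print(n_group, squeeze_to_groups(segment, n_group).shape)
```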

rafaelvalle commented 5 years ago

@triwoods Can you share samples with us using group size 8, 16 and 32?

Cacozelia commented 5 years ago

@rafaelvalle I've implemented the denoiser, but the generated output is silent; is there a way to fix this?

guanlongzhao commented 5 years ago

Just FYI: another quick fix I used to remove this line noise was to apply notch filters, with an appropriate Q value, at these line frequencies on the generated waveform.

OswaldoBornemann commented 5 years ago

@guanlongzhao would you mind sharing your code?

guanlongzhao commented 5 years ago

See this code snippet. For 16 kHz audio, you can filter the waveform sequentially, setting w0 to 2000, 4000, and 6000. I used Q = 50 in my experiments. In general, though, IMHO you should stick to the official denoiser, since the line noise and other background noise are caused by the nature of the method.
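
The linked snippet isn't reproduced here. The parameter names w0 and Q match `scipy.signal.iirnotch`, so this sketch assumes that function, with `filtfilt` used for zero-phase filtering. It applies the notches one after another at 2000, 4000, and 6000 Hz with Q = 50, as described. These frequencies are multiples of 16000 / 8. The function name and the test tones are illustrative stand-ins.

```python
import numpy as np
from scipy.signal import iirnotch, filtfilt

def remove_line_noise(audio, fs=16000, freqs=(2000, 4000, 6000), q=50.0):
    """Apply zero-phase IIR notch filters one after another at each line frequency."""
    out = np.asarray(audio, dtype=np.float64)
    for w0 in freqs:
        b, a = iirnotch(w0, q, fs=fs)   # notch centred at w0 Hz, bandwidth about w0 / q
        out = filtfilt(b, a, out)       # forward-backward filtering: no phase shift
    return out

fs = 16000
t = np.arange(fs) / fs
speech_like = np.sin(2 * np.pi * 1000 * t)    # stand-in for content we want to keep
line = 0.5 * np.sin(2 * np.pi * 2000 * t)      # stand-in for one line-noise component
filtered = remove_line_noise(speech_like + line, fs=fs)
```

A high Q keeps each notch narrow (about 40 Hz wide at 2000 Hz), so nearby speech energy is barely touched. However, anything that genuinely sits on those frequencies is removed too.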

OswaldoBornemann commented 5 years ago

@guanlongzhao thanks my friend.

jaeseongyou commented 5 years ago

The noise peaks are detected at the 0th, 256th, 512th, and 768th bins (i.e., the quartiles, given 1024 frequency bins). The question is: why these particular quartile positions? My initial thought was that the hop size is ¼ of the window size, or that the power-of-2 dilations in WN interact with the bias values in an unexpected way, but these are just baseless suspicions for now. The figure below visualizes the denoiser's bias spectrogram.

[Figure: denoiser bias spectrogram]

ishihara1989 commented 5 years ago

This is because Waveglow generates 8 samples at a time.

Assume a time-invariant mel spectrogram input and sigma=0.0. In this case the model is deterministic, so the resulting audio is periodic with a period of 8 samples.

To remove this effect, sigma must be sufficiently large and the model must handle cross-sample correlations appropriately. Perhaps the model can handle long-distance cross-correlations more easily with a large n_group; if so, that may explain why this periodic noise tends to disappear more quickly, as @triwoods pointed out. I don't have a formal proof of it, though.
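
A toy check of this argument follows. The decoder is hypothetical, not WaveGlow's architecture. Each group of n_group samples is a fixed function of a constant conditioning vector, plus sigma-scaled latent noise. With sigma=0 the output repeats every n_group samples, so its spectrum has energy only at multiples of sample_rate / n_group. At 16 kHz and n_group=8, that is 2000, 4000, and 6000 Hz (plus DC and Nyquist). The function name and weights are made up for the sketch.

```python
import numpy as np

def toy_group_decoder(cond, n_group=8, sigma=0.0, seed=0):
    """Hypothetical stand-in for a flow decoder (NOT WaveGlow's architecture):
    each group of n_group output samples is a fixed function of one conditioning
    vector plus sigma-scaled latent noise."""
    rng = np.random.default_rng(seed)
    weights = 0.1 * rng.standard_normal((n_group, cond.shape[1]))   # made-up fixed weights
    groups = [np.tanh(weights @ c) + sigma * rng.standard_normal(n_group) for c in cond]
    return np.concatenate(groups)

cond = np.ones((128, 80))                      # time-invariant conditioning: 128 identical frames
audio = toy_group_decoder(cond, sigma=0.0)     # 128 groups * 8 samples = 1024 samples
print(np.allclose(audio[:8], audio[8:16]))     # deterministic, so it repeats every 8 samples

mag = np.abs(np.fft.rfft(audio))
print(np.flatnonzero(mag > 1e-6 * mag.max()))  # energy only at multiples of 1024 / 8 = 128
```

With sigma > 0, each group gets fresh noise, and the exact repetition (and with it the pure line spectrum) is broken.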

jaeseongyou commented 5 years ago

@ishihara1989 That makes a lot of sense! Gave me an aha moment. Thank you very much for taking your time to ponder and answer my question.

melspectrum007 commented 4 years ago

@rafaelvalle First, thanks for your great work on the WaveGlow project. My question is: how can the line noise be removed automatically during inference? Thanks.

PS: In the inference stage, I use the --denoiser_strength option, which is intended to remove model bias. I started with 0.1 and adjusted the value, but the line noise still exists.

begeekmyfriend commented 4 years ago

waveglow_eval_group_32.zip [attachment]. Here is my config.json; we no longer hear any line noise. In my experiment, training lasted only 4 days on a single GPU to reach these results. [image] Hi @jaeseongyou, why were you confused by my evaluation? I know the results are not good enough, since training ran for fewer than 300 epochs. Please tell me what confuses you.

MuyangDu commented 4 years ago

Hi, I listened to your samples in "waveglow_eval_group_32.zip", and there is still obvious line noise.

begeekmyfriend commented 4 years ago

waveglow_eval_group_32.zip [attachment]. Maybe my cheap, crappy headset doesn't reproduce such noise well. But I kept training, and it is now up to 1k epochs. If there is still line noise, please let me know. [images]

jaeseongyou commented 4 years ago

[image] Hi @begeekmyfriend. As @MuyangDu pointed out, I think the audio files you posted still suffer from the same noise problem. Above is the spectrogram of your 1k-epoch result, which seems to have many more lines than the output of the original config. It might have something to do with changing the number of groups from 8 to 32.
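
The line positions follow directly from the sample rate and n_group, if we take as a working assumption the explanation given earlier in the thread (the artifact's period equals n_group). Going from 8 to 32 groups makes the lines 4× closer together and raises their count from 5 to 17 between 0 Hz and Nyquist. That would be consistent with the denser lines seen here. The same arithmetic gives jaeseongyou's quartile positions if those 1024 bins span 0 Hz to Nyquist (e.g. a 2048-point FFT, which is our guess about the plot). Both helper names are hypothetical.

```python
def line_noise_hz(n_group, sample_rate):
    """Frequencies (0 Hz to Nyquist) where an n_group-periodic artifact can put energy."""
    return [k * sample_rate / n_group for k in range(n_group // 2 + 1)]

def line_noise_bins(n_group, n_fft):
    """The same lines as STFT bin indices (possibly fractional) for an n_fft-point FFT."""
    return [k * n_fft / n_group for k in range(n_group // 2 + 1)]

print(line_noise_hz(8, 16000))        # 5 lines, 2000 Hz apart
print(len(line_noise_hz(32, 22050)))  # 17 lines from 0 Hz through Nyquist
print(line_noise_bins(8, 2048))       # multiples of 256 bins
```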

begeekmyfriend commented 4 years ago

@jaeseongyou But the final results have nothing to do with whether the spectrograms look good or not. With my headset I don't hear any line noise anymore, right?

MuyangDu commented 4 years ago

Nope, I can hear strong line noise in your samples. Maybe you can try another device or headset to test your samples.

terryyizhong commented 4 years ago

The line noise seems more obvious after training for more epochs. But the samples provided by NVIDIA are very clean, without this problem. I still don't know how to prevent the line noise.