dhgrs / pytorch-UniWaveNet


How to eliminate the noise in the speech generated by the model? #3

Closed switchzts closed 6 years ago

switchzts commented 6 years ago

The generated audio contains some electrical noise, which greatly hurts the listening quality. In the original paper, a phase loss term is added to the loss. Do you have any thoughts on this?

dhgrs commented 6 years ago

There are two ways. First, train for more iterations without a phase loss. I continued training after uploading the generated samples and the quality got a bit better. I'll upload new samples once training finishes.

Second, add a phase loss. Please check this paper. They use a DNN for phase reconstruction from a magnitude spectrogram, so their loss function is a good starting point.
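For reference, a minimal sketch of one way such a phase term could be added on top of the spectrogram loss, assuming raw waveform tensors and PyTorch's STFT; the function name, FFT settings, and weighting below are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def phase_loss(y_hat, y, n_fft=1024, hop=256):
    # Hypothetical phase term: cosine distance between the STFT phases of
    # the predicted and target waveforms (shape: batch x samples).
    window = torch.hann_window(n_fft, device=y.device)
    S_hat = torch.stft(y_hat, n_fft, hop_length=hop, window=window,
                       return_complex=True)
    S = torch.stft(y, n_fft, hop_length=hop, window=window,
                   return_complex=True)
    # 1 - cos(dphi) is 0 when the phases agree, 2 when they are opposite,
    # and stays well defined across the 2*pi wrap-around.
    return torch.mean(1.0 - torch.cos(torch.angle(S_hat) - torch.angle(S)))

# Usage (the weight 0.1 is an assumption):
# loss = spectrogram_loss + 0.1 * phase_loss(y_hat, y)
```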

switchzts commented 6 years ago

@dhgrs OK~ I just found that there is no difference between 350k-gen-wav and 210k-gen-wav; both of them still have some electrical noise in the audio. Should I change the learning rate or something? BTW, my batch size is 2.

dhgrs commented 6 years ago

I tried changing the spectrogram loss scale from magnitude to log after 500k iterations, and I uploaded new generated audio samples yesterday, so please have a listen.
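To illustrate that change, a linear-magnitude loss versus a log-magnitude loss could look roughly like the sketch below; this is an assumption about the shape of the loss, not the repo's actual implementation, and the FFT parameters are placeholders.

```python
import torch

def magnitude(x, n_fft=1024, hop=256):
    # |STFT| of a batch of waveforms, shape (batch, freq, frames).
    window = torch.hann_window(n_fft, device=x.device)
    return torch.stft(x, n_fft, hop_length=hop, window=window,
                      return_complex=True).abs()

def linear_spectrogram_loss(y_hat, y):
    return torch.mean((magnitude(y_hat) - magnitude(y)) ** 2)

def log_spectrogram_loss(y_hat, y, eps=1e-7):
    # The log scale compresses large magnitudes, so errors in quiet
    # frequency bands weigh more than they do on a linear scale.
    return torch.mean((torch.log(magnitude(y_hat) + eps)
                       - torch.log(magnitude(y) + eps)) ** 2)
```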

But it's still a bit noisy. SING, FAIR's NIPS 2018 paper, also trains a waveform generator with a spectrogram loss, so it may help us. https://research.fb.com/publications/sing-symbol-to-instrument-neural-generator/

dhgrs commented 6 years ago

@switchzts Also check this paper, uploaded to arXiv yesterday. It uses the same concept, a spectral loss plus a phase loss; the difference is the network architecture. They use an LSTM. https://arxiv.org/abs/1810.11945

switchzts commented 6 years ago

@dhgrs This paper looks like a new vocoder method, and its demo sounds similar to WaveNet! It provides code, so I will try it first. Thanks!