facebookresearch / denoiser

Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a causal speech enhancement model working on the raw waveform that runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it is capable of removing various kinds of background noise, including stationary and non-stationary noise, as well as room reverb. Additionally, we suggest a set of data augmentation techniques applied directly to the raw waveform, which further improve model performance and generalization.

After training there is a periodic pattern in the reconstructed melspectrogram #138

Open krantiparida opened 2 years ago

krantiparida commented 2 years ago

Thanks for making the code public. I used the same architecture and parameters as described in the paper. After training the network, I see periodic lines in the melspectrogram of the reconstruction. I am attaching an image below for reference (the reconstructed melspectrogram is the rightmost one). Could you please suggest what might be causing this? The pattern is consistent across all predictions.

[Image: noise_wR8qIPb8eWw_280]

adiyoss commented 1 year ago

Hi @krantiparida, that is strange; we did not observe anything like that in our models. Did you use the same config file as ours? Which model version are you using?

YunyangZeng commented 1 year ago

We observed a similar issue while fine-tuning Demucs. If you are training with only the waveform loss, in my experience the issue can be alleviated by adding the multi-resolution STFT loss. Here is a paper that explains the potential cause of this tonal artifact issue:
https://arxiv.org/abs/2010.14356
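
For anyone who wants to try this, here is a rough sketch of what a multi-resolution STFT loss added to a waveform L1 loss can look like in PyTorch. It is a minimal illustration, not the repository's own implementation: the function names `stft_magnitude` and `multi_resolution_stft_loss` are hypothetical, the three (FFT size, hop, window length) resolutions are assumed example values, and the two terms (spectral convergence and log-magnitude L1) are summed with equal weight as an assumption. Random tensors stand in for real clean and enhanced audio.

```python
import torch
import torch.nn.functional as F


def stft_magnitude(x, n_fft, hop_length, win_length):
    """Magnitude spectrogram of a batch of waveforms shaped (batch, time)."""
    window = torch.hann_window(win_length, device=x.device, dtype=x.dtype)
    spec = torch.stft(
        x,
        n_fft,
        hop_length=hop_length,
        win_length=win_length,
        window=window,
        return_complex=True,
    )
    # Clamp so the logarithm below stays finite on silent frames.
    return spec.abs().clamp(min=1e-7)


def multi_resolution_stft_loss(
    estimate,
    target,
    # (n_fft, hop_length, win_length) triples; assumed example values.
    resolutions=((1024, 120, 600), (2048, 240, 1200), (512, 50, 240)),
):
    """Average of spectral-convergence + log-magnitude L1 over resolutions."""
    total = estimate.new_zeros(())
    for n_fft, hop, win in resolutions:
        est_mag = stft_magnitude(estimate, n_fft, hop, win)
        tgt_mag = stft_magnitude(target, n_fft, hop, win)
        # Spectral convergence: relative Frobenius-norm error of magnitudes.
        sc = torch.linalg.norm(tgt_mag - est_mag) / torch.linalg.norm(tgt_mag)
        # L1 distance between log-magnitude spectrograms.
        log_mag = F.l1_loss(torch.log(est_mag), torch.log(tgt_mag))
        total = total + sc + log_mag
    return total / len(resolutions)


# Usage: combine the waveform L1 loss with the spectral loss.
torch.manual_seed(0)
clean = torch.randn(2, 16000)  # stand-in for clean speech
enhanced = (clean + 0.1 * torch.randn(2, 16000)).requires_grad_()  # stand-in for model output
loss = F.l1_loss(enhanced, clean) + multi_resolution_stft_loss(enhanced, clean)
loss.backward()
```

Using several window sizes means an artifact that is cheap under one time-frequency resolution is still penalized under another, which is presumably why it helps with the periodic lines where a waveform-only loss does not.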