Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020)
We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain. In it, we present a causal speech enhancement model that works on the raw waveform and runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it can remove various kinds of background noise, including stationary and non-stationary noise, as well as room reverb. Additionally, we suggest a set of data augmentation techniques applied directly to the raw waveform, which further improve model performance and generalization.
Hi,
I have a question about how the causal model is evaluated on Valentini. Is it a true streaming evaluation, or a regular (non-streaming) evaluation of the model trained without the bidirectional LSTM? Since the causal pretrained model for Valentini was not provided, I could not try it myself. From the code, I understand that the convolutions were not made causal (by prepending the appropriate number of zeros to the input) when training the causal model. I am wondering whether this is a truly causal model if the convolutions are still not causal.
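To make concrete what "causal by prepending zeros" means here, a minimal sketch in plain Python (no PyTorch, so it stays self-contained). The function names `causal_conv1d`, `centered_conv1d`, and `is_causal` are made up for this illustration and are not taken from the repository; the sketch says nothing about how the repository actually pads its convolutions.

```python
def causal_conv1d(signal, kernel):
    """1D convolution where output[t] depends only on signal[0..t]."""
    pad = len(kernel) - 1
    # Prepend zeros only; nothing is appended, so no future sample is read.
    padded = [0.0] * pad + list(signal)
    return [
        sum(kernel[k] * padded[t + pad - k] for k in range(len(kernel)))
        for t in range(len(signal))
    ]


def centered_conv1d(signal, kernel):
    """Same-length convolution with symmetric padding (reads future samples)."""
    size = len(kernel)
    left = (size - 1) // 2
    right = size - 1 - left
    padded = [0.0] * left + list(signal) + [0.0] * right
    return [
        sum(kernel[k] * padded[t + size - 1 - k] for k in range(size))
        for t in range(len(signal))
    ]


def is_causal(conv, kernel, length=8):
    """Check that changing samples after time t never changes output[0..t]."""
    base = [float(i + 1) for i in range(length)]
    reference = conv(base, kernel)
    for t in range(length):
        # Perturb only the samples strictly after time t.
        changed = base[: t + 1] + [x + 100.0 for x in base[t + 1:]]
        if conv(changed, kernel)[: t + 1] != reference[: t + 1]:
            return False
    return True


if __name__ == "__main__":
    kernel = [1.0, 2.0, 3.0]
    print(is_causal(causal_conv1d, kernel))
    print(is_causal(centered_conv1d, kernel))
```

The check in `is_causal` is the property I am asking about: if perturbing future input samples changes past outputs, the layer is not causal, regardless of whether the LSTM is unidirectional.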