Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020)

We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a causal speech enhancement model working on the raw waveform that runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it is capable of removing various kinds of background noise, including stationary and non-stationary noises, as well as room reverb. Additionally, we suggest a set of data augmentation techniques applied directly to the raw waveform, which further improve model performance and its generalization abilities.
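To make the encoder-decoder-with-skip-connections idea concrete, here is a minimal toy sketch in PyTorch. It is illustrative only: the layer widths, kernel/stride choices, and activations are placeholder assumptions, not the paper's actual hyperparameters (the real model uses larger kernels, GLU activations, and an LSTM bottleneck).

```python
import torch
import torch.nn as nn

class TinyWaveEncDec(nn.Module):
    """Toy encoder-decoder on the raw waveform with a skip connection.

    All sizes here (hidden=16, kernel=stride=4) are placeholders chosen so
    that input and output lengths match exactly; they are NOT the paper's.
    """
    def __init__(self, hidden=16):
        super().__init__()
        self.enc1 = nn.Conv1d(1, hidden, kernel_size=4, stride=4)
        self.enc2 = nn.Conv1d(hidden, hidden * 2, kernel_size=4, stride=4)
        self.dec2 = nn.ConvTranspose1d(hidden * 2, hidden, kernel_size=4, stride=4)
        self.dec1 = nn.ConvTranspose1d(hidden, 1, kernel_size=4, stride=4)
        self.act = nn.ReLU()

    def forward(self, x):                   # x: (batch, 1, samples)
        s1 = self.act(self.enc1(x))         # downsample x4
        s2 = self.act(self.enc2(s1))        # downsample x16 total
        y = self.act(self.dec2(s2) + s1)    # upsample + skip connection
        return self.dec1(y)                 # back to waveform length

model = TinyWaveEncDec()
noisy = torch.randn(1, 1, 16000)            # 1 second of 16 kHz audio
with torch.no_grad():
    enhanced = model(noisy)
print(enhanced.shape)                       # torch.Size([1, 1, 16000])
```

Because kernel size equals stride here, each output sample depends only on a fixed local window of the input, which keeps the toy model streamable; the actual model achieves causality differently.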
Is there a pretrained model for 8 kHz speech? And is it usable for languages other than English?

We do not have a pre-trained 8 kHz version, so you would need to train one yourself. Regarding languages, although we evaluated the model on 16 kHz and not 8 kHz (I don't think it would make a difference), when we trained it on English only it also performed quite well on other languages such as French and Hebrew.

You can listen to the samples here: https://facebookresearch.github.io/denoiser/
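Lacking an 8 kHz checkpoint, one possible workaround (a sketch under the assumption that resampling artifacts are acceptable for your use case, not something from the paper) is to upsample 8 kHz audio to the model's 16 kHz rate before enhancement, for example with SciPy's polyphase resampler:

```python
import numpy as np
from scipy.signal import resample_poly

def upsample_8k_to_16k(wav_8k: np.ndarray) -> np.ndarray:
    """Upsample a mono 8 kHz waveform by a factor of 2 via polyphase filtering."""
    return resample_poly(wav_8k, up=2, down=1)

one_second = np.random.randn(8000).astype(np.float32)  # 1 s at 8 kHz
wav_16k = upsample_8k_to_16k(one_second)
print(len(wav_16k))  # 16000
```

Note that the upsampled signal still contains no energy above 4 kHz, so the enhanced output will sound band-limited compared to native 16 kHz recordings.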