facebookresearch / denoiser

Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020)

We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, which presents a causal speech enhancement model that works on the raw waveform and runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it is capable of removing various kinds of background noise, including stationary and non-stationary noise, as well as room reverb. Additionally, we suggest a set of data augmentation techniques, applied directly to the raw waveform, which further improve model performance and generalization.

Multi-channel Audio Enhancement #140

Open TankyFranky opened 1 year ago

TankyFranky commented 1 year ago

Hello,

I am wondering what the best approach would be to adapt the denoiser for multi-channel audio. I have a four-microphone array that I would like to apply the denoiser to as a pre-processing step.

Can the model.chin and model.chout parameters be changed when performing inference with a network that was trained on only one channel? Will the inference/forward step adapt if the input tensor contains multiple channels of audio (all of the same frame size)? I have modified the live.py example to perform sequential forward passes (one per channel), but this obviously tanks the real-time performance.

Any advice on applying denoiser to multi-channel audio would be appreciated.

Thanks.

adiyoss commented 1 year ago

Hi @TankyFranky, you can definitely reconfigure the model to take more than one channel as input and output. However, if you go that way, you should train a new model from scratch. If you want to use the pre-trained models, then what you did (processing each channel independently) would be the best/easiest way. Regarding the real-time constraints, maybe you can process the channels in parallel?
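
One way to read "process the channels in parallel" is to put each microphone channel in the batch dimension, so a single-channel model handles all four in one forward pass. The sketch below is only an illustration under assumptions: `StandInMonoModel` is a hypothetical placeholder (not the denoiser's Demucs model), and it assumes a mono model that maps `[batch, 1, time]` to `[batch, 1, time]`. It does not address the streaming path used in live.py, where per-stream state may need separate handling.

```python
import torch
from torch import nn


class StandInMonoModel(nn.Module):
    """Hypothetical stand-in for a pretrained single-channel denoiser.

    It maps [batch, 1, time] to [batch, 1, time], which is the shape
    contract assumed here for a model trained with chin=1 and chout=1.
    """

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=9, padding=4)

    def forward(self, x):
        return self.conv(x)


def enhance_multichannel(model, frames):
    """Enhance every channel of `frames` ([channels, time]) in one forward pass.

    Each microphone channel becomes its own batch item, so the mono model
    processes the channels independently but within a single call.
    """
    batch = frames.unsqueeze(1)      # [channels, 1, time]
    with torch.no_grad():
        out = model(batch)           # [channels, 1, time]
    return out.squeeze(1)            # [channels, time]


model = StandInMonoModel().eval()
frames = torch.randn(4, 16000)       # 4 microphones, 1 s at 16 kHz (assumed)
enhanced = enhance_multichannel(model, frames)
```

With a real pretrained mono model substituted for the stand-in, the weights stay untouched and only the input layout changes, which is why no retraining is needed for this variant. Whether it is fast enough for real-time use depends on the hardware and would need to be measured.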