XinleiRen / MTFAA-Net

An unofficial non-causal TensorFlow implementation of "Multi-Scale Temporal Frequency Convolutional Network With Axial Attention for Speech Enhancement"

Real-time denoising not working well #2

Closed: narrietal closed this issue 1 year ago

narrietal commented 1 year ago

Hi,

Thanks for sharing the code.

I trained the model on 2s audio clips (I changed the n_frames parameter in the asat function accordingly). The STFT is computed with a 32ms window and 8ms of overlap.
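As an illustration of how n_frames follows from these settings, here is a minimal sketch that counts STFT frames without end padding (the same framing rule as `tf.signal.stft` with `pad_end=False`). The 16 kHz sample rate is an assumption, since the post does not state it. "8ms of overlap" literally means a hop of 24ms, but many setups use an 8ms hop instead, so both are shown. The helper name `num_stft_frames` is hypothetical.

```python
def num_stft_frames(duration_s, sample_rate, win_ms, hop_ms):
    """Number of STFT frames for a clip, with no padding at the end.

    Uses the framing rule 1 + (N - win) // hop, i.e. only frames that
    fit entirely inside the signal are counted.
    """
    n_samples = int(round(duration_s * sample_rate))
    win = int(round(win_ms * sample_rate / 1000))
    hop = int(round(hop_ms * sample_rate / 1000))
    if n_samples < win:
        return 0
    return 1 + (n_samples - win) // hop


# Assumed 16 kHz audio, 32 ms window (512 samples).
print(num_stft_frames(2.0, 16000, 32, 8))     # 8 ms hop  -> 247
print(num_stft_frames(2.0, 16000, 32, 24))    # 24 ms hop -> 83
print(num_stft_frames(0.032, 16000, 32, 8))   # one 32 ms segment -> 1
```

The last line shows that a single 32ms segment gives the network just one frame, far fewer than the frames it saw in each 2s training clip.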

I would like to perform real-time denoising on single 32ms frames. However, at inference time the network only denoises properly with 2s segments; with 32ms segments it does a poor job.

Do you know why I am experiencing this behaviour and how I could fix it to achieve my goal?

XinleiRen commented 1 year ago

Hi,

The model structure I implemented is designed for non-real-time denoising. If you want to perform real-time denoising, you need to:
1. change the CNN and Attention layers to meet the real-time requirement, and
2. stop feeding a 32ms segment to the model on its own. You should still feed a 2s segment, where the last 32ms is the audio you want to denoise and the first 1968ms is used as context.
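Point 2 can be sketched as a sliding context buffer: keep the most recent 2s of audio, append each new 32ms chunk, run the model on the whole buffer, and keep only the last 32ms of the output. This is a minimal sketch under stated assumptions: `ContextBuffer` and `model_fn` are hypothetical names, and `model_fn` is assumed to map a 1-D waveform to an enhanced waveform of the same length (any STFT/iSTFT and overlap-add handling is hidden inside it).

```python
import numpy as np


class ContextBuffer:
    """Sliding window of past audio for chunk-by-chunk inference.

    Hypothetical helper: model_fn stands in for the trained network and is
    assumed to return an enhanced waveform with the same length as its input.
    """

    def __init__(self, model_fn, context_len, chunk_len):
        self.model_fn = model_fn
        self.chunk_len = chunk_len
        # Start from silence so even the first chunks get a full-length input.
        self.buffer = np.zeros(context_len, dtype=np.float32)

    def process(self, chunk):
        chunk = np.asarray(chunk, dtype=np.float32)
        if chunk.shape != (self.chunk_len,):
            raise ValueError(f"expected {self.chunk_len} samples, got {chunk.shape}")
        # Drop the oldest chunk_len samples and append the new chunk at the end.
        self.buffer = np.concatenate([self.buffer[self.chunk_len:], chunk])
        enhanced = self.model_fn(self.buffer)
        # Only the last chunk_len output samples belong to the new audio.
        return enhanced[-self.chunk_len:]


# Assumed 16 kHz audio: 2 s of context (32000 samples), 32 ms chunks (512 samples).
buf = ContextBuffer(lambda wav: wav, context_len=32000, chunk_len=512)
out = buf.process(np.zeros(512, dtype=np.float32))
print(out.shape)  # (512,)
```

This design runs a full 2s forward pass for every 32ms chunk (62.5 passes per second of audio). With the current non-causal layers, the newest samples still have no future context inside the window, which is why point 1 (causal layers) is also needed.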

narrietal commented 1 year ago

Hi,

Thank you for such a quick response.

I appreciate the information. Could you guide me a bit more on what kind of changes I should make to the network (CNN and Attention layers)? Perhaps you could point me to a good resource or a similar project where I can find more information?

XinleiRen commented 1 year ago

Hi,

The CNN and Attention layers I used in this model are non-causal; you need to change them to causal layers. "A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement" introduces the causal CNN layer.
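The core idea of a causal convolution is to pad only on the past side of the time axis, so output frame t depends on input frames up to t and never on later ones. Below is a minimal NumPy sketch of that idea for a single 1-D channel. The function name `causal_conv1d` is hypothetical. In Keras, `tf.keras.layers.Conv1D(..., padding="causal")` does the same thing. For a 2-D time-frequency convolution, one option (an assumption about how you would adapt this repo) is to left-pad only the time axis, e.g. with `ZeroPadding2D`, and then use `padding="valid"`.

```python
import numpy as np


def causal_conv1d(x, kernel):
    """Causal 1-D convolution over time (cross-correlation, as in CNN layers).

    x: (T,) input sequence, kernel: (K,) weights.
    y[t] depends only on x[t-K+1 .. t] because K-1 zeros are padded on the
    left (past) side and none on the right (future) side.
    """
    x = np.asarray(x, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    k = len(kernel)
    padded = np.concatenate([np.zeros(k - 1), x])
    return np.array([np.dot(padded[t:t + k], kernel) for t in range(len(x))])


x = np.arange(6, dtype=float)
print(causal_conv1d(x, [1.0, 1.0]))  # [0. 1. 3. 5. 7. 9.]
```

Changing x[4:] leaves the first four outputs unchanged, which is the property a real-time model needs.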

You can also google 'causal CNN' and 'causal-attention' for more information.
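For attention, "causal" usually means masking the score matrix so each frame attends only to itself and earlier frames. Here is a minimal single-head NumPy sketch under simplifying assumptions: there are no learned query/key/value projections, and the helper name `causal_self_attention` is hypothetical. If the time-axis and frequency-axis attention are computed separately, as in axial attention, only the time-axis attention should need this mask, because frequency-axis attention stays within one frame. Keras' `MultiHeadAttention` layer also accepts `use_causal_mask=True` when called.

```python
import numpy as np


def causal_self_attention(x):
    """Single-head self-attention over time with a causal (lower-triangular) mask.

    x: (T, D) frame features, used directly as queries, keys and values.
    """
    t, d = x.shape
    scores = x @ x.T / np.sqrt(d)                        # (T, T) similarities
    # Frame i may only attend to frames j <= i: block the upper triangle.
    future = np.triu(np.ones((t, t), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    scores -= scores.max(axis=1, keepdims=True)          # numerical stability
    weights = np.exp(scores)                             # masked entries -> 0
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x


rng = np.random.default_rng(0)
frames = rng.standard_normal((5, 4))
out = causal_self_attention(frames)
print(out.shape)  # (5, 4)
```

Because the diagonal is never masked, every row keeps at least one finite score. The first output frame can only attend to itself, so it equals the first input frame.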

For a similar project, you can visit MTFAA-Net.