RookieJunChen / FullSubNet-plus

The official PyTorch implementation of "FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement".
Apache License 2.0
235 stars 55 forks

Model causality #10

Azatiussss closed 2 years ago

Azatiussss commented 2 years ago

Is the model causal? It seems that during both training and inference the ChannelTimeSenseSELayer is used, and it takes an average pooling along the frame axis. Or am I supposed to process the audio chunk by chunk to get an honest result that uses only a limited amount of look-ahead data?

https://github.com/hit-thusz-RookieCJ/FullSubNet-plus/blob/81e84b43d4f716cda1cd065d608f6c7b6758e791/speech_enhance/audio_zen/model/module/attention_model.py#L57-L71

RookieJunChen commented 2 years ago

You are right, the model is non-causal. By "real-time" in the paper, we mean that the real-time factor (RTF) is less than 1. If you want to implement a causal model, you could try a dynamic average pooling (averaging only the previous frame and the current frame).
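A minimal sketch of that dynamic pooling idea, not code from the repository. It assumes the input is a PyTorch tensor shaped `[batch, channels, frames]` and reads "dynamic" as a running mean over all frames up to and including the current one. The function name `causal_avg_pool` is hypothetical, and how the pooled values are wired into the rest of ChannelTimeSenseSELayer is not shown.

```python
import torch


def causal_avg_pool(x: torch.Tensor) -> torch.Tensor:
    """Running mean over the last (frame) axis.

    Assumed input shape: [batch, channels, frames].
    Output has the same shape; output[..., t] is the mean of x[..., :t + 1],
    so frame t never depends on frames after t.
    """
    num_frames = x.shape[-1]
    # Position t of the cumulative sum holds the sum of frames 0..t.
    running_sum = torch.cumsum(x, dim=-1)
    # Number of frames seen so far at each position: 1, 2, ..., num_frames.
    counts = torch.arange(1, num_frames + 1, device=x.device, dtype=x.dtype)
    return running_sum / counts


if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(2, 4, 10)
    y = causal_avg_pool(x)

    # Causality check: overwrite the frames from index 6 onward and confirm
    # that the outputs for frames 0..5 are unchanged.
    x_future_changed = x.clone()
    x_future_changed[..., 6:] = 0.0
    y_future_changed = causal_avg_pool(x_future_changed)
    assert torch.allclose(y[..., :6], y_future_changed[..., :6])

    # A global average over the frame axis fails the same check, because
    # every output depends on every frame.
    g = x.mean(dim=-1, keepdim=True)
    g_future_changed = x_future_changed.mean(dim=-1, keepdim=True)
    assert not torch.allclose(g, g_future_changed)
```

Unlike a global average, which yields one value per channel, this running mean yields one value per channel and per frame, so any weights derived from it would also vary over time. A fixed-length window over recent frames would be another causal option if the statistics should track recent audio more closely.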

Azatiussss commented 2 years ago

OK, thank you very much for the fast and detailed response!