RookieJunChen / FullSubNet-plus

The official PyTorch implementation of "FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement".
Apache License 2.0
234 stars, 55 forks

small code fix needed in clipping detection #9

Open · ariel-c-verbit opened this issue 2 years ago

ariel-c-verbit commented 2 years ago

Hey, first of all I'd like to thank you for this great model and for sharing it on GitHub! Here is a small bug I found:

As we know, the cIRM isn't bounded, so the mask can have magnitudes larger than 1. This can cause clipping in the enhanced signal.

To detect this, you check:

if abs(enhanced).any() > 1: print(f"Warning: enhanced is not in the range [-1, 1], {name}")

I think you meant:

if (abs(enhanced) > 1).any(): print(f"Warning: enhanced is not in the range [-1, 1], {name}")
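A quick way to see why the original check never fires: in Python a method call binds tighter than a comparison, so abs(enhanced).any() is evaluated first and yields a single boolean, and comparing that boolean with 1 is always False. The sketch below wraps both versions in small helper functions (the function names and the sample array are hypothetical, for illustration only):

```python
import numpy as np


def is_out_of_range_buggy(enhanced: np.ndarray) -> bool:
    # .any() runs first and returns one bool (True if any sample is non-zero);
    # True > 1 and False > 1 are both False, so this never reports clipping.
    return bool(abs(enhanced).any() > 1)


def is_out_of_range_fixed(enhanced: np.ndarray) -> bool:
    # The comparison runs element-wise first, then .any() asks whether at
    # least one sample has magnitude above 1.
    return bool((abs(enhanced) > 1).any())


# Hypothetical enhanced signal with two samples outside [-1, 1].
enhanced = np.array([0.2, -1.4, 0.9, 1.1])
print(is_out_of_range_buggy(enhanced))        # False
print(is_out_of_range_fixed(enhanced))        # True
print(np.count_nonzero(np.abs(enhanced) > 1))  # 2 clipped samples
```

Counting the offending samples, as in the last line, is one way to quantify how much clipping is actually happening.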

After fixing this, I see quite a lot of clipping...

https://github.com/hit-thusz-RookieCJ/FullSubNet-plus/blob/a6c89083cd083e729ca3def9a291743e8c3b516b/speech_enhance/audio_zen/inferencer/base_inferencer.py#L148

RookieJunChen commented 2 years ago

First of all, thank you very much for correcting this code! I will fix this part later. Secondly, I don't quite understand what you said about clipping, because the code subsequently normalizes the enhanced signal:

enhanced = np.int16(0.8 * amp * enhanced / np.max(np.abs(enhanced)))

Would this still cause a clipping problem?
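To make the normalization line concrete, here is a minimal sketch of offline peak normalization to 16-bit PCM. It assumes amp is the int16 full-scale value (32767); the thread does not show how amp is defined in the repository, so that constant and the helper name are hypothetical. It uses .astype(np.int16), which truncates toward zero just like the np.int16(...) call for in-range values:

```python
import numpy as np

# Assumption: `amp` is the int16 full-scale value. The thread does not show
# how `amp` is defined in the repository, so this constant is hypothetical.
AMP = np.iinfo(np.int16).max  # 32767


def peak_normalize_to_int16(enhanced: np.ndarray, amp: int = AMP,
                            headroom: float = 0.8) -> np.ndarray:
    """Hypothetical helper: offline peak normalization to int16."""
    peak = np.max(np.abs(enhanced))
    if peak == 0:
        # Guard against division by zero for an all-silent signal.
        return np.zeros(enhanced.shape, dtype=np.int16)
    # The largest |sample| maps to headroom * amp and every other sample is
    # smaller, so nothing can exceed the int16 range, even if the masked
    # signal itself went far outside [-1, 1].
    return (headroom * amp * enhanced / peak).astype(np.int16)


# Hypothetical masked output that overshoots [-1, 1].
enhanced = np.array([0.5, -2.0, 1.0])
pcm = peak_normalize_to_int16(enhanced)
print(pcm.dtype, int(np.max(np.abs(pcm))))  # int16 26213
```

Note that peak is taken over the whole utterance, so this only works when the complete signal is available before writing it out.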

ariel-c-verbit commented 2 years ago

What pushes signal values out of the range [-1, 1] is the masking itself (cIRM magnitudes larger than 1). This comes from the model and is a common cIRM problem.
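The mechanism can be illustrated with a toy example. Complex ratio masking multiplies each noisy spectral bin Y by a complex mask M, so the output magnitude is |M| * |Y|; wherever |M| > 1 the bin is amplified. The sketch below assumes a single FFT frame instead of a full STFT and a hypothetical mask of magnitude 1.5 in every bin; a real cIRM varies per bin and comes from the model:

```python
import numpy as np


def apply_cirm(Y: np.ndarray, M: np.ndarray) -> np.ndarray:
    # Complex ratio masking written out on real/imaginary parts:
    #   S_r = M_r * Y_r - M_i * Y_i
    #   S_i = M_r * Y_i + M_i * Y_r
    # This is the complex product M * Y, so |S| = |M| * |Y|.
    real = M.real * Y.real - M.imag * Y.imag
    imag = M.real * Y.imag + M.imag * Y.real
    return real + 1j * imag


rng = np.random.default_rng(0)
# Hypothetical noisy frame scaled so its peak is 0.9 (inside [-1, 1]).
noisy = rng.uniform(-1.0, 1.0, 512)
noisy = 0.9 * noisy / np.max(np.abs(noisy))

Y = np.fft.rfft(noisy)
# Hypothetical, exaggerated mask: magnitude 1.5 in every bin.
M = np.full(Y.shape, 1.5 + 0j)
S_hat = apply_cirm(Y, M)
enhanced = np.fft.irfft(S_hat, n=noisy.size)

print(round(float(np.max(np.abs(noisy))), 3))     # 0.9
print(round(float(np.max(np.abs(enhanced))), 3))  # 1.35, outside [-1, 1]
```

Even though the input never left [-1, 1], a mask with magnitude above 1 is enough to push the time-domain output past full scale.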

If I understand correctly, you normalize the enhanced signal to [-0.8, 0.8], which avoids this problem.

This fixes it, but it isn't a solution that can be used in a real-life, live system (though for academic purposes it is good enough).

In a real-life system (where you receive frames rather than the full signal), you would need adaptive gain control or a similar normalization. You can't assume you have the whole signal in advance, and normalizing every couple of frames to [-0.8, 0.8] would amplify the noise in some cases. Hope this helped.
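As a rough illustration of the difference, the sketch below contrasts per-frame normalization with a very simple streaming peak limiter, one basic form of gain control. Everything here is hypothetical (class name, parameters, test signals) and is not taken from the FullSubNet+ repository. The limiter tracks a peak envelope with instant attack and gradual release, and caps its gain at 1.0, so frames are only ever attenuated, never boosted:

```python
import numpy as np


class FrameLimiter:
    """Hypothetical streaming peak limiter: sees one frame at a time."""

    def __init__(self, ceiling: float = 0.8, release: float = 0.9):
        self.ceiling = ceiling  # target peak, like the 0.8 used offline
        self.release = release  # per-frame decay factor of the envelope
        self.envelope = 0.0     # running estimate of the recent peak level

    def process(self, frame: np.ndarray) -> np.ndarray:
        peak = float(np.max(np.abs(frame)))
        # Instant attack (jump to a louder peak), gradual release.
        self.envelope = max(peak, self.release * self.envelope)
        # Gain never exceeds 1, so quiet noise frames are not amplified.
        # Since envelope >= peak, the output peak stays <= ceiling.
        if self.envelope <= self.ceiling:
            gain = 1.0
        else:
            gain = self.ceiling / self.envelope
        return gain * frame


def per_frame_normalize(frame: np.ndarray, target: float = 0.8) -> np.ndarray:
    # The approach warned against above: rescale every chunk to one peak.
    return target * frame / np.max(np.abs(frame))


rng = np.random.default_rng(0)
t = np.arange(256)
loud = 1.3 * np.sin(2 * np.pi * t / 32)   # masked speech overshooting [-1, 1]
quiet = 0.01 * rng.standard_normal(256)   # near-silent, noise-only frame

limiter = FrameLimiter()
limited = [limiter.process(f) for f in (loud, quiet)]
normalized = [per_frame_normalize(f) for f in (loud, quiet)]
for name, lim, norm in zip(("loud", "quiet"), limited, normalized):
    print(f"{name}: limiter peak {np.max(np.abs(lim)):.3f}, "
          f"per-frame-normalized peak {np.max(np.abs(norm)):.3f}")
```

Per-frame normalization lifts the noise-only frame all the way to 0.8, while the limiter leaves it no louder than it was. A practical system would typically also smooth the gain within each frame and possibly use some lookahead to avoid audible gain steps at frame boundaries.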

RookieJunChen commented 2 years ago

Thank you very much for your response, it has been quite helpful to me!