Xiaobin-Rong / gtcrn

The official implementation of GTCRN, an ultra-lite speech enhancement model.
MIT License

Loss function - Magnitude compression values inside MSE Loss #35

Closed: ercandogu-elevear closed this issue 2 weeks ago

ercandogu-elevear commented 3 weeks ago

Hello, I was wondering whether there is a reference for the compression values used inside the MSE losses: the magnitude is compressed with an exponent of 0.3, while the real and imaginary parts appear to use 0.7. Why are these not all 0.3 (the common compression value), and how did you decide on these values?

Thanks.

Xiaobin-Rong commented 3 weeks ago

Actually, the compression ratio of the real/imaginary parts is also 0.3.

pred_real_c = pred_stft_real / (pred_mag**(0.7))
pred_imag_c = pred_stft_imag / (pred_mag**(0.7))

This code is equivalent to:

pred_pha = torch.atan2(pred_stft_imag, pred_stft_real)
pred_real_c = pred_mag**0.3 * torch.cos(pred_pha)
pred_imag_c = pred_mag**0.3 * torch.sin(pred_pha)
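
To spell out the step between the two snippets, write one predicted STFT bin in polar form. The symbols $X_r$, $X_i$, $|X|$ and $\phi$ are notation introduced here for pred_stft_real, pred_stft_imag, pred_mag and pred_pha; they are not taken from the repository.

```latex
\begin{aligned}
X_r &= |X|\cos\phi, \qquad X_i = |X|\sin\phi, \qquad \phi = \operatorname{atan2}(X_i, X_r),\\
\frac{X_r}{|X|^{0.7}} &= |X|^{1-0.7}\cos\phi = |X|^{0.3}\cos\phi,\\
\frac{X_i}{|X|^{0.7}} &= |X|^{1-0.7}\sin\phi = |X|^{0.3}\sin\phi .
\end{aligned}
```

Dividing by $|X|^{0.7}$ therefore leaves the phase untouched and compresses the magnitude with exponent 0.3.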

The equivalence follows from a simple mathematical derivation.
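
The equivalence can also be checked numerically. The following is a minimal sketch that uses plain Python complex numbers instead of torch tensors so that it runs without dependencies; the function names, the constant ALPHA and the sample values are hypothetical and chosen only for illustration.

```python
import cmath
import math

ALPHA = 0.3  # compression exponent discussed in this thread


def compress_by_division(z: complex) -> complex:
    """Divide real and imaginary parts by |z|**(1 - ALPHA), i.e. by |z|**0.7."""
    mag = math.sqrt(z.real ** 2 + z.imag ** 2)
    scale = mag ** (1 - ALPHA)
    return complex(z.real / scale, z.imag / scale)


def compress_by_polar(z: complex) -> complex:
    """Compress the magnitude to |z|**ALPHA and keep the phase unchanged."""
    mag, pha = cmath.polar(z)
    return cmath.rect(mag ** ALPHA, pha)


# Hypothetical non-zero STFT bins covering different quadrants and scales.
samples = [complex(3.0, 4.0), complex(-0.5, 2.0), complex(1e-3, -2e-3)]

# Largest absolute difference between the two formulations over the samples.
max_diff = max(abs(compress_by_division(z) - compress_by_polar(z)) for z in samples)
print(max_diff)
```

Both functions return the same value up to floating-point rounding for non-zero inputs. A bin with zero magnitude would cause a division by zero in the first form, so a small epsilon in the magnitude is a common guard; whether and how the repository does this is not stated in this thread.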

As for why the compression ratio of 0.3 was chosen, here are the references:

[1] Yin, Dacheng, et al. "TridentSE: Guiding speech enhancement with 32 global tokens." arXiv preprint arXiv:2210.12995 (2022).
[2] Li, Andong, et al. "On the importance of power compression and phase estimation in monaural speech dereverberation." JASA Express Letters 1.1 (2021).