Loss function - Magnitude compression values inside MSE Loss

Xiaobin-Rong / gtcrn

The official implementation of GTCRN, an ultra-lite speech enhancement model.

MIT License

Actually, the compression ratio of the real/imaginary parts is also 0.3.

pred_real_c = pred_stft_real / (pred_mag**(0.7))
pred_imag_c = pred_stft_imag / (pred_mag**(0.7))

This code is equivalent to:

pred_pha = torch.atan2(pred_stft_imag, pred_stft_real)
pred_real_c = pred_mag**0.3 * torch.cos(pred_pha)
pred_imag_c = pred_mag**0.3 * torch.sin(pred_pha)

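The derivation is simple: since pred_stft_real = pred_mag * cos(pred_pha), dividing by pred_mag**0.7 leaves pred_mag**(1 - 0.7) * cos(pred_pha) = pred_mag**0.3 * cos(pred_pha). The same holds for the imaginary part with sin(pred_pha), so both parts carry a magnitude compressed with exponent 0.3 while the phase is unchanged.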
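The equivalence can also be checked numerically. The snippet below is a minimal sketch, not code from the repository: the tensor shapes and random inputs are made up for illustration, and no epsilon floor is applied to the magnitude, which a real loss would likely need to avoid division by zero.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins for the real/imaginary parts of a predicted STFT
# (batch, frequency bins, frames). The shapes are illustrative assumptions.
pred_stft_real = torch.randn(2, 257, 10, dtype=torch.float64)
pred_stft_imag = torch.randn(2, 257, 10, dtype=torch.float64)

# Plain magnitude; no epsilon floor is added in this sketch.
pred_mag = torch.sqrt(pred_stft_real**2 + pred_stft_imag**2)

# Form 1: divide the real/imaginary parts by mag**0.7.
real_c_div = pred_stft_real / (pred_mag**0.7)
imag_c_div = pred_stft_imag / (pred_mag**0.7)

# Form 2: compress the magnitude to mag**0.3 and keep the original phase.
pred_pha = torch.atan2(pred_stft_imag, pred_stft_real)
real_c_polar = pred_mag**0.3 * torch.cos(pred_pha)
imag_c_polar = pred_mag**0.3 * torch.sin(pred_pha)

# Both forms agree up to floating-point error.
same_real = torch.allclose(real_c_div, real_c_polar)
same_imag = torch.allclose(imag_c_div, imag_c_polar)

# The magnitude of the compressed spectrum is mag**0.3.
compressed_mag = torch.sqrt(real_c_div**2 + imag_c_div**2)
same_mag = torch.allclose(compressed_mag, pred_mag**0.3)

print(same_real, same_imag, same_mag)
```

The division form avoids the atan2/cos/sin round trip, which is presumably why it is the one used in the loss.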

As for why the compression ratio of 0.3 was chosen, here are the references:
[1] Yin, Dacheng, et al. "TridentSE: Guiding speech enhancement with 32 global tokens." arXiv preprint arXiv:2210.12995 (2022).
[2] Li, Andong, et al. "On the importance of power compression and phase estimation in monaural speech dereverberation." JASA Express Letters 1.1 (2021).

Loss function - Magnitude compression values inside MSE Loss #35