ercandogu-elevear closed this issue 2 months ago.
Actually, the compression ratio of the real/imaginary parts is also 0.3.
```python
# Dividing each component by mag**0.7 leaves mag**0.3 times the phase term
pred_real_c = pred_stft_real / (pred_mag**(0.7))
pred_imag_c = pred_stft_imag / (pred_mag**(0.7))
```
This code is equivalent to:
```python
import torch
pred_pha = torch.atan2(pred_stft_imag, pred_stft_real)
pred_real_c = pred_mag**0.3 * torch.cos(pred_pha)
pred_imag_c = pred_mag**0.3 * torch.sin(pred_pha)
```
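This follows from a simple derivation: since `pred_stft_real = pred_mag * cos(pred_pha)` and `pred_stft_imag = pred_mag * sin(pred_pha)`, dividing each by `pred_mag**0.7` leaves `pred_mag**0.3 * cos(pred_pha)` and `pred_mag**0.3 * sin(pred_pha)`. The real and imaginary parts therefore carry the same 0.3 magnitude compression.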
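To check the equivalence numerically, here is a small pure-Python sketch. It uses `math` instead of `torch` so it runs without dependencies, works on single scalar bins rather than tensors, and the function names are hypothetical; `alpha` stands for the 0.3 compression factor.

```python
import math

def compress_by_division(real, imag, alpha=0.3):
    """Form 1: divide each component by mag**(1 - alpha), i.e. mag**0.7 for alpha=0.3."""
    mag = math.hypot(real, imag)
    return real / mag ** (1 - alpha), imag / mag ** (1 - alpha)

def compress_by_polar(real, imag, alpha=0.3):
    """Form 2: rebuild the components from mag**alpha and the phase."""
    mag = math.hypot(real, imag)
    pha = math.atan2(imag, real)
    return mag ** alpha * math.cos(pha), mag ** alpha * math.sin(pha)

for real, imag in [(3.0, 4.0), (-0.2, 0.05), (1e-3, -2.5)]:
    a = compress_by_division(real, imag)
    b = compress_by_polar(real, imag)
    # Both forms give the same compressed real/imaginary pair.
    assert all(math.isclose(x, y, rel_tol=1e-9, abs_tol=1e-12) for x, y in zip(a, b))
    # The compressed pair has magnitude mag**0.3, i.e. the same 0.3 compression.
    assert math.isclose(math.hypot(*a), math.hypot(real, imag) ** 0.3)
```

Both forms are undefined at zero magnitude (division by zero in form 1); in practice a small epsilon is usually added to the magnitude before compressing.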
As for why the compression ratio of 0.3 was chosen, here are the references:

[1] Yin, Dacheng, et al. "TridentSE: Guiding speech enhancement with 32 global tokens." arXiv preprint arXiv:2210.12995 (2022).

[2] Li, Andong, et al. "On the importance of power compression and phase estimation in monaural speech dereverberation." JASA Express Letters 1.1 (2021).
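To illustrate what power-law compression with a factor of 0.3 does (this only shows the arithmetic, not either paper's method or its motivation), here is a hedged pure-Python sketch. `compress_spectrum` and `decompress_spectrum` are hypothetical names, and single complex numbers stand in for STFT bins.

```python
import cmath

def compress_spectrum(z, alpha=0.3):
    """Power-law compress one complex bin: keep the phase, raise the magnitude to alpha."""
    return cmath.rect(abs(z) ** alpha, cmath.phase(z))

def decompress_spectrum(zc, alpha=0.3):
    """Undo the compression: raise the magnitude to 1/alpha, keep the phase."""
    return cmath.rect(abs(zc) ** (1.0 / alpha), cmath.phase(zc))

# A loud bin and a quiet bin whose magnitudes differ by a factor of about 1e4.
loud, quiet = 100 + 0j, 0.01j
ratio_before = abs(loud) / abs(quiet)
ratio_after = abs(compress_spectrum(loud)) / abs(compress_spectrum(quiet))
# ratio_after is about 1e4 ** 0.3, roughly 15.85: the dynamic range is much smaller.
```

Because the phase is untouched and the exponent is invertible, the compressed spectrum can be mapped back exactly (up to floating-point error) with `decompress_spectrum`.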
Hello, I was wondering whether there is a reference for the values chosen for the magnitude compression in the MSE losses, where the magnitude is compressed by 0.3 while the real and imaginary parts are compressed by 0.7. Why are these not all 0.3 (the common compression value), and how did you decide on these values?
Thanks.
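For readers following the question, here is a hedged pure-Python sketch of the two compressed MSE terms it refers to: a magnitude term and a real/imaginary term, both using the 0.3 exponent as explained in the reply. `compressed_losses` is a hypothetical helper, and lists of complex numbers stand in for STFT tensors; how the actual code weights and averages these terms is not reproduced here.

```python
def compressed_losses(clean, pred, alpha=0.3):
    """Hypothetical helper: MSE on compressed magnitudes and on compressed real/imaginary parts.

    `clean` and `pred` are equal-length lists of complex STFT bins (a flattened
    stand-in for the real spectrogram tensors).
    """
    n = len(clean)

    def compress(z):
        # z / |z|**(1 - alpha) equals |z|**alpha * exp(i * phase): the same 0.3 compression.
        m = abs(z)
        return z / m ** (1 - alpha) if m > 0 else 0j

    # Magnitude term: MSE between |clean|**alpha and |pred|**alpha.
    mag_loss = sum((abs(c) ** alpha - abs(p) ** alpha) ** 2 for c, p in zip(clean, pred)) / n
    # Complex term: MSE over the 2 * n real and imaginary components of the compressed bins.
    com_loss = sum(abs(compress(c) - compress(p)) ** 2 for c, p in zip(clean, pred)) / (2 * n)
    return mag_loss, com_loss


mag_loss, com_loss = compressed_losses([1 + 2j, -0.5j], [0.9 + 2.1j, -0.4j])
```

Dividing by `|z|**0.7` is what produces the 0.3 compression on the real and imaginary parts, so the 0.7 in the code is not a separate compression factor.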