lixinghe1999 opened this issue 2 years ago
Hi, xinghe,
Thanks for your attention. Don't confuse the two: there are actually two kinds of normalization in the code, namely loudness normalization and distribution normalization.
Loudness normalization: In reality, there is no prior information about the loudness of the mixture signal (i.e., the noisy speech). Therefore, we want the simulated signals to cover as wide a range of loudness as possible. Basically, we use decibels relative to full scale (dBFS) to normalize the loudness, so that the loudness of the simulated signals matches the loudness of natural recordings as closely as possible. The whole loudness-normalization process is done during simulation.
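For illustration, here is a minimal sketch of this kind of dBFS-based loudness normalization. The function name, the RMS-based dBFS estimate, and the `eps` guard are my assumptions, not the repo's actual code; the [-35, -15] dBFS range is the one mentioned later in this thread for `snr_mix`:

```python
import numpy as np

def normalize_loudness(signal, target_dbfs, eps=1e-8):
    """Scale `signal` so its RMS level matches `target_dbfs` (dB relative to full scale)."""
    rms = np.sqrt(np.mean(signal ** 2))
    # dBFS of the current signal, taking a full-scale RMS of 1.0 as the 0 dB reference
    current_dbfs = 20 * np.log10(rms + eps)
    gain = 10 ** ((target_dbfs - current_dbfs) / 20)
    return signal * gain

# During simulation, the target level is drawn at random, e.g. uniformly from [-35, -15] dBFS
rng = np.random.default_rng(0)
noisy = rng.standard_normal(16000).astype(np.float32)
noisy = normalize_loudness(noisy, target_dbfs=rng.uniform(-35, -15))
```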
Distribution normalization: Basically, we want to adjust the distribution of the input signals before passing them to the neural network. You could subtract the mean and divide by the variance, as in layer normalization, but here we simply divide by the mean value (there may be a better way to normalize the distribution of the input signals in the future). The main goal is to shape the distribution of the input signals so that the neural network trains better. That is why the input signals need to be normalized during both training and inference.
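A minimal sketch of the divide-by-the-mean idea, assuming NumPy. Whether the statistic is the mean of the magnitudes (as below) or of the raw samples, and the `eps` guard, are my assumptions; the key point from the explanation above is that the same transform must be applied in both training and inference:

```python
import numpy as np

def mean_normalize(signal, eps=1e-8):
    """Divide the input by its mean magnitude before feeding it to the network."""
    return signal / (np.mean(np.abs(signal)) + eps)
```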
Dear Xiang,
Thank you for the explanations!
Dear authors,
I notice that in snr_mix, the signal dBFS is drawn from [-35, -15], meaning the intensity changes randomly. However, in inference.py, normalization is applied, which seems odd. From my understanding, we should either normalize all of the data or none of it, so why do you normalize during inference while discarding it during training? Maybe I am misunderstanding something; please correct me if so.
Best