Normalizing spectrograms

f90 / AdversarialAudioSeparation

Code accompanying the paper "Semi-supervised adversarial audio source separation applied to singing voice extraction"

https://arxiv.org/abs/1711.00048

MIT License

Normalizing spectrograms #5

Closed: stolpa4 closed this issue 5 years ago

stolpa4 commented 5 years ago

As far as I know, the result of log1p(x) can be negative. You use this function to 'normalize' the spectrograms of the target accompaniment and vocals, and then use the difference between the network outputs and these spectrograms in your loss function. However, network outputs after a ReLU can't be negative. I have read the paper and I realize that it must work, so what am I missing? Please help me.

f90 commented 5 years ago

In general, yes: log1p(x) is negative for -1 < x < 0, and then we would have problems. But in our case, x is the magnitude (energy) of the spectrogram at a given time and frequency, which is always at least 0. Therefore log(x+1) >= log(1) = 0, so the result is also always at least 0. This allows us to use a ReLU function for the generator output.
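To make the argument above concrete, here is a minimal sketch using only the Python standard library. It is not taken from the repository: `normalize_magnitude` and `relu` are hypothetical helper names, and the random complex values are only a stand-in for real STFT bins. It checks that log1p of a non-negative magnitude is never negative, so applying a ReLU to such targets leaves them unchanged.

```python
import math
import random


def normalize_magnitude(mag):
    """Hypothetical helper: compress a non-negative magnitude with log(1 + x)."""
    if mag < 0:
        raise ValueError("magnitude must be non-negative")
    return math.log1p(mag)


def relu(x):
    """ReLU on a single value: negative inputs are clamped to 0."""
    return max(x, 0.0)


random.seed(0)

# Stand-in for STFT bins: complex values whose real and imaginary parts
# can have either sign (assumption: real data would come from an STFT).
bins = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(1000)]

# The magnitude |z| of a complex number is always >= 0.
magnitudes = [abs(z) for z in bins]

# Normalized targets: log(1 + |z|) >= log(1) = 0.
targets = [normalize_magnitude(m) for m in magnitudes]

all_non_negative = all(t >= 0.0 for t in targets)
relu_is_identity = all(relu(t) == t for t in targets)

# For comparison: an input in (-1, 0) does give a negative result,
# but a magnitude can never take such a value.
negative_input_example = math.log1p(-0.5)

print(all_non_negative, relu_is_identity, negative_input_example < 0)
```

Because every target lies in the range a ReLU can produce, the difference between the network output and the target in the loss is well defined. The normalization can be undone with `math.expm1`, the inverse of `math.log1p`.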

stolpa4 commented 5 years ago

Thank you so much for the answer!