Regarding reconstructing waveforms from normalized predicted spectrograms

DiegoLeon96 / Neural-Speech-Dereverberation

Machine and Deep Learning models for speech dereverberation

GNU General Public License v3.0

105 stars, 21 forks

Hello, I went through your code for speech dereverberation and found it really useful for a project I'm working on. Thanks a ton for that!

I have one question, though. Your predicted audio looks clean in the spectrograms, but I can't find the code that converts these predicted normalized spectrograms back into audio waveforms. I see a utils function called reconstruct_wave, but that seems to be for inverting unnormalized spectrograms.

Since you use normalized spectrograms as both the input and the target when training your model, I'm guessing the spectrograms predicted during evaluation are normalized too. In that case, how do I un-normalize these predicted spectrograms and then invert them? Or am I missing something obvious about these inversions?

If you could help me with this, it would be really helpful for my project. Please reach out to me at shashank2896@gmail.com, or just answer here if you are happy to clear this up.

Looking forward to hearing from you

Thanks and Regards

Hi, sorry for the late response. I'm not sure I understood the question, but the "reconstruct_wave" function (in utils.py) is the only thing I used for spectrogram-to-waveform conversion. As you said, it takes an unnormalized spectrogram as input, and the output is an audible waveform that you can save as a .wav file. The conversion is slow, because the inverse conversion in Librosa is very slow and I had no time to work on something more efficient.

If you are working with convolutional models (U-Nets, etc.), it is not necessary to normalize the input; I found the best performance using unnormalized log-power mel spectrograms. I used normalization for the MLP and LSTM models. There I saved the normalization parameters of the input (mean, std, etc., depending on the normalization type), fed the model the normalized input (which gives a normalized output), and then used the saved parameters to recover the unnormalized output, using other functions in utils.py.
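For anyone following the thread, here is a minimal sketch of the save-parameters, normalize, predict, un-normalize round trip described above. It is not the code from utils.py: the helper names (fit_standardizer, normalize, denormalize), the per-frequency-bin z-score normalization, and the array shape are all assumptions made for illustration, and the model call is replaced by a stand-in.

```python
import numpy as np

def fit_standardizer(spec):
    """Hypothetical helper: per-frequency-bin mean and std of a
    (n_bins, n_frames) spectrogram. Z-score normalization is an assumption."""
    mean = spec.mean(axis=1, keepdims=True)
    std = spec.std(axis=1, keepdims=True)
    std = np.where(std == 0, 1.0, std)  # avoid division by zero on constant bins
    return mean, std

def normalize(spec, mean, std):
    """Map a spectrogram to zero mean and unit variance per frequency bin."""
    return (spec - mean) / std

def denormalize(spec_norm, mean, std):
    """Invert normalize() using the parameters saved from the input."""
    return spec_norm * std + mean

# Synthetic stand-in for a log-power mel spectrogram (64 bins, 100 frames).
rng = np.random.default_rng(0)
log_mel = rng.normal(loc=-40.0, scale=10.0, size=(64, 100))

mean, std = fit_standardizer(log_mel)        # save these alongside the input
model_input = normalize(log_mel, mean, std)  # what the MLP/LSTM would see

predicted_norm = model_input                 # stand-in for the model's prediction
predicted = denormalize(predicted_norm, mean, std)
# `predicted` is back on the unnormalized scale, which is the kind of input
# the reply says reconstruct_wave expects.
```

If a different normalization was used (min-max, for example), the same pattern should apply: store whatever parameters the forward transform used and apply the exact inverse to the prediction before inverting to a waveform.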
