JusperLee / Dual-Path-RNN-Pytorch

Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation implemented by Pytorch
Apache License 2.0

Output normalization in dualrnn_test_wav.py #47

Closed · R7788380 closed this 2 years ago

R7788380 commented 2 years ago

Thank you for your contribution!

I have a question about line 44 of dualrnn_test_wav.py.

You normalize the DPRNN prediction like this:

norm = torch.norm(egs, float('inf'))      # peak absolute value (inf-norm) of the input mixture
s = s - torch.mean(s)                     # remove the DC offset from the estimate
s = s * norm / torch.max(torch.abs(s))    # rescale the estimate's peak to match the mixture's

I'm a little confused here: you don't do any input normalization in the training pipeline, so why do you do it here?
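
For reference, here is a minimal self-contained sketch (made-up tensors, not the repo's code) of what I understand these lines to do: mean-center the estimate and rescale it so its peak matches the mixture's peak (the infinity norm of egs).

import torch

egs = torch.tensor([0.5, -0.8, 0.3])     # stand-in mixture
s = torch.tensor([2.0, -3.0, 1.0])       # stand-in raw DPRNN estimate

norm = torch.norm(egs, float('inf'))     # peak absolute value of the mixture: 0.8
s = s - torch.mean(s)                    # remove the DC offset
s = s * norm / torch.max(torch.abs(s))   # estimate's peak now equals the mixture's peak
print(s.abs().max())                     # tensor(0.8000)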

JusperLee commented 2 years ago

I do this because if I use torchaudio to save the audio, truncation may occur. You can also use scipy to write the estimated audio.
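
Roughly what I mean, as a hedged illustration with made-up numbers (not code from this repo): samples whose magnitude exceeds 1.0 get pinned at the 16-bit limits when converted, while rescaling to the mixture's peak first keeps everything in range.

import numpy as np

est = np.array([0.2, 1.7, -2.3])    # made-up estimate with peaks outside [-1, 1]
mix_peak = 0.8                      # made-up mixture peak (the inf-norm above)

# direct conversion: samples beyond +/-1.0 are pinned at the int16 limits
raw_i16 = np.clip(est * 32767, -32768, 32767).astype(np.int16)
print(raw_i16)                      # [  6553  32767 -32768]

# rescale to the mixture's peak first, then convert: nothing is clipped
scaled = est * mix_peak / np.max(np.abs(est))
print((scaled * 32767).astype(np.int16))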

R7788380 commented 2 years ago

I don't understand why truncation occurs when using torchaudio to save the audio (torchaudio.save). Can you give an example?

So if I use scipy or some other library (e.g. soundfile), can I avoid the truncation?

JusperLee commented 2 years ago

I think you can use the following code without any issues.

import numpy as np
from scipy.io import wavfile
# write the estimate as 16-bit PCM at 8 kHz
wavfile.write('a.wav', 8000, np.asarray(data, dtype=np.int16))
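
If you prefer to keep float data, soundfile (mentioned above) also works and does the float-to-16-bit conversion itself; a minimal sketch, assuming the separated signal is already scaled into [-1, 1]:

import numpy as np
import soundfile as sf

data = np.zeros(8000, dtype=np.float32)            # placeholder for the separated signal in [-1, 1]
sf.write('a.wav', data, 8000, subtype='PCM_16')    # soundfile handles the float -> 16-bit conversion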
R7788380 commented 2 years ago

Thank you so much!