deepsound-project / samplernn-pytorch

PyTorch implementation of SampleRNN: An Unconditional End-to-End Neural Audio Generation Model
MIT License

Why is prev_samples = 2 * dequantized ? #7

Closed qbx2 closed 7 years ago

qbx2 commented 7 years ago

https://github.com/deepsound-project/samplernn-pytorch/blob/master/model.py#L203

What is the purpose of multiplying by 2?

koz4k commented 7 years ago

It's so that the dequantized samples lie in the [-2, 2] interval. According to the authors, it's "a reasonable range to pass as inputs to the RNN". See the original implementation in Theano: https://github.com/soroushmehr/sampleRNN_ICLR2017/blob/master/models/two_tier/two_tier.py#L250
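To make the rescaling concrete, here is a minimal sketch of the idea in plain Python. The helper names `linear_dequantize` and `rnn_input`, and the choice of 256 quantization levels, are assumptions for illustration and are not copied from either repository: an integer level is first mapped to [-1, 1), and the factor of 2 then stretches it to [-2, 2).

```python
def linear_dequantize(index, q_levels):
    """Map an integer level in [0, q_levels) to a float in [-1, 1).

    Hypothetical helper: an assumption about how linear dequantization
    might look, not the repository's actual code.
    """
    return index / (q_levels / 2) - 1.0


def rnn_input(index, q_levels=256):
    """Scale the dequantized sample by 2 so it lies in [-2, 2)."""
    return 2.0 * linear_dequantize(index, q_levels)


if __name__ == "__main__":
    # Lowest, middle, and highest quantization levels.
    for level in (0, 128, 255):
        print(level, rnn_input(level))
```

With this mapping the lowest level lands exactly on -2, the middle level on 0, and the highest level just below 2.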

qbx2 commented 7 years ago

Thank you for your kind answer. As I understand it, this is just rescaling the input for the RNN, right?

koz4k commented 7 years ago

Yes, it's so that the RNN learns better in the early stages of training. I'm not sure what the reasoning behind this specific scaling factor is, but I can say that the standard deviation of a sine wave scaled to the [-2, 2] interval is sqrt(2) ~= 1.4, so the std of a typical input should probably fall somewhere between 0 and that value (we typically want the std of the input to be around 1).
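The sqrt(2) figure can be checked numerically. The sketch below is an illustration only: the function name `sine_std` and the sample count are assumptions. It samples one full period of a sine wave with amplitude 2 and computes its population standard deviation, which should come out as amplitude / sqrt(2).

```python
import math


def sine_std(amplitude, n=10_000):
    """Population std of one full period of a sine wave, sampled at n points."""
    samples = [amplitude * math.sin(2 * math.pi * k / n) for k in range(n)]
    mean = sum(samples) / n
    variance = sum((s - mean) ** 2 for s in samples) / n
    return math.sqrt(variance)


if __name__ == "__main__":
    # A sine wave spanning [-2, 2] has amplitude 2.
    print(sine_std(2.0), math.sqrt(2))
```

Without the factor of 2, the same sine wave would span [-1, 1] and have a std of about 0.71, so the scaling moves a full-range signal from below 1 to somewhat above it.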