qbx2 closed this issue 6 years ago
It's so that the dequantized samples fall in the [-2, 2] interval. According to the authors, it's "a reasonable range to pass as inputs to the RNN". See the original implementation in Theano: https://github.com/soroushmehr/sampleRNN_ICLR2017/blob/master/models/two_tier/two_tier.py#L250
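To make the rescaling concrete, here is a minimal sketch of what "dequantize, then multiply by 2" does to the sample values. The function name `dequantize_to_range` and the default of 256 quantization levels are assumptions for illustration only, not the repository's actual code.

```python
def dequantize_to_range(samples, q_levels=256, scale=2.0):
    """Map integer levels 0..q_levels-1 to roughly [-scale, scale].

    Hypothetical helper: first maps the levels linearly to about [-1, 1],
    then multiplies by `scale` (2 in the case discussed here).
    """
    half = q_levels / 2
    return [scale * (s / half - 1.0) for s in samples]


levels = [0, 64, 128, 192, 255]
rescaled = dequantize_to_range(levels)
print(rescaled)
```

With `scale=1.0` the same levels would land in about [-1, 1]; the factor of 2 simply widens that range.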
Thank you for your kind answer. As I understand, it's just for rescaling for RNN, right?
Yes, it's so that the RNN learns better in the early stages of training. I'm not sure what the reasoning behind this specific scaling factor is, but I can say that the standard deviation of a sine wave scaled to the [-2, 2] interval is sqrt(2) ~= 1.4, so the std of a typical input should probably be somewhere between 0 and that value (we typically want the std of the input to be around 1).
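The sqrt(2) figure can be checked numerically. This is a rough sketch using only the standard library; the helper `sine_std` and the choice of 10000 points over one period are assumptions made for the check, not anything from the repository.

```python
import math


def sine_std(amplitude, n=10000):
    """Population standard deviation of one full sine period, sampled at n points."""
    xs = [amplitude * math.sin(2 * math.pi * k / n) for k in range(n)]
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / n)


std_scaled = sine_std(2.0)    # sine wave spanning [-2, 2]
std_unscaled = sine_std(1.0)  # sine wave spanning [-1, 1]
print(std_scaled, std_unscaled)
```

A full-amplitude sine has std equal to amplitude / sqrt(2), so without the factor of 2 even the loudest sine input would have a std of only about 0.71, and quieter signals would sit well below that.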
https://github.com/deepsound-project/samplernn-pytorch/blob/master/model.py#L203
What is the purpose of multiplying by 2?