Open simopal6 opened 6 years ago
It's been a long time, but if you have found answers for this I'll gladly take them. I'm also confused.
I remember getting an intuition of how that worked, but I can't remember exactly what that was. I think that the actual frame size was the product between the two vectors, or something like that... Sorry I can't be of much help :(
Hello, it's been an even longer time, but I think I'm starting to understand it (I had to read the help description at least 50 times...). The help description says:
parser.add_argument(
    '--frame_sizes', nargs='+', type=int, required=True,
    help='frame sizes in terms of the number of lower tier frames, \
          starting from the lowest RNN tier'
)
So I think you have to give each tier's frame size as a number of frames of the tier directly below it, not as a number of samples.
Example: from the paper "HIGH-QUALITY SPEECH CODING WITH SAMPLE RNN", I need the following frame sizes: FS(1) = FS(2) = 2, FS(3) = 16 and FS(4) = 160.
Intuitively I would put as argument: --frame_sizes 2 2 16 160
or --frame_sizes 160 16 2 2
But from what I understand, I would need to pass: --frame_sizes 2 1 8 10
- 2 because it's the lowest tier
- 1 because the lower tier's frame spans 2 samples (2 x 1 = 2)
- 8 because the lower tier's frame still spans 2 samples (2 x 8 = 16)
- 10 because the lower tier's frame now spans 16 samples (16 x 10 = 160)
That would explain the use of ns_frame_samples = map(int, np.cumprod(frame_sizes))
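To check the arithmetic above, here is a minimal sketch of the conversion in both directions. It assumes the reading given in this comment (each value passed to --frame_sizes is a multiple of the tier below, lowest tier first). The function names are hypothetical, and itertools.accumulate stands in for np.cumprod so the snippet needs no third-party packages.

```python
from itertools import accumulate
from operator import mul


def ratios_to_samples(frame_sizes):
    """Running product of the ratios: samples spanned by one frame per tier.

    Plays the role of np.cumprod(frame_sizes); lowest tier first.
    """
    return list(accumulate(frame_sizes, mul))


def samples_to_ratios(samples_per_frame):
    """Inverse step: express each tier as a multiple of the tier below it."""
    ratios = [samples_per_frame[0]]  # the lowest tier is given in samples
    for lower, upper in zip(samples_per_frame, samples_per_frame[1:]):
        if upper % lower != 0:
            raise ValueError(f"{upper} is not a multiple of {lower}")
        ratios.append(upper // lower)
    return ratios


# Frame sizes in samples, as quoted from the paper in this thread.
paper_frame_sizes = [2, 2, 16, 160]

cli_frame_sizes = samples_to_ratios(paper_frame_sizes)
print(cli_frame_sizes)                     # [2, 1, 8, 10]
print(ratios_to_samples(cli_frame_sizes))  # [2, 2, 16, 160]
```

Note that in Python 3, map(int, ...) returns a lazy iterator rather than a list, which is why the sketch builds lists explicitly.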
However, I may well be misunderstanding this too. If it does work this way, I don't know why they chose it, because it is really confusing (at least for me).
Hope it helped.
If you have more info please do correct me.
Tks
-bert
Hello, can you please explain the purpose of frame_sizes and ns_frame_samples in the SampleRNN constructor?
I get the meaning of frame_sizes from the paper. However, there's something strange (at least to me): in the paper, especially in the main figure, it seems that the frame size at tier 3 is 16 and the frame size at tier 2 is 4. In the code, you use the same values (frame_sizes = [16, 4]), however it seems that the order is reversed, because in Predictor's forward() you scan the RNNs in reversed order, so apparently you use 4 for tier 3 and 16 for tier 2. Is there something I'm not getting right here...?
Besides, what's the purpose of n_frame_samples for each RNN?
Thanks!
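One way to picture the ordering question is the sketch below. It rests on two assumptions taken from this thread rather than from the repository's confirmed behaviour: that frame_sizes lists the lowest RNN tier first (as the help text says), and that n_frame_samples is the running product of frame_sizes. The helper name visiting_order and the tuple layout are hypothetical.

```python
from itertools import accumulate
from operator import mul


def visiting_order(frame_sizes):
    """Return (tier_index, frame_size, n_frame_samples), highest tier first.

    tier_index 0 is the lowest RNN tier. The reversal mimics scanning the
    RNNs with reversed(), as described in the question.
    """
    ns_frame_samples = list(accumulate(frame_sizes, mul))  # running product
    tiers = list(enumerate(zip(frame_sizes, ns_frame_samples)))
    return [(index, size, n_samples)
            for index, (size, n_samples) in reversed(tiers)]


# Values quoted in the question.
for index, size, n_samples in visiting_order([16, 4]):
    print(f"RNN tier index {index}: frame_size={size}, "
          f"n_frame_samples={n_samples}")
```

Under these assumptions the tier visited first has frame_size 4 but spans 64 samples per frame, and the tier below it spans 16, so the values would not be swapped: the 4 would count lower-tier frames rather than samples.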