Closed by AMairesse 8 years ago
Good work, that's something I was unsure how to handle originally. I had assumed the network would learn to ignore the padding, but this is clearly a better solution.
Thanks. I think there is still an issue here: when we truncate the wav file we should also truncate the corresponding text; if we don't, we are asking the RNN to guess what the end of the sentence would look like, which is not what we want in this case. The problem is that we don't know where to cut the text file. We could cut it proportionally to what we cut from the wav file, but I think that would be a mistake. The only solution I see is to reject files that are too long. Keeping the last part and putting it in another batch wouldn't be any better, because we would still need to know where to cut so that the resulting text and wav files stay fully synchronous.
What do you think?
I agree. I don't think throwing out data should be a problem, since we can always either a) increase the sequence length, or b) just get more data, as there seems to be a lot more available.
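For reference, a minimal sketch of that rejection step could look like the following; the `MAX_DURATION_SEC` value, the `keep_pair` helper, and the file paths are only placeholders for illustration, not code from the repository:

```python
import wave

MAX_DURATION_SEC = 10.0  # hypothetical cap matching the network's maximum sequence length

def keep_pair(wav_path, txt_path, max_duration=MAX_DURATION_SEC):
    """Keep a (wav, text) pair only if the audio fits within the cap,
    so the transcript never has to be cut and both files stay synchronous."""
    w = wave.open(wav_path, 'rb')
    try:
        duration = w.getnframes() / float(w.getframerate())
    finally:
        w.close()
    return duration <= max_duration

# Filter the corpus before batching instead of truncating long files:
# pairs = [(wav, txt) for (wav, txt) in pairs if keep_pair(wav, txt)]
```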
Hi,
Here is an optimization that passes the actual length of each feat_vec to dynamic_rnn instead of the padded size. It's faster and better for the learning process.
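Something along these lines is one way to do it, assuming zero-padded feature frames and the TF 1.x `tf.nn.dynamic_rnn` API; the placeholder shape, cell size, and names below are only illustrative, not the actual code from the commit:

```python
import tensorflow as tf

# Padded batch of feature vectors: [batch_size, max_time, num_features],
# zero-padded along the time axis (num_features = 13 is just an example).
feat_vec = tf.placeholder(tf.float32, [None, None, 13], name="feat_vec")

# A frame is "real" if any of its features is non-zero; summing those flags
# gives the true length of each utterance, so dynamic_rnn can stop there
# instead of running over the padding.
frame_is_used = tf.sign(tf.reduce_max(tf.abs(feat_vec), axis=2))      # [batch, max_time]
seq_length = tf.cast(tf.reduce_sum(frame_is_used, axis=1), tf.int32)  # [batch]

cell = tf.nn.rnn_cell.BasicLSTMCell(128)
outputs, state = tf.nn.dynamic_rnn(cell, feat_vec,
                                   sequence_length=seq_length,
                                   dtype=tf.float32)
```

In practice it can be cleaner to carry the lengths over from the feature-extraction step rather than re-deriving them from the padded tensor, but the effect on dynamic_rnn is the same.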
I merged your recent commit and only removed seq2seq, which is not used.
Thanks, Antoine.