domerin0 / rnn-speech

Character level speech recognizer using ctc loss with deep rnns in TensorFlow.
MIT License
77 stars, 31 forks

Optimization #15

Closed: AMairesse closed this 8 years ago

AMairesse commented 8 years ago

Hi,

Here is an optimization that computes the actual (unpadded) size of each feat_vec and passes it to dynamic_rnn. It's faster and better for the learning process.
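A minimal sketch of the idea, assuming each utterance is a list of feature frames (each frame a list of floats). This is not the code from the PR; `pad_batch` is a hypothetical helper. It zero-pads a batch to its longest utterance and also returns each utterance's true length. In TensorFlow 1.x, such a list is what `tf.nn.dynamic_rnn` accepts through its `sequence_length` argument, so the RNN stops stepping at the real end of each utterance instead of running over the padding.

```python
from typing import List, Sequence, Tuple

Frame = List[float]


def pad_batch(
    utterances: Sequence[Sequence[Frame]], n_features: int
) -> Tuple[List[List[Frame]], List[int]]:
    """Zero-pad variable-length feature sequences to the longest one in the batch.

    Hypothetical helper: returns the padded batch plus the true (unpadded)
    length of every utterance, suitable for a sequence_length argument.
    """
    lengths = [len(u) for u in utterances]
    max_len = max(lengths, default=0)
    padding_frame = [0.0] * n_features
    padded = [
        # Copy the real frames, then append zero frames up to max_len.
        [list(frame) for frame in u]
        + [list(padding_frame) for _ in range(max_len - len(u))]
        for u in utterances
    ]
    return padded, lengths


# Two utterances of 3 and 5 frames, 2 features per frame.
batch = [[[0.1, 0.2]] * 3, [[0.3, 0.4]] * 5]
padded, lengths = pad_batch(batch, n_features=2)
print(lengths)         # [3, 5]
print(len(padded[0]))  # 5
```

Both rows now have the same shape for batching, while `lengths` keeps the information the padding would otherwise hide from the network.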

I merged your recent commit; the only other change is that I removed seq2seq, which is not used.

Thanks, Antoine.

domerin0 commented 8 years ago

Good work; that's something I was unsure how to handle originally. I had assumed the network would learn to ignore the padding, but this is clearly a better solution.

AMairesse commented 8 years ago

Thanks. I think there is still an issue here: when we truncate the wav file, we should also truncate the corresponding text. If we don't, we are asking the RNN to guess what the end of the sentence would look like, which is not what we want in this case. The problem is that we don't know where to cut the text. We could cut it in proportion to how much of the wav file we removed, but I think that would be a mistake. The only solution I see is to reject files that are too long. Keeping the remaining part and putting it in another batch wouldn't be better, because we would still need to know where to cut so that the resulting text and wav segments stay fully synchronized.
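The "reject files that are too long" option could look roughly like this. It is a sketch under assumptions: each example is a `(features, transcript)` pair, and `max_input_seq_length` and `reject_too_long` are hypothetical names, not identifiers from the repository. Because nothing is truncated, every transcript that survives still matches its audio exactly.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[List[float]], str]  # (feature frames, transcript)


def reject_too_long(
    examples: Sequence[Example], max_input_seq_length: int
) -> Tuple[List[Example], int]:
    """Keep only examples whose audio fits in the RNN input; never truncate.

    Returns the kept examples and how many were rejected.
    """
    kept = [ex for ex in examples if len(ex[0]) <= max_input_seq_length]
    return kept, len(examples) - len(kept)


examples = [
    ([[0.0]] * 80, "hello"),
    ([[0.0]] * 150, "a much longer sentence"),
]
kept, rejected = reject_too_long(examples, max_input_seq_length=100)
print([text for _, text in kept])  # ['hello']
print(rejected)                    # 1
```

Filtering whole pairs avoids having to decide where to cut the text, which is the part that cannot be done reliably.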

What do you think?

domerin0 commented 8 years ago

I agree. I don't think throwing out data should ever be a problem, since we can always either a) increase the sequence length, or b) get more data, as there seems to be a lot more available.
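To judge whether option a) is worth it, one could measure how much of the dataset each candidate maximum sequence length would keep. The sketch below assumes per-utterance frame counts are already known. `kept_fraction` is a hypothetical helper, and the frame counts in the example are made up purely for illustration.

```python
from typing import Dict, Iterable, Sequence


def kept_fraction(
    frame_counts: Sequence[int], candidate_max_lengths: Iterable[int]
) -> Dict[int, float]:
    """For each candidate max sequence length, return the fraction of utterances that fit."""
    total = len(frame_counts)
    if total == 0:
        return {m: 0.0 for m in candidate_max_lengths}
    return {
        m: sum(1 for n in frame_counts if n <= m) / total
        for m in candidate_max_lengths
    }


# Made-up frame counts, for illustration only.
frame_counts = [120, 340, 560, 800, 950, 1300]
result = kept_fraction(frame_counts, [500, 1000, 1500])
print(result)
```

A longer maximum keeps more utterances but makes every padded batch (and each unrolled RNN step count) larger, so the trade-off is between data retained and training cost.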