domerin0 / rnn-speech

Character level speech recognizer using ctc loss with deep rnns in TensorFlow.
MIT License

Incorrect delta and delta-delta calculation? #10

Closed: lingz closed this issue 8 years ago

lingz commented 8 years ago

I think there might be an error in your delta calculation. It looks like you are computing the deltas between neighboring mel filterbank channels within a frame, whereas my understanding is that they should be computed between adjacent frames, i.e. along the time axis.

Also, on your edge frames you are taking just one of the deltas without the other side. I think a more correct implementation would be to use 0 for the deltas at the edge frames.
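
To make the two points above concrete, here is a minimal NumPy sketch of deltas taken across frames with zeros at the edges. It is not the repository's code: the function names `frame_deltas` and `add_deltas`, the `(num_frames, num_filters)` layout, and the choice of a simple central difference `(next - previous) / 2` are all assumptions made for illustration.

```python
import numpy as np


def frame_deltas(features):
    """Deltas along the time axis, with zeros at the two edge frames.

    Assumes `features` has shape (num_frames, num_filters), so axis 0 is
    time and axis 1 is the filterbank channel (hypothetical layout).
    """
    features = np.asarray(features, dtype=float)
    deltas = np.zeros_like(features)
    if features.shape[0] > 2:
        # Central difference between the next and the previous frame,
        # computed independently for each filterbank channel.
        deltas[1:-1] = (features[2:] - features[:-2]) / 2.0
    # The first and last frames lack one neighbor, so they stay at 0.
    return deltas


def add_deltas(features):
    """Stack features, deltas and delta-deltas along the channel axis."""
    features = np.asarray(features, dtype=float)
    deltas = frame_deltas(features)
    # Delta-deltas are the same operation applied to the deltas.
    delta_deltas = frame_deltas(deltas)
    return np.concatenate([features, deltas, delta_deltas], axis=1)
```

Differencing along axis 1 instead would compare neighboring filterbank channels inside a single frame, which is the behavior described above as the suspected bug.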

I made a graph plotting the middle filterbank value vs. time (red), and the new delta calculation (blue), and your existing calculation (green). You can see the blue line correlates better with the red line in terms of following the spikes, showing how the signal is changing over time.

figure_1

lingz commented 8 years ago

Here is the same figure, a little clearer. You can see the clear correlation between blue (new function) and red (filterbank energies).

figure_2

domerin0 commented 8 years ago

Good work. Thanks for the graphs illustrating your points.