I think there might be an error in your delta calculation. It looks like you are computing the deltas between neighboring mel filterbank channels (along the frequency axis), whereas my understanding is that deltas should be computed between adjacent frames (along the time axis).
Also, on the edge frames you are using just the one-sided delta, without the other side. I think a more correct implementation would be to set the deltas to 0 for the edge frames.
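To make the suggestion concrete, here is a minimal sketch of what I mean. It assumes the features are stored as a NumPy array of shape (num_frames, num_filters) and uses a simple central difference over time; the function name `frame_deltas` and the exact difference formula are my own assumptions, not taken from your code.

```python
import numpy as np

def frame_deltas(features):
    """Deltas along the time axis, with edge frames set to 0.

    features: array of shape (num_frames, num_filters) (assumed layout).
    """
    features = np.asarray(features, dtype=float)
    deltas = np.zeros_like(features)
    # Interior frames: central difference between the next and previous frame,
    # computed separately for every filterbank channel.
    deltas[1:-1] = (features[2:] - features[:-2]) / 2.0
    # The first and last frames have no neighbor on one side, so they stay 0.
    return deltas

# Toy input: 4 frames, 2 filterbank channels.
example = np.array([[0.0, 10.0],
                    [1.0, 20.0],
                    [4.0, 40.0],
                    [9.0, 80.0]])
print(frame_deltas(example))
```

The slicing is on axis 0 (frames), so each channel is differenced against its own values at neighboring time steps rather than against neighboring channels.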
I made a graph plotting the middle filterbank value vs. time (red), the new delta calculation (blue), and your existing calculation (green). You can see that the blue line follows the spikes in the red line more closely, showing how the signal is changing over time.