domerin0 / rnn-speech

Character level speech recognizer using ctc loss with deep rnns in TensorFlow.
MIT License

Switch to librosa? #22

Closed by AMairesse 7 years ago

AMairesse commented 7 years ago

Hi, I've been working on a port from python_speech_features to librosa on a dedicated branch. There are a couple of advantages:

With librosa I'm getting 20 inputs per frame instead of 123 as before. The difference is not really clear to me. It seems that librosa does not give energy, delta and delta-delta values, and that the MFCC is computed with fewer intervals, but I'm just guessing here because the MFCC part is the part I understand least :-) Anyway, the learning results seem OK, and I was able to implement asynchronous loading using the TensorFlow queuing system, which is just great.

What do you think about it? Would you possibly be able to obtain the 123-dim input vector with librosa? It would be interesting to compare training with a 20-dim input vector against a 123-dim one...

Thanks, Antoine.

domerin0 commented 7 years ago

Hello Antoine, I think that's a good idea. You're doing a great job, and I haven't been very present here for the past few months, so if you think it's best, let's do it. I'm not particularly concerned if it isn't totally true to the paper.

I agree it would be interesting to compare the input of 20 vs 123 (maybe the extra information doesn't contribute to a much 'better' model). I am finishing my term exams off this week, so I should be able to look more into that shortly. I have also wanted to work on the language model for a while so I will try to get back to that as well.

I'm glad you brought up the TensorFlow async queuing system. It's been on my mind that it would probably be much better than what I had originally done. It should also allow for easier plug and play with other datasets.
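The idea behind asynchronous loading can be sketched with the Python standard library alone. This is not the TensorFlow queue API and not the code from the branch: a background thread fills a bounded queue while the consumer groups items into batches. `load_example` is a hypothetical stand-in for reading an audio file and computing its features, and the file names are made up.

```python
import queue
import threading


def load_example(path):
    # Hypothetical placeholder: a real loader would read audio and extract features.
    return {"path": path, "features": [0.0] * 20}


def producer(paths, q):
    # Runs in a background thread; put() blocks while the queue is full.
    for path in paths:
        q.put(load_example(path))
    q.put(None)  # Sentinel telling the consumer that no more items will come.


def batches(paths, batch_size=2, capacity=8):
    """Yield lists of examples while loading continues in the background."""
    q = queue.Queue(maxsize=capacity)
    worker = threading.Thread(target=producer, args=(paths, q), daemon=True)
    worker.start()

    batch = []
    while True:
        item = q.get()
        if item is None:
            break
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # Final, possibly smaller batch.


result = list(batches(["a.wav", "b.wav", "c.wav"], batch_size=2))
```

Because the loader only needs a list of paths and a function that turns one path into an example, swapping in another dataset means changing `load_example` and nothing else.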

AMairesse commented 7 years ago

OK great, I'll merge it then. Good luck with your exams. I'm enjoying the Christmas holidays and doing some coding while keeping an eye on the kids :-)