Switch to librosa?

AMairesse commented 7 years ago

Hi, I've been working on a port from python_speech_features to librosa on a dedicated branch. There are a couple of advantages:

- it reads flac files without the need to convert them to wav
- it seems faster

I'm getting some encouraging results using a p2.xlarge instance on AWS with LibriSpeech. I'm thinking of merging it back into the main branch, but I wanted to have your opinion first. It would change the project quite a bit, and it would no longer be a "correct implementation" of the paper by Kyuyeon Hwang and Wonyong Sung.

From librosa I'm getting 20 inputs per frame and not 123 like before. The difference is not really clear to me. It seems that librosa does not give energy, delta and delta-delta values, and that the MFCC is computed with fewer intervals, but I'm just guessing here because the MFCC part is the part I understand least :-) Anyway, the learning results seem OK, and I was able to implement asynchronous loading using TensorFlow's queuing system, which is just great.

What do you think about it? Would it be possible to obtain the 123-dim input vector with librosa? It would be interesting to compare training with a 20-dim input vector against a 123-dim one...

Thanks, Antoine.
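
For what it's worth, here is a rough sketch of where a 123-dim vector could come from. It assumes (this is not confirmed anywhere in this thread) that 123 = 41 × 3: 40 filterbank values plus one energy value per frame, stacked with their delta and delta-delta. The 20 on the librosa side matches the default `n_mfcc=20` of `librosa.feature.mfcc`. The code below is plain Python with made-up input values; the names `delta` and `stack_with_deltas` are only illustrative and are not taken from the project.

```python
def delta(frames, width=2):
    """Regression-based delta over time for a list of feature frames.

    frames: list of frames, each frame a list of floats.
    width: number of neighbouring frames used on each side (assumed value).
    """
    n = len(frames)
    denom = 2 * sum(k * k for k in range(1, width + 1))
    out = []
    for t in range(n):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for k in range(1, width + 1):
                # Clamp indices at the edges by repeating the first/last frame.
                nxt = frames[min(t + k, n - 1)][d]
                prv = frames[max(t - k, 0)][d]
                acc += k * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_with_deltas(static):
    """Concatenate static features with their delta and delta-delta."""
    d1 = delta(static)
    d2 = delta(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


# Hypothetical input: 5 frames of 41 static features (40 filterbank + 1 energy).
static = [[float(t + d) for d in range(41)] for t in range(5)]
features = stack_with_deltas(static)
print(len(features), len(features[0]))
```

With librosa, the static part would presumably come from something like `librosa.feature.melspectrogram(..., n_mels=40)` plus an energy term, and `librosa.feature.delta` (with `order=1` and `order=2`) would play the role of the `delta` helper above.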

domerin0 commented 7 years ago

Hello Antoine, I think that's a good idea. You're doing a great job, and I haven't been very present here for the past few months, so if you think it's best, let's do it. I'm not particularly concerned if it isn't totally true to the paper.

I agree it would be interesting to compare the input of 20 vs 123 (maybe the extra information doesn't contribute to a much 'better' model). I am finishing my term exams off this week, so I should be able to look more into that shortly. I have also wanted to work on the language model for a while so I will try to get back to that as well.

I'm glad you brought up the TensorFlow async queuing system; it's been on my mind that it would probably be much better than what I had originally done. It should allow for easier plug and play with other datasets as well.
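
To make the idea concrete, here is a minimal sketch of asynchronous loading: a background thread fills a bounded queue while the training loop consumes from it. This is deliberately not TensorFlow's queue API; it uses only the Python standard library as an analogy, and `loader`, `fake_load` and the file names are hypothetical.

```python
import queue
import threading


def loader(paths, out_queue, load_fn):
    """Producer: load each file and push the result onto the queue."""
    for p in paths:
        out_queue.put(load_fn(p))  # blocks when the queue is full
    out_queue.put(None)  # sentinel telling the consumer there is no more data


def fake_load(path):
    """Stand-in for real audio decoding and feature extraction."""
    return (path, len(path))


paths = ["a.flac", "bb.flac", "ccc.flac"]
batches = queue.Queue(maxsize=2)  # bounded, so the loader cannot run far ahead
worker = threading.Thread(
    target=loader, args=(paths, batches, fake_load), daemon=True
)
worker.start()

# Consumer: in a real setup this would be the training loop.
consumed = []
while True:
    item = batches.get()
    if item is None:
        break
    consumed.append(item)
worker.join()
print(consumed)
```

Because loading sits behind a single function (`load_fn` here), swapping in another dataset would mostly mean swapping that function.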

AMairesse commented 7 years ago

Ok great, I'll merge it then. Good luck with your exams, I'm enjoying Christmas holidays and doing some coding while keeping an eye on the kids :-)

domerin0 / rnn-speech

Switch to librosa ? #22