Closed AMairesse closed 7 years ago
Hello Antoine, I think that's a good idea. You're doing a great job, and I haven't been very present here for the past few months, so if you think it's best, let's do it. I'm not particularly concerned if it isn't totally true to the paper.
I agree it would be interesting to compare the 20-dim vs 123-dim input (maybe the extra information doesn't contribute to a much 'better' model). I'm finishing off my term exams this week, so I should be able to look into that shortly. I've also wanted to work on the language model for a while, so I'll try to get back to that as well.
I'm glad you brought up the TensorFlow async queuing system; it's been on my mind that it would probably be much better than what I had originally done. It should also make it easier to plug in other datasets.
Ok great, I'll merge it then. Good luck with your exams, I'm enjoying Christmas holidays and doing some coding while keeping an eye on the kids :-)
Hi, I've been working on a port from python_speech_features to librosa on a dedicated branch. There are a couple of advantages:
From librosa I'm getting 20 inputs instead of 123 like before. The difference is not really clear to me; it seems librosa does not give energy, delta and delta-delta values, and the MFCC is computed with fewer coefficients, but I'm just guessing here because the MFCC part is the part I understand least :-) Anyway, the learning results seem OK, and I was able to implement asynchronous loading using the TensorFlow queuing system, which is just great.
What do you think about it? Would you be able to obtain the 123-dim input vector with librosa? It would be interesting to compare training with the 20-dim against the 123-dim input vector...
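For reference, a common way to arrive at 123 dimensions is 40 log-mel filterbank features plus one energy term (41), stacked with their deltas and delta-deltas: 41 × 3 = 123. I can't confirm that's exactly what python_speech_features was producing here, but assuming that layout, here is a NumPy-only sketch of the stacking (the `deltas` helper uses the standard HTK-style regression formula; the random arrays are stand-ins for real filterbank/energy frames):

```python
import numpy as np

def deltas(feat, width=2):
    """HTK-style regression deltas over the time axis.

    feat: array of shape (n_frames, n_feats).
    Edges are padded by repetition so the output keeps the same length.
    """
    T = feat.shape[0]
    padded = np.pad(feat, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(feat, dtype=float)
    for n in range(1, width + 1):
        # n * (c[t+n] - c[t-n]), summed over the regression window
        out += n * (padded[width + n:width + n + T] - padded[width - n:width - n + T])
    return out / denom

# Stand-ins for 50 frames of 40 log-mel filterbanks + per-frame log energy
T, n_mels = 50, 40
static = np.random.randn(T, n_mels)
energy = np.random.randn(T, 1)

base = np.hstack([static, energy])                              # (50, 41)
full = np.hstack([base, deltas(base), deltas(deltas(base))])    # (50, 123)
print(full.shape)  # (50, 123)
```

librosa also ships `librosa.feature.delta`, so the same stacking could be done on top of `librosa.feature.melspectrogram` output instead of this hand-rolled helper.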
Thanks, Antoine.