domerin0 / rnn-speech

Character level speech recognizer using ctc loss with deep rnns in TensorFlow.
MIT License

Error in mel-scale computation that gives different results in Python 2 and Python 3, and in the conversion to the decibel scale #45

Closed alexei-v-ivanov closed 7 years ago

alexei-v-ivanov commented 7 years ago

Hi,

While reviewing your project code I came across a couple of things (see subj) that may benefit from correction.
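For readers unfamiliar with the two items in the subject line, here is a minimal sketch of the kind of computation involved. It is not the project's code: the thread does not show the affected lines, so the function names, the common `2595 * log10(1 + f / 700)` mel formula, and the guess that the Python 2 / Python 3 discrepancy comes from integer division are all assumptions made for illustration.

```python
import math


def hz_to_mel(freq_hz):
    """Hz -> mel, using the common 2595 * log10(1 + f / 700) variant (assumed)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, sample_rate):
    """Centre frequencies (Hz) of n_filters bands equally spaced on the mel scale.

    Hypothetical helper. The float literal 2.0 matters: with plain integers,
    "sample_rate / 2" floors in Python 2 but not in Python 3, which is one way
    the two interpreters can end up with different filter banks.
    """
    max_mel = hz_to_mel(sample_rate / 2.0)
    step = max_mel / (n_filters + 1.0)
    return [mel_to_hz(step * (i + 1)) for i in range(n_filters)]


def power_to_db(power, floor=1e-10):
    """Power -> decibels: 10 * log10(power), clamped to avoid log10(0)."""
    return 10.0 * math.log10(max(power, floor))
```

Writing the divisors as floats (or adding `from __future__ import division` at the top of the module) keeps the result identical under both interpreters. Note that the factor 10 applies to power values; amplitude values would use 20 instead.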

BTW, is there a specific reason for the audio to be up-sampled to 22050 Hz when it is read in? I can think of a few possible reasons (e.g. bilinear-transform filters distort high frequencies; to avoid imaging one needs to cut the band with a low-pass filter, which either leaves a gap or must be very sharp, i.e. high-order; etc.). However, that practice is rather unusual.

I'm new to GitHub. If you see that I'm doing things the wrong way, please do not hesitate to point it out.

Best! AI

AMairesse commented 7 years ago

Hi,

Thanks for this pull request. This part of the code is from Dominik (@inikdom) and I've never really looked into it. I agree with your modifications, but I'm not able to answer your question about the up-sampling. Feel free to submit more pull requests; for example, adding comments to this part would be a great idea, to make it easier to understand what each part of the function does.

Thanks, Antoine.