domerin0 / rnn-speech

Character-level speech recognizer using CTC loss with deep RNNs in TensorFlow.
MIT License
77 stars, 31 forks

Output #50

Open chenting0324 opened 6 years ago

chenting0324 commented 6 years ago

Hello, I want to know what the model outputs. Is its output phonemes, characters, or words? And is it an end-to-end model, or is only the acoustic model trained end-to-end?

AMairesse commented 6 years ago

Hi, the model outputs characters. It's an end-to-end model: it currently consists only of an acoustic model, so the model as a whole is end-to-end and the acoustic model is also trained end-to-end.

chenting0324 commented 6 years ago

Thanks! I also want to know how I can get the vector for the output characters. In which function can I find the character output: prediction, decode, or logits? What I actually want is the vector for each character.

AMairesse commented 6 years ago

You should look at the _build_base_rnn method:

So if you are looking for a score for each character at each step of the output, it's logits. For example, logits[0] contains a vector with one score per label for the first chunk of audio, logits[1] contains another such vector for the second chunk of audio, and so on. Note that these are raw logits (unnormalized scores), not probabilities; applying a softmax over the labels of each vector turns it into a probability distribution.

This can be challenging to use because there are a lot of vectors. One audio file may have, for example, 2500 chunks of audio for only 200 characters, so logits will contain a lot of repeated characters (plus CTC blank labels). The CTC algorithm takes care of this: decoding merges repeated labels, removes blanks, and searches for the paths that give the best cumulative probability.
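To make the last point concrete, here is a minimal pure-Python sketch of greedy (best-path) CTC decoding over the logits of a single utterance. It assumes a hypothetical 28-character alphabet, frames already split out of the batch, and the blank at the last label index, as TensorFlow 1.x's tf.nn.ctc_loss does; the actual rnn-speech label set and tensor layout may differ. Inside TensorFlow you would normally call tf.nn.ctc_greedy_decoder or tf.nn.ctc_beam_search_decoder rather than hand-rolling this.

```python
import math

# Hypothetical label set for illustration only; the real rnn-speech alphabet may differ.
# Assumption: the CTC blank is the last index, as in TensorFlow 1.x's tf.nn.ctc_loss.
ALPHABET = list(" abcdefghijklmnopqrstuvwxyz'")
BLANK = len(ALPHABET)  # index 28 is the CTC blank


def softmax(frame_logits):
    """Turn one frame's unnormalized logits into probabilities that sum to 1."""
    m = max(frame_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in frame_logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_ctc_decode(logits):
    """Best-path decoding: pick the most likely label per frame, merge repeats, drop blanks.

    logits: list of frames for one utterance, each a list of len(ALPHABET) + 1 scores.
    """
    chars = []
    previous = None
    for frame in logits:
        probs = softmax(frame)
        best = max(range(len(probs)), key=probs.__getitem__)
        # A label repeated in consecutive frames counts once,
        # unless a blank separates the two occurrences.
        if best != previous and best != BLANK:
            chars.append(ALPHABET[best])
        previous = best
    return "".join(chars)


def fake_frame(label, high=5.0):
    """Hypothetical helper: a frame whose logits strongly favor one label."""
    frame = [0.0] * (len(ALPHABET) + 1)
    frame[label] = high
    return frame


# Usage: 10 frames for "hello", where "_" stands for the blank label.
index = {c: i for i, c in enumerate(ALPHABET)}
frames = [fake_frame(BLANK if c == "_" else index[c]) for c in "hhe_ll_loo"]
print(greedy_ctc_decode(frames))  # prints: hello
```

Only the per-frame argmax matters for best-path decoding, so the softmax does not change the result; it is there to show how each logits vector becomes a probability distribution. The blank between the two "l" runs is what lets "hello" keep its double letter. Beam search can find better transcriptions than this greedy pass because it keeps several candidate prefixes per frame.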