Output - Githubissues

chenting0324 commented 6 years ago

Hello, I want to know what the model outputs. Is its output phonemes, characters, or words? And is it an end-to-end model, or is it end-to-end training of an acoustic model?

AMairesse commented 6 years ago

Hi, the model outputs characters. It's an end-to-end model: it currently consists only of an acoustic model, so the model is end-to-end and the acoustic model is also trained end-to-end.

chenting0324 commented 6 years ago

Thanks! I also want to know how I can get the vector for the output characters. Which one holds the character output: prediction, decoded, or logits? What I actually want is the vector for each character.

AMairesse commented 6 years ago

You should look at the _build_base_rnn method:

logits: the probability of each character at each timestep of the input, for each item of the batch
decoded: the different paths found by the CTC beam search decoder, with _log_prob holding the cumulative log probability of each path
prediction: this is decoded[0], so it's the path with the highest probability
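
To make the per-timestep vectors concrete, here is a minimal sketch in plain Python. It assumes logits holds raw, unnormalized scores (as the name suggests), in which case a softmax turns each timestep's vector into probabilities. The array values, the label order, and the softmax helper are all made up for illustration and are not taken from the repository.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for ONE item of the batch: 3 timesteps x 4 labels.
# Assumed label order: "a", "b", "c", CTC blank.
logits = [
    [2.0, 0.5, 0.1, 0.3],
    [0.2, 0.1, 0.3, 3.0],
    [0.1, 2.5, 0.2, 0.4],
]

# One probability vector per timestep.
probs = [softmax(step) for step in logits]

# Index of the most likely label at each timestep.
best = [max(range(len(p)), key=p.__getitem__) for p in probs]
print(best)  # [0, 3, 1]
```

Each entry of probs is the kind of "vector of the character" asked about above: one distribution over all labels per chunk of audio.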

So if you are looking for a list of probabilities for each character of the output, that's logits. For example, logits[0] contains a vector of probabilities over all labels for the first chunk of audio, logits[1] contains another such vector for the second chunk, and so on. This can be challenging to use because there are a lot of vectors: one audio file might have, for example, 2500 chunks of audio for only 200 characters, so you will see a lot of repeated characters in logits. The CTC algorithm takes care of this by building the paths with the best cumulative probability.
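
To illustrate how the repetitions get collapsed, here is a small sketch of greedy (best-path) CTC decoding in plain Python. This is a simpler technique than the beam search decoder the model uses, and the alphabet, the blank index, and the ctc_greedy_collapse helper are hypothetical names chosen for the example.

```python
def ctc_greedy_collapse(frame_labels, blank):
    """Merge consecutive repeated labels, then drop the blank label."""
    out = []
    prev = None
    for label in frame_labels:
        # Keep a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out

# Hypothetical alphabet; the last index is assumed to be the CTC blank.
alphabet = ["h", "i", "_"]
BLANK = 2

# Most likely label per chunk of audio (e.g. the argmax of each logits vector).
frames = [0, 0, 0, 2, 1, 1, 2, 2, 1]

decoded = ctc_greedy_collapse(frames, BLANK)
text = "".join(alphabet[i] for i in decoded)
print(text)  # hii
```

Nine chunks of audio collapse to three characters here. The blank between the two runs of "i" is what lets the same character appear twice in a row in the output.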

domerin0 / rnn-speech

Output #50