domerin0 / rnn-speech

Character level speech recognizer using ctc loss with deep rnns in TensorFlow.
MIT License
77 stars, 31 forks

beam search with RNN-LM #51

Open cyu0913 opened 6 years ago

cyu0913 commented 6 years ago

This is a great project!

I'm especially interested in the part that combines CTC beam search with an RNN-LM. May I ask how this part is progressing? Will it be ready soon? :-)

AMairesse commented 6 years ago

Hi, thanks a lot!

Unfortunately, I'm not working on it much at the moment. I've been looking at another project, Deepspeech.pytorch, which gets better results (with an acoustic model only, like this one). Deepspeech.pytorch is based on a newer model (DeepSpeech 2), which is bidirectional but also supports a unidirectional mode. Unidirectional is better suited to real-time transcription, which is what I was aiming for with rnn-speech.

Anyway, about rnn-speech: combining the probabilities of the two models (acoustic and language) is quite a challenge, and it would require dedicated time that I can't afford at the moment. If you want to take a look, you should start from the dev branch. I've laid the groundwork for the language model, but there is still some work needed to make it train correctly. After that, the remaining work is to combine the probabilities before running the CTC beam search. The idea is for the SpeechRecognizer class to do this by using AcousticModel and LanguageModel. You're welcome to look into it, and I'll help as much as I can if you have any questions.
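As a rough illustration of what "combining the probabilities" could look like, here is a minimal sketch of a weighted log-probability combination (often called shallow fusion). It is not the code on the dev branch: the function name `fuse_scores`, the `BLANK` symbol, the dictionary layout, and the `lm_weight` parameter are all assumptions made for this example.

```python
import math

BLANK = "_"  # hypothetical symbol standing in for the CTC blank label


def fuse_scores(acoustic_log_probs, lm_log_probs, lm_weight=0.5):
    """Combine acoustic and language-model log-probabilities for one step.

    acoustic_log_probs: dict mapping each label (including BLANK) to its
        log-probability at the current time step.
    lm_log_probs: dict mapping each non-blank character to its
        log-probability given the prefix decoded so far.
    lm_weight: how strongly the language model influences the result.
    """
    fused = {}
    for label, acoustic_lp in acoustic_log_probs.items():
        if label == BLANK:
            # The blank is not a real character, so the LM does not score it.
            fused[label] = acoustic_lp
        else:
            fused[label] = acoustic_lp + lm_weight * lm_log_probs[label]
    return fused


# Toy example: the acoustic model prefers "a", the language model prefers "b".
acoustic = {"a": math.log(0.5), "b": math.log(0.3), BLANK: math.log(0.2)}
lm = {"a": math.log(0.1), "b": math.log(0.9)}

fused = fuse_scores(acoustic, lm, lm_weight=1.0)
best = max((label for label in fused if label != BLANK), key=fused.get)
print(best)  # "b": 0.3 * 0.9 = 0.27 beats 0.5 * 0.1 = 0.05
```

In a real decoder the fused scores would be used to rank candidate extensions of each beam, with the language model re-queried for every prefix; the weight would likely need tuning on held-out data.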

Note: the dev branch isn't merged yet because the pre-trained model is not compatible with it. I tried to build a new pre-trained model, but so far I can't get one as good as the current one. Not far from it, but still... :-)

cyu0913 commented 6 years ago

I understand the complexity of implementing this, especially in TensorFlow :-). It should be much easier in PyTorch. But thank you anyway!