SeanNaren / deepspeech.torch

Speech Recognition using DeepSpeech2 network and the CTC activation function.
MIT License

Regarding the Language Model used #85

Open SarthakYadav opened 7 years ago

SarthakYadav commented 7 years ago

The DeepSpeech paper [1] says that at inference time, CTC models are paired with a language model.

Which language model does this implementation use? Where is the language model written/stored/called in the code?

How can I use my own language model with the network?

mtanana commented 7 years ago

This is equivalent to the implementation before adding the language model. If you wanted to replicate the paper, you would spit out the top 1000 beams from the DS model and rescore them with an LM. (This would take a bit of extra work of course...)

SarthakYadav commented 7 years ago

Yeah, I "actually" went through the code this time and realized there isn't an LM.

Oh. That sounds alright. Can you give me some pointers on how to

  1. spit out the top 1000 beams? (I am new to Torch.)
  2. Is there any pre-built LM for Torch, or an interface to an existing language model, say, KenLM?
mtanana commented 7 years ago
  1. This should be easier than other problems, because you can just take the output probabilities and walk through them, saving the top 1000 at each step. (This will be a bit like doing Viterbi search, if you're familiar...) In other words, you won't ever have to re-run the lower parts of the model, since each time t doesn't depend on the output for t-1.

  2. There are some neural LMs for Torch (https://github.com/mtanana/torchneuralconvo, https://github.com/karpathy/char-rnn), but these will be a lot slower to run over 1000 examples than an n-gram model, which is what the DS paper used. But you'd have to write a crosswalk to someone's code...

mtanana commented 7 years ago

And note...the paper mentioned a weighting between the score from the DS model and the score from the LM. It wasn't clear whether this was estimated or set as a hyperparameter.
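For reference, the DeepSpeech papers combine the scores as a weighted sum, with the weights tuned on a dev set rather than learned: score(y) = log P_am(y|x) + α·log P_lm(y) + β·word_count(y). A hedged sketch of that rescoring step (the function and the α/β values are illustrative placeholders, not from this repo):

```python
def rescore(beams, lm_logprob, alpha=1.25, beta=1.0):
    """Re-rank acoustic-model beams with a language model.

    Uses the weighted combination from the DeepSpeech papers:
        score(y) = am_logprob + alpha * lm_logprob(y) + beta * len(y)
    alpha/beta here are placeholders; in practice they are tuned on a
    dev set (e.g. by grid search), not estimated during training.

    beams: list of (am_logprob, words) pairs, words a list of tokens.
    lm_logprob: callable returning the LM log-probability of a word list.
    """
    rescored = [(am + alpha * lm_logprob(words) + beta * len(words), words)
                for am, words in beams]
    rescored.sort(key=lambda b: b[0], reverse=True)
    return rescored
```

The word-count bonus β counteracts the LM's tendency to prefer shorter transcripts.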

mtanana commented 7 years ago

I'm still playing with the base model code, but once I get better results, I'd be happy to help with this part...but I'm a month or two from where I'll have time...

SeanNaren commented 7 years ago

I won't be able to implement the language model due to time constraints, but it definitely is a large part of the project and would improve the model's performance substantially. A lot of reference material can be found in the original DeepSpeech 2 paper.

menamine commented 6 years ago

Dear all, I am trying to integrate the LM into the calculation of the final score estimated by the neural network. I am new to Torch; could anyone give me a starting point for getting the output probabilities from the neural network? Thanks in advance.