NVIDIA / OpenSeq2Seq

Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
https://nvidia.github.io/OpenSeq2Seq
Apache License 2.0

Language model issues #444

Closed: flassTer closed this issue 5 years ago

flassTer commented 5 years ago

Hello, I have started running inference on audio calls with the DS2 pre-trained model and the WER is very high; almost no word is recognized correctly. However, no language model is applied, and the transcribed text seems phonetically close to the audio. I tried to download a language model by running download_lm.sh in the scripts folder, and I have attached a screenshot of the error that pops up. Could you please explain how to solve this?

Picture: https://pasteboard.co/Ig25rOd.png

EDIT: Basically, "generate_trie" should be an executable, but no such file exists in the directory.

Thank you

vsl9 commented 5 years ago

Thanks for the question. We are working on improving the code and the documentation, so this will become clearer soon. In short, there are two implementations of beam search decoders with language model rescoring:

  1. TF op (https://github.com/NVIDIA/OpenSeq2Seq/tree/master/ctc_decoder_with_lm). It requires a separate build step (if you are not using the NVIDIA TF Docker container). More details here:

  2. Python wrapper for the Baidu C++ decoder. It can be installed with the ./scripts/install_decoders.sh script. This decoder doesn't need a trie file, so feel free to skip the generate_trie call in the download_lm.sh script (see the sketch below).
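
A minimal sketch of that second route, assuming download_lm.sh fetches the KenLM binary before it reaches the failing generate_trie step (the exact script contents may differ in your checkout):

```bash
# Install the Python wrapper for the Baidu CTC beam search decoder
./scripts/install_decoders.sh

# Fetch the language model. If the script stops at generate_trie, the
# .binary KenLM file it has already downloaded is enough for the Python
# decoder; the trie is only needed by the TF-op decoder above.
./scripts/download_lm.sh
```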

flassTer commented 5 years ago

Thank you @vsl9. So after executing that script, how do I specify in the Python configuration file which language model to use?

vsl9 commented 5 years ago

There is no need to specify a language model in the config file. Just add it as a command-line argument to the decode.py script: https://nvidia.github.io/OpenSeq2Seq/html/speech-recognition.html#decoders
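
For example, a hypothetical invocation might look like the sketch below. The flag names, file paths, and hyperparameter values are assumptions for illustration rather than the documented interface, so check the decoders page linked above for the options your version actually accepts:

```bash
# Hypothetical sketch: rescore previously dumped logits with a KenLM binary.
# All flags and paths below are assumptions; verify them against the docs.
python scripts/decode.py \
    --logits=model_output.pickle \
    --lm=language_model/4-gram.binary \
    --alpha=2.0 --beta=1.0 --beam_width=128
```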