NVIDIA / OpenSeq2Seq

Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
https://nvidia.github.io/OpenSeq2Seq
Apache License 2.0

Language model issues #444

Closed: flassTer closed this issue 5 years ago

flassTer commented 5 years ago

Hello, I have started running inference on audio calls with the DS2 pre-trained model and the WER is very high; almost no word is recognized correctly. However, no language model is applied, and the transcribed text seems phonetically close to the audio. I tried to download a language model by running download_lm.sh in the scripts folder, and I have attached a screenshot of the error that pops up. Could you please explain how to solve this?

Picture: https://pasteboard.co/Ig25rOd.png

EDIT: Basically, "generate_trie" should be an executable, but no such file exists in the directory.

Thank you

vsl9 commented 5 years ago

Thanks for the question. We are working on improving the code and the documentation, so this will become clearer soon. In short, there are two implementations of beam search decoders with language model rescoring:

  1. TF op (https://github.com/NVIDIA/OpenSeq2Seq/tree/master/ctc_decoder_with_lm). It requires a separate build step (if you are not using the NVIDIA TF Docker container). More details here:

  2. Python wrapper for the Baidu C++ decoder. It can be installed with the ./scripts/install_decoders.sh script. This decoder doesn't need a trie file, so feel free to skip the generate_trie call in the download_lm.sh script (see the sketch below).
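
A minimal sketch of that second route, assuming download_lm.sh fetches the KenLM binary before it reaches the failing generate_trie step (the exact script contents may differ in your checkout):

```bash
# Install the Python wrapper for the Baidu CTC beam search decoder
./scripts/install_decoders.sh

# Fetch the language model. If the script stops at generate_trie, the
# .binary KenLM file it has already downloaded is enough for the Python
# decoder; the trie is only needed by the TF-op decoder above.
./scripts/download_lm.sh
```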

flassTer commented 5 years ago

Thank you @vsl9. So after executing that script, how do I specify in the Python configuration file which language model to use?

vsl9 commented 5 years ago

There is no need to specify a language model in the config file. Just add it as a command-line argument to the decode.py script: https://nvidia.github.io/OpenSeq2Seq/html/speech-recognition.html#decoders
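
For example, a hypothetical invocation might look like the sketch below. The flag names, file paths, and hyperparameter values are assumptions for illustration rather than the documented interface, so check the decoders page linked above for the options your version actually accepts:

```bash
# Hypothetical sketch: rescore previously dumped logits with a KenLM binary.
# All flags and paths below are assumptions; verify them against the docs.
python scripts/decode.py \
    --logits=model_output.pickle \
    --lm=language_model/4-gram.binary \
    --alpha=2.0 --beta=1.0 --beam_width=128
```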