Closed by zenogantner 4 years ago
https://github.com/SeanNaren/deepspeech.pytorch/issues/261
I ran into the same problem. Uppercasing all n-grams in the LM's .arpa file and rebuilding the binary resolved it. Note: do not uppercase the backslash section markers (\data\, \1-grams:, \end\).
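For reference, a minimal sketch of that uppercasing step, assuming a standard ARPA layout: it uppercases only the lines inside the n-gram sections, leaves the section markers alone, and restores the special tokens <s>, </s> and <unk>, which KenLM treats case-sensitively:

```sh
awk '/^\\/ { sec = $0; print; next }      # section markers: keep verbatim
     sec ~ /-grams:/ {                    # only touch n-gram entries
       line = toupper($0)
       gsub(/<UNK>/, "<unk>", line)       # restore special tokens
       gsub(/<S>/,   "<s>",  line)
       gsub(/<\/S>/, "</s>", line)
       print line; next
     }
     { print }' lm.arpa > lm_upper.arpa
# then rebuild the binary from lm_upper.arpa with KenLM's build_binary
```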
Just train your own LM with KenLM on your training data (it's very fast to train and relatively straightforward); it should work then.
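With KenLM compiled, something along these lines should do (a rough sketch; the corpus file is a placeholder for your own text, one sentence per line, with casing matching the acoustic model's labels):

```sh
# Estimate a 4-gram LM from a plain-text corpus.
lmplz -o 4 < corpus.txt > lm.arpa
```

Converting the ARPA file to a binary is shown a few comments further down.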
You can download pretrained ones from: http://www.openslr.org/11/
Or more general models: http://www.keithv.com/software/giga/
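For the LibriSpeech models, something like this should fetch a pruned 3-gram ARPA (filename taken from the OpenSLR index; double-check it there):

```sh
wget http://www.openslr.org/resources/11/3-gram.pruned.1e-7.arpa.gz
gunzip 3-gram.pruned.1e-7.arpa.gz   # yields 3-gram.pruned.1e-7.arpa
```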
Coming back to the segfault: even if a wrong LM is provided, maybe the software should not segfault?
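Even a cheap pre-flight check before handing the file to the decoder would help. A rough sketch; the magic string is an assumption based on KenLM's binary header and may differ across versions:

```sh
# KenLM binaries start with a sanity header; bail out early if it's absent.
head -c 64 lm.binary | grep -q "mmap lm http://kheafield.com/code" \
  && echo "looks like a KenLM binary" \
  || echo "unrecognized format; the decoder may crash on this file"
```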
KenLM doesn't guarantee compatibility with binaries built by older versions. The segfault doesn't happen in this repo's code; it most likely originates in WarpCTC or KenLM.
If you train your own LM on LibriSpeech it will work. I get the same segfaults as you with those files.
@miguelvr is correct. Alternatively, if you keep your LMs in ARPA format, you can always convert them to the KenLM binary format with matching versions.
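In practice that just means rebuilding the binary with the KenLM checkout your decoder actually links against, e.g. (a sketch, using the ARPA file downloaded above):

```sh
# Rebuild with the locally installed KenLM so the binary format version matches.
build_binary 3-gram.pruned.1e-7.arpa lm.binary
```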
@ryanleary I'm not so sure about that. Why does it segfault when you load an ARPA LM file, then?
Would it make sense to link the language models from https://github.com/SeanNaren/deepspeech.pytorch/releases ? I guess this would be useful for others as well, not just me.
FYI: using Mozilla DeepSpeech release 0.2.0's language model, the WER may drop from 10.2 to 7.0.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
python3 transcribe.py --model-path models/librispeech_pretrained.pth --audio-path my-recording.wav --decoder beam --lm-path ../DeepSpeech/data/lm/lm.binary
Mozilla DeepSpeech also uses KenLM language models. What am I doing wrong? If the file formats are incompatible, the failure could be a bit more graceful.
Let me know if you need further info.
Are there any prepared language models that I could try out?