Uberi / speech_recognition

Speech recognition module for Python, supporting several engines and APIs, online and offline.
https://pypi.python.org/pypi/SpeechRecognition/
BSD 3-Clause "New" or "Revised" License

sphinx_lm_convert -i chinese.lm -o chinese.lm.bin error on Mac OS X El Capitan 10.11.6 #238

Closed: lyxminnie closed this issue 7 years ago.

lyxminnie commented 7 years ago

Steps to reproduce

  1. ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" < /dev/null 2> /dev/null
  2. brew install cmu-sphinxbase
  3. sphinx_lm_convert -i zh_broadcastnews_64000_utf8.DMP -o chinese.lm -ofmt arpa
  4. sphinx_lm_convert -i chinese.lm -o chinese.lm.bin
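Steps 3 and 4 can also be scripted from Python, which makes it easy to rerun them after swapping in a different SphinxBase build. This is a minimal sketch assuming sphinx_lm_convert is on PATH and exits with a non-zero status when it fails; conversion_commands and run_conversion are hypothetical helper names, not part of SpeechRecognition or SphinxBase.

```python
import shutil
import subprocess


def conversion_commands(dmp_path, arpa_path, bin_path):
    """Build the two sphinx_lm_convert invocations from steps 3 and 4."""
    return [
        # Step 3: DMP -> ARPA text format, with the output format given explicitly.
        ["sphinx_lm_convert", "-i", dmp_path, "-o", arpa_path, "-ofmt", "arpa"],
        # Step 4: ARPA -> binary. No -ofmt is passed, so the output format
        # is left for the tool to work out (presumably from the file name).
        ["sphinx_lm_convert", "-i", arpa_path, "-o", bin_path],
    ]


def run_conversion(dmp_path, arpa_path, bin_path):
    """Run both steps in order, stopping at the first failing command."""
    if shutil.which("sphinx_lm_convert") is None:
        raise FileNotFoundError("sphinx_lm_convert is not on PATH")
    for cmd in conversion_commands(dmp_path, arpa_path, bin_path):
        # check=True raises CalledProcessError on a non-zero exit status
        # (assumes the tool signals failure that way).
        subprocess.run(cmd, check=True)
```

With the file names from the report, this would be called as run_conversion("zh_broadcastnews_64000_utf8.DMP", "chinese.lm", "chinese.lm.bin").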

Expected behaviour

Steps 1-3 succeeded. Step 4 should generate chinese.lm.bin.

Actual behaviour

    $ sphinx_lm_convert -i chinese.lm -o chinese.lm.bin
    INFO: cmd_ln.c(691): Parsing command line:
    sphinx_lm_convert \
        -i chinese.lm \
        -o chinese.lm.bin

    Current configuration:
    [NAME]      [DEFLT]     [VALUE]
    -case
    -debug                  0
    -help       no          no
    -i                      chinese.lm
    -ienc
    -ifmt
    -logbase    1.0001      1.000100e+00
    -mmap       no          no
    -o                      chinese.lm.bin
    -oenc       utf8        utf8
    -ofmt

    INFO: ngram_model_arpa.c(477): ngrams 1=63944, 2=16600781, 3=20708460
    INFO: ngram_model_arpa.c(135): Reading unigrams
    INFO: ngram_model_arpa.c(516): 63944 = #unigrams created
    INFO: ngram_model_arpa.c(195): Reading bigrams
    ............................................................
    INFO: ngram_model_arpa.c(533): 16600781 = #bigrams created
    INFO: ngram_model_arpa.c(534): 32337 = #prob2 entries
    INFO: ngram_model_arpa.c(542): 24468 = #bo_wt2 entries
    INFO: ngram_model_arpa.c(292): Reading trigrams
    ............................................................
    INFO: ngram_model_arpa.c(555): 20708460 = #trigrams created
    INFO: ngram_model_arpa.c(556): 27937 = #prob3 entries
    ERROR: "ngram_model.c", line 183: language model file type not supported
    ERROR: "sphinx_lm_convert.c", line 212: Failed to write language model in format (null) to chinese.lm.bin
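In a log this long it helps to check mechanically whether reading the ARPA file finished before the failure. This is a minimal sketch assuming the log has been captured as a string; summarize_lm_convert_log is a hypothetical helper, and its regular expressions only cover the message shapes visible in this report.

```python
import re

# Maps the words used in "#...grams created" lines to the n-gram order.
_ORDER = {"unigrams": 1, "bigrams": 2, "trigrams": 3}


def summarize_lm_convert_log(log):
    """Compare declared vs. created n-gram counts and collect ERROR messages."""
    declared = {}
    header = re.search(r"ngrams ((?:\d+=\d+,?\s*)+)", log)
    if header:
        for order, count in re.findall(r"(\d+)=(\d+)", header.group(1)):
            declared[int(order)] = int(count)

    created = {
        _ORDER[name]: int(count)
        for count, name in re.findall(
            r"(\d+) = #(unigrams|bigrams|trigrams) created", log
        )
    }

    # An ERROR message runs until the next INFO:/ERROR: tag or the end of the
    # text, so this works for a flattened log as well as one message per line.
    errors = [
        msg.strip()
        for msg in re.findall(
            r"ERROR: (.*?)(?=\s*(?:ERROR:|INFO:|$))", log, flags=re.S
        )
    ]

    return {
        "declared": declared,
        "created": created,
        "read_complete": bool(declared) and declared == created,
        "errors": errors,
    }
```

For the log in this report, the declared and created counts match (63944, 16600781 and 20708460), so read_complete is True and only the two ERROR lines are returned. That points at the write step, not at a truncated or malformed chinese.lm.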

System information


My system is Mac OS X El Capitan 10.11.6.

My Python version is Python 3.5.2 :: Anaconda custom (x86_64).

My pip version is pip 8.1.1 from /anaconda/lib/python3.5/site-packages (python 3.5).

My SpeechRecognition library version is 3.6.5.

I installed PocketSphinx from (not specified).

Uberi commented 7 years ago

Hi @lyxminnie,

It seems like you're installing software from Homebrew. The error messages suggest that the SphinxBase version in use is quite old: it doesn't support the newer binary language model format (.lm.bin) that you're trying to write.
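One way to picture this failure: the configuration dump shows -ofmt empty, so the output format has to be resolved some other way, and an older build may simply have no entry for the binary format. The toy model below is a hypothetical illustration, not SphinxBase's actual code; the extension tables and guess_output_format are made up for this sketch. A result of None plays the role of the "(null)" in the error message.

```python
import os

# Hypothetical extension tables for illustration only; NOT taken from SphinxBase.
# The "old" table stands in for a build that predates the binary format.
OLD_FORMATS = {".lm": "arpa", ".arpa": "arpa", ".dmp": "dmp"}
NEW_FORMATS = {**OLD_FORMATS, ".bin": "bin"}


def guess_output_format(filename, ofmt=None, known=OLD_FORMATS):
    """Return the explicit output format if given, else guess from the extension.

    Returns None when the extension is not in the table, mirroring a
    "format (null)" style failure.
    """
    if ofmt:
        return ofmt
    ext = os.path.splitext(filename)[1].lower()  # "chinese.lm.bin" -> ".bin"
    return known.get(ext)
```

Under this model, guess_output_format("chinese.lm.bin") gives None with the old table and "bin" with the new one, which is the kind of difference that upgrading SphinxBase would make.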

For SphinxBase, Homebrew provides version 0.8, which dates from 2012. This is why the docs describe how to build SphinxBase from source. If you follow those instructions instead, you should be able to convert the model.

Please re-open this issue if the errors persist! I am planning to eventually provide a Docker image with all of these tools to simplify model building.

lyxminnie commented 7 years ago

I installed it from source, and it works! Thank you!

easyash commented 6 years ago

Hi,

I am getting the same error. I built SphinxBase from source as described in the docs. My system is also Mac OS.

    INFO: ngram_model_arpa.c(292): Reading trigrams
    INFO: ngram_model_arpa.c(555): 21488 = #trigrams created
    INFO: ngram_model_arpa.c(556): 144 = #prob3 entries
    ERROR: "ngram_model.c", line 183: language model file type not supported
    ERROR: "sphinx_lm_convert.c", line 212: Failed to write language model in format (null) to hindi.lm.bin

    $ which sphinx_lm_convert
    /usr/local/bin/sphinx_lm_convert
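Both a Homebrew install and a default make install can put sphinx_lm_convert under /usr/local/bin, so the which output alone may not show which build is actually running. A minimal sketch, assuming a POSIX-style PATH: it lists every matching executable on PATH, follows symlinks, and flags targets under a Homebrew Cellar directory. find_all_on_path and looks_like_homebrew are hypothetical helpers, and the Cellar check is only a heuristic.

```python
import os


def find_all_on_path(name, path=None):
    """Return (path_entry_hit, resolved_target) for each executable `name` on PATH."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    seen = set()
    for directory in path.split(os.pathsep):
        if not directory:
            # Skip empty entries (POSIX would treat them as the current directory).
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            # Follow symlinks, e.g. Homebrew's /usr/local/bin -> Cellar links.
            target = os.path.realpath(candidate)
            if target not in seen:
                seen.add(target)
                hits.append((candidate, target))
    return hits


def looks_like_homebrew(target):
    """Heuristic: Homebrew keeps installed formulae under a .../Cellar/... directory."""
    return "/Cellar/" in target
```

A possible usage: loop over find_all_on_path("sphinx_lm_convert") and print each hit, its resolved target, and looks_like_homebrew(target). If the first hit resolves into a Cellar directory, the old Homebrew build is probably still shadowing the one built from source.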

Please advise.

easyash commented 6 years ago

Hi,

I was able to get this working. I had to remove --force from the build/make command.

Regards, Ashok