cmusphinx / pocketsphinx

A small speech recognizer

PocketSphinx 5.0.3

This is PocketSphinx, one of Carnegie Mellon University's open source large vocabulary, speaker-independent continuous speech recognition engines.

Although this was at one point a research system, active development has largely ceased and it has become very, very far from the state of the art. I am making a release because people are nonetheless using it, and because there were a number of historical errors in the build system and API that needed to be corrected.

The version number is strangely large because people have been using a "release" called 5prealpha; from now on we will use proper semantic versioning.

Please see the LICENSE file for terms of use.

Installation

We now use CMake for building, which should give reasonable results across Linux and Windows. I'm not certain about Mac OS X because I don't have one. In addition, the audio library, which never really built or worked correctly on any platform at all, has simply been removed.

There is no longer any dependency on SphinxBase. There is no SphinxBase anymore. This is not the SphinxBase you're looking for. All your SphinxBase are belong to us.

To install the Python module in a virtual environment (replace ~/ve_pocketsphinx with the virtual environment you wish to create), from the top level directory:

python3 -m venv ~/ve_pocketsphinx
. ~/ve_pocketsphinx/bin/activate
pip install .

To install the C library and bindings (assuming you have access to /usr/local - if not, use -DCMAKE_INSTALL_PREFIX to set a different prefix in the first cmake command below):

cmake -S . -B build
cmake --build build
cmake --build build --target install

Usage

The pocketsphinx command-line program reads single-channel 16-bit PCM audio from standard input or from one or more files, and attempts to recognize speech in it using the default acoustic and language models. It accepts a large number of options which you probably don't care about, followed by a command (which defaults to live) and one or more inputs (except in align mode), where an input of - means read from standard input.
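Since the program expects single-channel 16-bit PCM, it can save some head-scratching to check a file before feeding it in. Here is a minimal sketch using only Python's standard wave module. The helper name check_pcm_wav is hypothetical (it is not part of PocketSphinx), and it deliberately does not check the sample rate, since nothing above specifies one.

```python
import wave


def check_pcm_wav(path):
    """Return a list of reasons `path` is not single-channel 16-bit PCM.

    An empty list means the file looks usable. Hypothetical helper for
    illustration only; it is not part of PocketSphinx.
    """
    problems = []
    try:
        with wave.open(path, "rb") as audio:
            if audio.getnchannels() != 1:
                problems.append(f"expected 1 channel, found {audio.getnchannels()}")
            if audio.getsampwidth() != 2:
                bits = 8 * audio.getsampwidth()
                problems.append(f"expected 16-bit samples, found {bits}-bit")
    except (wave.Error, EOFError) as err:
        # The wave module only reads uncompressed PCM WAV files, so
        # anything else (or a truncated file) ends up here.
        problems.append(f"not a readable PCM WAV file: {err}")
    return problems


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        issues = check_pcm_wav(name)
        print(name, "OK" if not issues else "; ".join(issues))
```

If it reports a problem, that file is a candidate for conversion with sox before you hand it to pocketsphinx.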

If you have a single-channel WAV file called "speech.wav" and you want to recognize speech in it, you can try doing this (the results may not be wonderful):

pocketsphinx single speech.wav

If your input is in some other format, I suggest converting it with sox as described below.
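If you'd rather drive that conversion from a script, here is a minimal sketch, assuming sox is installed and on your PATH. The 16 kHz target rate is an assumption (it is the rate commonly used with the default US English model), and sox_convert_cmd and convert are hypothetical helpers. The flags themselves (-r, -c, -b, -e signed-integer) are ordinary sox format options; placed after the input file and before the output file, they describe the output, so sox resamples and downmixes as needed.

```python
import shlex
import subprocess


def sox_convert_cmd(src, dst, rate=16000):
    """Build a sox command that rewrites `src` as single-channel,
    16-bit signed PCM at `rate` Hz in `dst`.

    Hypothetical helper; the 16 kHz default is an assumption.
    """
    # Format options between the input and output filenames apply to the
    # output file, so sox converts rate, channels and sample format for us.
    return ["sox", src, "-r", str(rate), "-c", "1", "-b", "16",
            "-e", "signed-integer", dst]


def convert(src, dst, rate=16000):
    # check=True raises CalledProcessError if sox exits with an error
    subprocess.run(sox_convert_cmd(src, dst, rate), check=True)


if __name__ == "__main__":
    # Show the command line without running it
    print(shlex.join(sox_convert_cmd("input.flac", "speech.wav")))
```

Which input formats work depends on how your copy of sox was built.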

The commands are as follows:

By default only errors are printed to standard error, but if you want more information you can pass -loglevel INFO. Partial results are not printed; maybe they will be in the future, but don't hold your breath.

Programming

For programming, see the examples directory for a number of examples of using the library from C and Python. You can also read the documentation for the Python API or the C API.
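As a rough taste of the Python side, here is a minimal sketch that feeds a WAV file to the decoder in chunks as a single utterance. It assumes the 5.x Python module provides Decoder with start_utt(), process_raw(), end_utt() and hyp() (returning None or an object with hypstr), along the lines of the bundled examples; check the Python API documentation before relying on it. The helpers read_chunks and recognize are hypothetical, and pocketsphinx is imported inside recognize() so that read_chunks can be used on its own.

```python
import wave


def read_chunks(audio, frames_per_chunk=1024):
    """Yield raw PCM byte strings of up to `frames_per_chunk` frames
    from an open wave.Wave_read object (hypothetical helper)."""
    while True:
        data = audio.readframes(frames_per_chunk)
        if not data:
            break
        yield data


def recognize(path):
    """Decode a single-channel 16-bit WAV file as one utterance.

    Assumes pocketsphinx.Decoder offers start_utt/process_raw/end_utt/hyp.
    """
    from pocketsphinx import Decoder  # only needed for actual decoding

    with wave.open(path, "rb") as audio:
        # Keyword arguments set decoder options; samprate should match the file
        decoder = Decoder(samprate=audio.getframerate())
        decoder.start_utt()
        for chunk in read_chunks(audio):
            decoder.process_raw(chunk)
        decoder.end_utt()
    hyp = decoder.hyp()
    # hyp() gives None when nothing was recognized
    return hyp.hypstr if hyp is not None else ""


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(name, recognize(name))
```

Feeding fixed-size chunks rather than the whole file at once mirrors how live audio arrives, so the same loop works if the bytes come from a pipe instead of a file.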

Authors

PocketSphinx is ultimately based on Sphinx-II, which in turn was based on some older systems at Carnegie Mellon University that were released as free software under a BSD-like license thanks to the efforts of Kevin Lenzo. Much of the decoder in particular was written by Ravishankar Mosur (look for "rkm" in the comments), but various other people contributed as well; see the AUTHORS file for more details.

David Huggins-Daines (the author of this document) is responsible for creating PocketSphinx, which added various speed and memory optimizations, fixed-point computation, JSGF support, portability to various platforms, and a somewhat coherent API. He then disappeared for a while.

Nickolay Shmyrev took over maintenance for quite a long time afterwards, and a lot of code was contributed by Alexander Solovets, Vyacheslav Klimkov, and others.

Currently this is maintained by David Huggins-Daines again.