daanzu / kaldi-active-grammar

Python Kaldi speech recognition with grammars that can be set active/inactive dynamically at decode-time
GNU Affero General Public License v3.0

Kaldi Active Grammar


Python package developed to enable context-based command & control of computer applications, as in the Dragonfly speech recognition framework, using the Kaldi automatic speech recognition engine.


Normally, Kaldi decoding graphs are monolithic, require expensive up-front off-line compilation, and are static during decoding. Kaldi's new grammar framework allows multiple independent grammars with nonterminals to be compiled separately and stitched together dynamically at decode-time, but all the grammars are always active and capable of being recognized.

This project extends that framework to allow each grammar/rule to be independently marked as active or inactive on a per-utterance basis (set at the beginning of each utterance). Dragonfly can then activate only the grammars appropriate for the current environment, which increases accuracy because fewer recognitions are possible. Furthermore, the dictation grammar can be shared between all the command grammars, which can then be compiled quickly without needing to include large-vocabulary dictation directly.
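As a rough illustration of the per-utterance activation idea (not the package's actual API), the sketch below computes one active/inactive flag per rule from the current foreground application. The `ToyRule` class, its `app` field, and the rule names are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyRule:
    """Hypothetical stand-in for a compiled grammar rule."""
    name: str
    # Application this rule applies to; None means it is always eligible.
    app: Optional[str] = None


def activity_mask(rules: List[ToyRule], foreground_app: str) -> List[bool]:
    """Return one flag per rule, computed once at the start of an utterance."""
    return [rule.app is None or rule.app == foreground_app for rule in rules]


# Hypothetical rule set: one global rule and two application-specific rules.
RULES = [
    ToyRule("global_commands"),
    ToyRule("editor_commands", app="editor"),
    ToyRule("browser_commands", app="browser"),
]

if __name__ == "__main__":
    print(activity_mask(RULES, "editor"))  # [True, True, False]
```

Only the rules flagged `True` would be candidates for recognition during that utterance, which is what shrinks the search space.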

See the Changelog for the latest updates.

Features

Demo Video

[![Demo Video](docs/demo_video.png)](https://youtu.be/Qk1mGbIJx3s)

Donations are appreciated to encourage development.


Related Repositories

Getting Started

Want to get started quickly & easily on Windows? Available under project releases:

Otherwise...

Setup

Requirements:

Installation:

  1. Download a compatible generic English Kaldi nnet3 chain model from the project releases. Unzip the model and pass the directory path to the kaldi-active-grammar constructor.
    • Or use your own model. Standard Kaldi models must be converted to be usable. Conversion can be performed automatically, but this hasn't been fully implemented yet.
  2. Install the Python package, which includes the necessary Kaldi binaries:
    • The easy way to use kaldi-active-grammar is as a backend to dragonfly, which makes it easy to define grammars and resultant actions.
    • Alternatively, if you only want to use it directly (via a more low level interface), you can just run pip install kaldi-active-grammar
  3. To support automatic generation of pronunciations for unknown words (not in the lexicon), you have two choices:
    • Local generation: Install the g2p_en package with pip install 'kaldi-active-grammar[g2p_en]'
      • The necessary data files are now included in the latest speech models I released with v3.0.0.
    • Online/cloud generation: Install the requests package with pip install 'kaldi-active-grammar[online]' AND pass allow_online_pronunciations=True to Compiler.add_word() or Model.add_word()
    • If both are available, the former is preferentially used.
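The preference order in step 3 (local generation first, online generation only if explicitly allowed) can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's internal logic: the function `choose_pronunciation_source` and its return values are hypothetical, and only the module names `g2p_en` and `requests` come from the text above.

```python
import importlib.util
from typing import Callable, Optional


def _is_installed(module_name: str) -> bool:
    """Report whether a top-level module can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


def choose_pronunciation_source(
    allow_online: bool = False,
    is_installed: Callable[[str], bool] = _is_installed,
) -> Optional[str]:
    """Hypothetical helper: pick where pronunciations would come from.

    Local generation (g2p_en) is preferred whenever it is available.
    Online generation needs both the requests package and explicit opt-in.
    """
    if is_installed("g2p_en"):
        return "local"
    if allow_online and is_installed("requests"):
        return "online"
    return None


if __name__ == "__main__":
    # The result depends on which packages are installed locally.
    print(choose_pronunciation_source(allow_online=True))
```

The `is_installed` parameter exists only so the selection logic can be exercised without installing either package.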

Troubleshooting

Documentation

Formal documentation is somewhat lacking currently. To see example usage, examine:

The KaldiAG API is fairly low level. In outline: you define a set of grammar rules, then send in audio data along with a bit mask of which rules are active at the beginning of each utterance, and receive back the recognized rule and text. The easier route is to go through Dragonfly, which simplifies defining the rules, contexts, and actions.
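To make that flow concrete, here is a toy model of it that deliberately avoids the real library: `ToyDecoder` is a hypothetical class that matches plain text instead of audio, and the integer bit mask layout (bit 0 for the first rule, bit 1 for the second, and so on) is an assumption made for this sketch rather than the format KaldiAG uses.

```python
from typing import Dict, List, Optional, Tuple


class ToyDecoder:
    """Hypothetical stand-in for a decoder; matches text instead of audio."""

    def __init__(self, rules: Dict[str, List[str]]):
        # Rule order defines the bit positions in the mask.
        self.rule_names = list(rules)
        self.phrases = rules

    def decode(self, utterance: str, active_mask: int) -> Optional[Tuple[str, str]]:
        """Return (rule name, text) for the first active rule that matches."""
        for bit, name in enumerate(self.rule_names):
            if not active_mask & (1 << bit):
                continue  # this rule is inactive for the current utterance
            if utterance in self.phrases[name]:
                return name, utterance
        return None


# Hypothetical rules: both contain "close tab", so the mask decides the result.
decoder = ToyDecoder({
    "editor": ["save file", "close tab"],
    "browser": ["close tab", "go back"],
})

if __name__ == "__main__":
    print(decoder.decode("close tab", 0b01))  # ('editor', 'close tab')
    print(decoder.decode("close tab", 0b10))  # ('browser', 'close tab')
    print(decoder.decode("go back", 0b01))    # None
```

The same utterance resolves to a different rule, or to nothing at all, depending only on the mask supplied for that utterance.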

Building

Contributing

Issues, suggestions, and feature requests are welcome & encouraged. Pull requests are considered, but the project structure is in flux.

Donations are appreciated to encourage development.


Author

License

This project is licensed under the GNU Affero General Public License v3 (AGPL-3.0-or-later). See the LICENSE.txt file for details. If this license is problematic for you, please contact me.

Acknowledgments