NaomiProject / Naomi

The Naomi Project is an open source, technology agnostic platform for developing always-on, voice-controlled applications!
https://projectnaomi.com/
MIT License

Use JSGF for writing intents #363

Open aaronchantrill opened 2 years ago

aaronchantrill commented 2 years ago

Detailed Description

As in most voice assistants, Naomi's intents serve multiple purposes. First, the speech-to-text system uses them to prepare a dictionary of words to recognize. Next, they are converted into a language model that helps the speech-to-text system guess what it is most likely hearing, given the likelihoods of different arrangements of words. Finally, the text-to-intent system uses them to figure out which intent to trigger.
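To make those three roles concrete, here is a minimal sketch in plain Python. The intent names and phrases are made up for illustration, and the bigram counts and word-overlap scoring are stand-ins for a real language model and a real text-to-intent engine, not what Naomi actually does.

```python
from collections import Counter

# Hypothetical example phrases for two made-up intents.
INTENT_PHRASES = {
    "WeatherIntent": ["what is the weather today", "what is the forecast"],
    "TimeIntent": ["what time is it"],
}


def vocabulary(intent_phrases):
    """Role 1: the word list a speech-to-text dictionary would need."""
    return sorted(
        {word
         for phrases in intent_phrases.values()
         for phrase in phrases
         for word in phrase.split()}
    )


def bigram_counts(intent_phrases):
    """Role 2: raw bigram counts, the starting point of a simple language model."""
    counts = Counter()
    for phrases in intent_phrases.values():
        for phrase in phrases:
            words = ["<s>"] + phrase.split() + ["</s>"]
            counts.update(zip(words, words[1:]))
    return counts


def best_intent(utterance, intent_phrases):
    """Role 3: pick the intent whose phrases share the most words with the utterance."""
    heard = set(utterance.lower().split())

    def score(intent):
        return max(len(heard & set(p.split())) for p in intent_phrases[intent])

    return max(intent_phrases, key=score)
```

For example, `best_intent("what time is it now", INTENT_PHRASES)` picks `TimeIntent`, because that phrase shares the most words with the utterance.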

When developing the format for creating grammars for Naomi speechhandler plugins, I created a structure in which the grammar is split into keywords and phrases, with keywords providing a list of options within a phrase. This was similar to the way grammars are constructed for the intent parsing systems I have looked at, and it was a simple way to move Naomi from simply spotting keywords to reacting to more complex utterances, but it has a few big problems:

1) In order to generate a new grammar for a plugin, you have to edit the plugin.
2) It is not particularly robust, and we would like to have additional keyword types, such as numbers and dates, to help developers.
3) It is not standard, so a developer learning to write a grammar for Naomi is not learning skills that will translate to other projects.
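For readers who have not seen the keyword/phrase structure, the idea can be sketched roughly as follows. The dictionary layout, the key names (`keywords`, `templates`) and the `{Name}` placeholder syntax are simplified assumptions for illustration and are not copied from Naomi's plugin API.

```python
import re
from itertools import product

# Hypothetical intent definition: keywords supply the options
# that can fill the {Name} placeholders in each phrase template.
INTENT = {
    "keywords": {
        "Day": ["today", "tomorrow"],
        "Place": ["london", "paris"],
    },
    "templates": [
        "what is the weather in {Place}",
        "will it rain {Day} in {Place}",
    ],
}


def expand(intent):
    """Expand every template into all concrete phrases it can produce."""
    phrases = []
    for template in intent["templates"]:
        # Find the placeholder names used by this template, in order.
        slots = re.findall(r"\{(\w+)\}", template)
        options = [intent["keywords"][slot] for slot in slots]
        # Try every combination of keyword values for those placeholders.
        for combo in product(*options):
            phrases.append(template.format(**dict(zip(slots, combo))))
    return phrases
```

Calling `expand(INTENT)` yields six phrases here: two from the first template and four from the second. The expanded list is what would feed the dictionary and language model, which is also why a change to the grammar currently means editing the plugin itself.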

There are a few grammar formats out there: JSGF, Nuance, ANTLR, SRGS, etc. SRGS is a W3C Recommendation, but I see very little support for it, and JSGF has also been published by the W3C, as a Note. JSGF has been around a long time, and there is a pyJSGF library on PyPI which could be helpful. DeepSpeech/Coqui can use JSGF files directly, so I propose that we use the JSGF grammar format for building Naomi intents, unless someone has a reason to prefer a different format.
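To give a feel for what a JSGF intent might look like, here is a small sketch with a made-up `weather` grammar and a tiny hand-rolled matcher. It only understands words, `( a | b )` alternatives and `[ optional ]` groups in public rules. Rule references, tags, weights and repetition operators are deliberately left out, so this is not a substitute for pyJSGF or any other real parser.

```python
import re

# Hypothetical grammar, written in JSGF syntax.
JSGF_GRAMMAR = """#JSGF V1.0;
grammar weather;
public <weather> = what is the weather [ like ] in ( london | paris );
"""

TOKEN = re.compile(r"[()\[\]|]|[^\s()\[\]|]+")
# JSGF grouping symbols mapped onto regular-expression equivalents.
SYMBOLS = {"(": "(?:", ")": ")", "[": "(?:", "]": ")?", "|": "|"}


def rule_to_regex(expansion):
    """Translate one rule expansion (supported subset only) into a regex."""
    parts = []
    for token in TOKEN.findall(expansion):
        # Each word is emitted with a trailing space; utterances are
        # normalised the same way before matching.
        parts.append(SYMBOLS.get(token) or re.escape(token) + " ")
    return re.compile("(?:" + "".join(parts) + ")")


def public_rules(grammar_text):
    """Return {rule name: compiled regex} for every public rule."""
    found = re.findall(r"public\s+<(\w+)>\s*=\s*(.*?);", grammar_text, re.S)
    return {name: rule_to_regex(body) for name, body in found}


def matches(pattern, utterance):
    """Check whether the whole utterance is accepted by the rule."""
    normalised = "".join(word + " " for word in utterance.lower().split())
    return pattern.fullmatch(normalised) is not None
```

With this, `matches(public_rules(JSGF_GRAMMAR)["weather"], "What is the weather like in London")` is true, while the same question about Berlin is rejected. The point is that the grammar lives in a plain text file that can be edited without touching plugin code.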

Context

Possible Implementation

Your Environment

aaronchantrill commented 1 year ago

I'm evaluating Synesthesiam's jsgf2fst for this purpose.

aaronchantrill commented 1 year ago

I think that SRGS is going to end up being a better choice. It's a bear to write and not very intuitive, but it is a standard, and I get the feeling that we will be seeing more of it in the future. It also supports named slots, which JSGF does not, making it more appropriate than JSGF for writing intent templates. It can also be used to provide lists of different ways of saying things so that Naomi can generate semi-random responses, which is one of the things I also wanted to use JSGF for. This is also a place where it seems like we could have an impact, since there currently does not appear to be a Python package for parsing SRGS files.
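As a rough illustration of the semi-random response idea, the sketch below reads a small, made-up SRGS XML grammar with Python's standard `xml.etree.ElementTree` and produces one random expansion of a rule. It handles only literal text, `<item>` and `<one-of>`. `<ruleref>`, `<tag>`, repeats and weights are not handled, so it is nowhere near a full SRGS parser.

```python
import random
import xml.etree.ElementTree as ET

SRGS_NS = "{http://www.w3.org/2001/06/grammar}"

# Hypothetical grammar with two ways of greeting the user.
SRGS_GRAMMAR = """<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" root="greeting">
  <rule id="greeting">
    <one-of>
      <item>hello</item>
      <item>good morning</item>
    </one-of>
    <item>there</item>
  </rule>
</grammar>
"""


def generate(element, rng):
    """Return the words of one random expansion of an SRGS element."""
    if element.tag == SRGS_NS + "one-of":
        # A <one-of> contributes exactly one of its <item> children.
        return generate(rng.choice(element.findall(SRGS_NS + "item")), rng)
    words = (element.text or "").split()
    for child in element:
        words += generate(child, rng)
        # Text that follows a child element belongs to the parent sequence.
        words += (child.tail or "").split()
    return words


def random_phrase(grammar_xml, rule_id, rng=random):
    """Pick one random phrase from the rule with the given id."""
    root = ET.fromstring(grammar_xml)
    for rule in root.findall(SRGS_NS + "rule"):
        if rule.get("id") == rule_id:
            return " ".join(generate(rule, rng))
    raise KeyError(rule_id)
```

Each call to `random_phrase(SRGS_GRAMMAR, "greeting")` returns either "hello there" or "good morning there". Passing a seeded `random.Random` instance as `rng` makes the choice repeatable for tests.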