dictation-toolbox / Caster

Dragonfly-Based Voice Programming and Accessibility Toolkit

Overhaul grammars to use plain English based on Human-Machine Grammar. #625

Open LexiconCode opened 5 years ago

LexiconCode commented 5 years ago

As it stands, Caster has relied on phonetically distinct but obscure words in its grammars. Traditionally this has led to higher accuracy of command recognition. However, that is a little less relevant than it was in 2015, when the majority of Caster was written. That said, the caveat still holds for other speech recognition engines which may not be as advanced as DNS.

Drawbacks of the current grammar set

The proposal

chilimangoes commented 5 years ago

I'm very much in favor of this and have been thinking of how to propose something similar for some time. My proposal would go further: I suggest we reach out to the larger voice coding community and try to work together on a (relatively) standard grammar that works across multiple voice coding frameworks. My intuition is that this is one of the things that could help get people onto the voice control train.

A while ago, on the dictation toolbox general channel, Ryan Hileman was lamenting the fact that it's hard to get people to take voice coding seriously. I think there are a few reasons for that, and one of them is how fractured the voice coding community is. There must be at least a half dozen different "languages" for coding by voice. Many of them are loosely based on the grammar used in Tavis Rudd's video which, while practical and an inspiration, was also cobbled together and disjointed. This tower of Babel situation makes it hard for newcomers to take the leap because, before they can even dip their toes in the water, they first need to decide which of the dialects to try. There's a psychological phenomenon that's fairly well-known in marketing and business circles (sometimes called choice overload) where having too many choices causes most people to simply opt out altogether due to indecision. IMO, voice coding needs an equivalent of a QWERTY (or Dvorak) standard if we ever hope to gain any kind of momentum.

My vote would be to at least start with Kim's grammar as she has put a tremendous amount of time and effort into designing it and she has some very good ideas and convincing arguments. I also think it would be a good idea for anyone interested in this to read through the content on the ShortTalk web site because Nils Klarlund also had some brilliant ideas about voice-computer interaction, even if his ideas would be technically and behaviorally more difficult to implement than Kim's.

mrob95 commented 5 years ago

I agree that having a custom language using made-up words, while potentially more efficient, is a major barrier to entry that I think we could do without. Most people will find that as they get comfortable with a standard grammar, they will want to tune it and make it more concise for the tasks they need to accomplish. But this will happen gradually, so the learning curve will not be too bad and they will understand the trade-offs. There is a big difference between a user deciding that e.g. "sauce" might be better than "up" and implementing that themselves, and a project saying "here is a command grammar with 100 made-up words which you have to learn before you can get anything done".

I think in the majority of cases it will be fairly clear what the obvious spec for a command should be, without needing to think too hard about human psychology etc. Good places to start would probably be:

One interesting comparison would be to test different formulations on something like https://www.typingtest.com/. With a highly customised phonetic alphabet I am getting around 27 WPM saying one character at a time.
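To put that figure in context, here is the rough arithmetic behind a letter-by-letter WPM measurement, assuming the conventional typing-test definition of a "word" as 5 characters (the 27 WPM number is from the comment above; everything else is just unit conversion):

```python
# Rough arithmetic for spelling one character per utterance.
# Assumes the conventional typing-test convention that one "word"
# equals 5 characters; the 27 WPM figure comes from the comment above.

def chars_per_minute_needed(wpm, chars_per_word=5):
    """Characters that must be spoken per minute to reach a given WPM."""
    return wpm * chars_per_word

rate = chars_per_minute_needed(27)
print(rate)                 # 135 characters per minute
print(round(rate / 60, 2))  # 2.25 recognized utterances per second
```

In other words, 27 WPM spelled letter-by-letter means the engine is successfully recognizing better than two short utterances per second, which says as much about recognition latency as about the alphabet itself.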

alexboche commented 5 years ago

One downside to making commands be regular English words is that it would make it more difficult to do CCR between commands and dictation (see #623 ).
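The collision can be illustrated with a toy sketch. The word lists below are invented for illustration and are not Caster's actual grammar; the point is that everyday command words also appear in ordinary dictation, so a CCR engine can't classify them without extra context:

```python
# Toy illustration of why plain-English command words complicate CCR
# (continuous command recognition) mixed with dictation. The word
# lists are hypothetical examples, not Caster's real grammar.

MADE_UP_COMMANDS = {"sauce", "dunce", "lease", "ross"}   # obscure words
PLAIN_COMMANDS = {"up", "down", "left", "right"}         # everyday words

def ambiguous_tokens(utterance, command_words):
    """Tokens that could be either a command or dictation."""
    return [t for t in utterance.lower().split() if t in command_words]

utterance = "move the cursor up and to the left"
print(ambiguous_tokens(utterance, MADE_UP_COMMANDS))  # []
print(ambiguous_tokens(utterance, PLAIN_COMMANDS))    # ['up', 'left']
```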

LexiconCode commented 5 years ago

> One downside to making commands be regular English words is that it would make it more difficult to do CCR between commands and dictation (see #623 )

That's true; we will have to change how we structure the grammars to accommodate it. This really can't begin until the CCR rewrite is complete.

It seems like the majority of the community is in favor of this. How should we handle the current grammar?

Starting off, I think we can simply reference the last commit containing the old grammar style. Medium-term, we can build up filter rules (soon to be Transformers) that reproduce the old-style grammar set. This would be a great way to test out and demonstrate Transformers.
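The idea can be sketched as a spec-rewriting filter: take the new plain-English grammar and remap its specs back to the old-style words for users who prefer them. The word pairs and function names below are hypothetical, not the real Caster mappings or Transformer API:

```python
# Hypothetical sketch of a spec-rewriting filter rule: remap new
# plain-English specs back to old-style words for users who prefer
# them. The word pairs are invented examples, not Caster's mappings.

OLD_STYLE = {
    "copy that": "dup",
    "scroll up": "sauce",
    "scroll down": "dunce",
}

def to_old_style(mapping):
    """Return a command mapping with specs rewritten to the old style.

    Specs without an old-style equivalent are passed through unchanged.
    """
    return {OLD_STYLE.get(spec, spec): action
            for spec, action in mapping.items()}

new_grammar = {"copy that": "ctrl-c", "scroll up": "pageup"}
print(to_old_style(new_grammar))  # {'dup': 'ctrl-c', 'sauce': 'pageup'}
```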

LexiconCode commented 4 years ago

As a side note, Transformers have been relegated to simplified filter rules only. They didn't really work out in practice, at least for end-user modifications; Transformers would need a higher level of abstraction to be user-friendly.

LexiconCode commented 4 years ago

Some sort of grammar API is needed. I have some thoughts on a higher-level abstraction.
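One purely hypothetical shape such an API could take: commands declared as plain data, with the framework-specific wiring (Dragonfly rules, CCR merging) generated elsewhere. None of these names exist in Caster today; this is only a sketch of the abstraction level being discussed:

```python
# Purely hypothetical sketch of a higher-level grammar API: commands
# declared as data, with engine-specific wiring generated elsewhere.
# None of these class or attribute names exist in Caster today.

from dataclasses import dataclass, field

@dataclass
class Command:
    spec: str            # what the user says, in plain English
    action: str          # symbolic action name, resolved by a backend
    aliases: list = field(default_factory=list)  # user overrides

@dataclass
class GrammarSpec:
    name: str
    commands: list

    def all_specs(self):
        """Every utterance this grammar responds to."""
        for cmd in self.commands:
            yield cmd.spec
            yield from cmd.aliases

editing = GrammarSpec("editing", [
    Command("copy that", "edit.copy"),
    Command("scroll up", "view.scroll_up", aliases=["sauce"]),
])
print(sorted(editing.all_specs()))  # ['copy that', 'sauce', 'scroll up']
```

Keeping the declaration side free of engine details is what would let user customization (like the "sauce" alias above) be expressed as data edits rather than code changes.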

LexiconCode commented 4 years ago

Greetings all, I've been in contact with Kim over the phone and we had a wonderful conversation. She really does have a wonderful philosophy and methodology for building grammars. Kim has expressly given us permission to utilize her work, Human-Machine Grammar, in Caster as long as the project stays open source.

Tasks:

- Pursue permission from Kim, the author, to utilize the Human-Machine Grammar Dictionary as a basis for redeveloping Caster grammars.
- A third-party review of the grammar system.