dictation-toolbox / Caster

Dragonfly-Based Voice Programming and Accessibility Toolkit

Add hold and release #850

Closed kendonB closed 3 years ago

kendonB commented 3 years ago

Description

This PR adds hold and release for control, shift, and alt. It uses the existing caster specs (e.g. fly | control). It also adds those keys (along with the windows key) to button_dictionary_1.
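As a rough illustration of the shape of such commands (not the PR's actual code), the sketch below builds a spoken-spec to key-spec table. It assumes Dragonfly's `Key` action spec strings, where `ctrl:down` holds a key and `ctrl:up` releases it. The spoken form "fly | control" comes from the description above; the other spoken forms and the helper name `build_hold_release_mapping` are hypothetical.

```python
# Spoken forms per modifier. "fly | control" is taken from the PR description;
# the spoken forms for shift and alt are assumptions for illustration.
MODIFIERS = {
    "ctrl": ["fly", "control"],
    "shift": ["shift"],
    "alt": ["alt"],
}

# Dragonfly key specs use "<key>:down" to hold a key and "<key>:up" to release it.
DIRECTIONS = {"hold": "down", "release": "up"}


def build_hold_release_mapping(modifiers=MODIFIERS, directions=DIRECTIONS):
    """Return {spoken spec: key spec string} for every modifier and direction."""
    mapping = {}
    for key_name, spoken_forms in modifiers.items():
        # Group alternatives in parentheses only when there is more than one.
        if len(spoken_forms) > 1:
            spoken = "({})".format(" | ".join(spoken_forms))
        else:
            spoken = spoken_forms[0]
        for verb, direction in directions.items():
            mapping["{} {}".format(verb, spoken)] = "{}:{}".format(key_name, direction)
    return mapping


mapping = build_hold_release_mapping()
for spec in sorted(mapping):
    print("{!r} -> {!r}".format(spec, mapping[spec]))
```

In a real rule, each key spec string would be wrapped in a Dragonfly `Key(...)` action; the table is kept as plain strings here so that the sketch runs without Dragonfly installed.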

Related Issue

None.

Motivation and Context

Pressing the modifier keys was missing from Caster (it is available natively in Dragon using the prefix "press"). Holding and releasing is sometimes desired for these keys as well. If others have use cases for holding and releasing other keys (like "hold lease") then we can consider adding those as well.

How Has This Been Tested

Tested using the test engine for each new command.

kendonB commented 3 years ago

I'd like to add docs for this but wanted to run it by you @LexiconCode before going into it. Given we had planned to move the quick reference pdf into html pages on rtd, the documentation for this feature would ideally be a page for key presses including both alphabet and punctuation. We could go straight to that in this PR, or put in a temporary page which just describes the modifier buttons (fly, alt, windows, shift, and combos + "hit"; currently undocumented) as well as the hold and release from this PR. Happy with either, or with deferring docs until later.

LexiconCode commented 3 years ago

The keyboard emulation commands should be moved out of navigation into their own grammar, in part because of complexity and naming schemes. We could simply merge this as is with a tweak to the docs, or work together towards a PR separating out those grammars and documentation. Do you have a preference?

kendonB commented 3 years ago

Let's push ahead with the rewrite of those keyboard emulation grammars in this (or another) pull request.

So this will merge alphabet.py, punctuation.py, perhaps numeric.py, and pull the modifier functionality out of nav.py?

This would be somewhat of a breaking change, because users can currently disable these individually. I personally use all of them all the time, but you can imagine situations where you might want to disable them individually to avoid misrecognitions. I don't know that anyone is actually disabling these individually, but it is worth considering whether we want that to remain possible.

The alternative is to create a new grammar, pulling the modifier functionality out of nav.py.

kendonB commented 3 years ago

Another option which I think I prefer is to merge the pure keypress grammars alphabet.py and punctuation.py but include the 10 individual number keys as keys in that new grammar. Then leave numeric.py to handle typing numbers.

LexiconCode commented 3 years ago

> Another option which I think I prefer is to merge the pure keypress grammars alphabet.py and punctuation.py but include the 10 individual number keys as keys in that new grammar. Then leave numeric.py to handle typing numbers.

It would be nice to reduce the number of grammars, but I think it comes at the expense of clear divisions when people search for functionality in rules and list the rules through the status window. It would also mean moving the 10 individual number keys and expanding the grammar to make sure it covers the standard keyboard and, optionally, macOS. How would we include HTML in the docs?

kendonB commented 3 years ago

So you're proposing numeric.py be incorporated into the new keyboard.py?

The docs are already in html! I just mean readthedocs webpage (easy to manage) vs the bloated, difficult to manage quick reference pdf.

LexiconCode commented 3 years ago

> So you're proposing numeric.py be incorporated into the new keyboard.py?

I would say separate. The keyboard emulation grammar "keyboard.py" is not really meant for everyday dictation. The "alphabet.py", "punctuation.py" and "numeric.py" grammars cover most people's use cases. Keyboard emulation is a backup to give people total access. Think of it like this: if "keyboard.py" is consistently being used within another application/context, then there is a missing command that should be implemented, or, if there is no rule, one needs to be made.

If we were to change the scope of keys included, maybe we could call it modifierkeys.py?

Another way to go about it would be if "keyboard.py" is active then "alphabet.py", "punctuation.py" and "numeric.py" grammars deactivate.
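The mutual-exclusion idea can be sketched with a toy registry. This is not Caster's rule-activation API; `RuleRegistry` and its methods are hypothetical names, and the rule names are used only as labels.

```python
class RuleRegistry(object):
    """Toy registry: enabling one rule can suppress a set of other rules."""

    def __init__(self, suppresses):
        # rule name -> names of rules it deactivates while it is enabled
        self.suppresses = suppresses
        self.enabled = set()
        # suppressed rule -> rule that suppressed it, so it can be restored later
        self.suppressed_by = {}

    def enable(self, name):
        self.enabled.add(name)
        for other in self.suppresses.get(name, ()):
            if other in self.enabled:
                self.enabled.discard(other)
                self.suppressed_by[other] = name

    def disable(self, name):
        self.enabled.discard(name)
        # Re-enable whatever this rule had suppressed.
        for other, owner in list(self.suppressed_by.items()):
            if owner == name:
                del self.suppressed_by[other]
                self.enabled.add(other)


registry = RuleRegistry({"keyboard": {"alphabet", "punctuation", "numeric"}})
for rule in ("alphabet", "punctuation", "numeric"):
    registry.enable(rule)

registry.enable("keyboard")
print(sorted(registry.enabled))   # only the keyboard rule is active

registry.disable("keyboard")
print(sorted(registry.enabled))   # the three everyday rules are restored
```

The sketch deliberately ignores the case where a suppressed rule is enabled while "keyboard" is still active; a real implementation would need a policy for that.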

kendonB commented 3 years ago

> Another way to go about it would be if "keyboard.py" is active then "alphabet.py", "punctuation.py" and "numeric.py" grammars deactivate.

I think we can have both at the same time. There won't be any duplication that uses exactly the same spec.

Let me know what you think about this plan:

I think we agree that users should have the ability to press, hold, and release any key on a normal keyboard, potentially modified by shift, ctrl, alt, or super/windows (or combinations thereof).

So this is:

  1. 26 alphabet keys
    • alphabet.py for normal and shift
    • nav.py for all modifiers (shift is duplicated)
  2. 10 digits
    • numeric.py for normal
    • nav.py for all modifiers
  3. 32(ish) common punctuation characters
    • punctuation.py for normal
    • nav.py for all modifiers.
      • Some of these don't actually exist on the original keyboard and are thus redundant e.g. Shift+*
      • Some of these are effectively duplicated because one is the shifted version of another key. e.g. Ctrl+& is the same as Ctrl+Shift+7
  4. Common action keys (Esc, F1-F12, Print Screen, Delete, Insert, Backspace, Home, Tab, End, Caps Lock, Enter, PageDown, PageUp, Alt, Control, Shift, Windows/Super, left, right, up, down, Space)
    • Some of these are in nav.py and punctuation.py has Space
    • Anything that's actually implemented will have modifiers in nav.py.
  5. other less common keys (Num Lock, Scr Lock, Pause Break, Right Alt, Right Control, Right Shift, others)
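To get a feel for the size of this inventory, the sketch below enumerates every combination of the four modifiers for the letter and digit keys only. It assumes Dragonfly-style chord strings, where the prefix letters `a`, `c`, `s` and `w` stand for alt, ctrl, shift and win (so `cs-7` is Ctrl+Shift+7). The function names are hypothetical.

```python
from itertools import combinations
import string

# Dragonfly-style modifier prefix letters: alt, ctrl, shift, win.
MODIFIER_PREFIXES = "acsw"


def modifier_sets(prefixes=MODIFIER_PREFIXES):
    """Every subset of the modifiers, including the empty (unmodified) one."""
    result = []
    for size in range(len(prefixes) + 1):
        for combo in combinations(prefixes, size):
            result.append("".join(combo))
    return result


def chords(keys):
    """Key spec strings for each key under each modifier subset."""
    specs = []
    for mods in modifier_sets():
        for key in keys:
            specs.append("{}-{}".format(mods, key) if mods else key)
    return specs


letters_and_digits = list(string.ascii_lowercase + string.digits)
all_chords = chords(letters_and_digits)

print(len(modifier_sets()))  # 2**4 = 16 modifier subsets
print(len(all_chords))       # 36 keys * 16 subsets = 576 chords
```

Punctuation and action keys are left out because the thread only gives an approximate count for them; adding them multiplies the total in the same way.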

I think the development philosophy should be that we "start" with a keyboard.py that just contains everything. To avoid misrecognitions, every keypress spec requires a prefix, even just to "hit" a key.

Then some of these key presses get exposed in other grammars for special/easier use:

LexiconCode commented 3 years ago

> I think we agree that users should have the ability to press, hold, and release any key on a normal keyboard, potentially modified by shift, ctrl, alt, or super/windows (or combinations thereof).

Yes I agree!

> I think we can have both at the same time. There won't be any duplication that uses exactly the same spec.

It's not just about duplication of specs; more importantly, it's about increased grammar complexity. Remember that "alphabet.py", "punctuation.py" and "numeric.py" are CCR. Adding modifiers to all of them would increase complexity beyond what it is now. Complexity on its own is an issue, but it also leads to decreased accuracy, more so in Kaldi, even with "keyboard.py" as non-CCR.
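One crude way to see why extra alternatives matter more inside CCR is to count possible command sequences. This is only a back-of-the-envelope proxy, not how Caster or the speech engines score complexity. It assumes each repetition slot can hold any of the alternatives independently, and the figures 10 and 40 (ten digits, each unmodified or with exactly one of three modifiers) are illustrative assumptions.

```python
def count_sequences(alternatives, max_repeats):
    """Number of distinct command sequences of length 1..max_repeats.

    Assumes every slot can independently hold any of `alternatives` commands,
    which is a simplification of a real CCR grammar.
    """
    return sum(alternatives ** n for n in range(1, max_repeats + 1))


plain_digits = 10       # ten digit commands
with_modifiers = 40     # assumed: each digit plain or with one of 3 modifiers

for repeats in (1, 2, 3):
    print(repeats,
          count_sequences(plain_digits, repeats),
          count_sequences(with_modifiers, repeats))
```

Under these assumptions the gap widens with every additional repetition, which is why the same set of modifier commands is cheaper in a non-CCR grammar than in a CCR one.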

~~When thinking about complexity think of the basic modifiers alt, control, shift. That's 3. The digits is 10. 1 x 10 + 2 x 10 + 3 x 10 = 50. If I did my math right represented grammar complexity correctly numeric.py has just become 4 times more complex by adding shift modifiers which doesn't account for the CCR complexity.~~ Edit: there's a better way to empirically test this, will post a bit later.

> I think the development philosophy should be that we "start" with a keyboard.py that just contains everything. To avoid misrecognitions, every keypress spec requires a prefix, even just to "hit" a key.

Yes, I can see what you're saying, and it makes sense to "start" with a keyboard.py when developing a new grammar. That's why I thought that having the "alphabet.py", "punctuation.py" and "numeric.py" grammars deactivate whenever "keyboard.py" is active would allow for that design philosophy yet reduce the complexity. This keeps "keyboard.py" from increasing complexity during everyday use.

LexiconCode commented 3 years ago

Testing methodology: testing rules merged into CCR

  1. Delete the settings folder in the user directory (back it up if necessary)
  2. Extract grammar_complexity.zip and place it in user rules
  3. Restart Caster
  4. Say "enable complexity rule"
  5. Say "list complexity score"
  6. Look for Grammar(ccr-1) Name: 'Repeater1' and copy its score
  7. Remove/comment out commands from the rule (in this case, everything under # keystroke commands in nav.py) and save
  8. Say "list complexity score"
  9. Compare scores

Grammar(ccr-1)
 Name: 'Repeater1'
  Score: 298729
Grammar(ccr-2)
 Name: 'Repeater2'
  Score: 144721

By removing the "# keystroke commands" section from nav.py we reduce the overall complexity of Caster CCR by about half, down to 154,008, with the default starting grammars.
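The three figures in this comment are related by 298729 - 144721 = 154008. A minimal sketch for comparing two complexity scores follows. Which score counts as "before" and which as "after" is not fully clear from the listing, so the pairing used here is an assumption, and `score_delta` is a hypothetical helper rather than part of the complexity rule.

```python
def score_delta(score_a, score_b):
    """Return the absolute difference between two scores and its share of score_a."""
    difference = score_a - score_b
    return difference, float(difference) / score_a


# Scores copied from the listing above; their pairing is an assumption.
repeater1 = 298729
repeater2 = 144721

difference, share = score_delta(repeater1, repeater2)
print("difference: {}".format(difference))
print("share of the larger score: {:.1%}".format(share))
```

The difference is a little over half of the larger score, which matches the "about half" described in the comment.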

kendonB commented 3 years ago

@LexiconCode sorry I've been unresponsive here.

> Yes, I can see what you're saying, and it makes sense to "start" with a keyboard.py when developing a new grammar. That's why I thought that having the "alphabet.py", "punctuation.py" and "numeric.py" grammars deactivate whenever "keyboard.py" is active would allow for that design philosophy yet reduce the complexity. This keeps "keyboard.py" from increasing complexity during everyday use.

We could do this for free were it not for the extra bells and whistles in punctuation.py and numeric.py, e.g. being able to say "numb one thousand" and "boom".

The complexity issue may not be a problem in practice. Even with my enormous R CCR grammar everything still works well in Kaldi and worked well in DPI 15. I'm of course open to benchmarking but CCR seems to not be a barrier on my system. Users can always limit the number of repetitions as well, given that the default is quite high. Or we could lower the default.

One thing we can do to limit duplications is just to delete elements from the keyboard grammar that also are covered by other grammars. So that would be the basic key presses that are covered elsewhere. We could also limit duplications deliberately for things like Ctrl-* which is the same as Ctrl-Shift-8.
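The deliberate de-duplication could look something like the sketch below, which normalises each chord to a canonical form before comparing. The two shifted-key entries come from the examples in this thread and assume a US keyboard layout; `canonical` and `dedupe` are hypothetical helpers, not existing Caster functions.

```python
# Shifted punctuation -> the physical key it lives on (US layout assumed).
# Only the two examples mentioned in the thread are listed.
SHIFTED_KEYS = {"*": "8", "&": "7"}


def canonical(modifiers, key):
    """Rewrite a chord so shifted punctuation becomes shift + base key."""
    mods = set(modifiers)
    if key in SHIFTED_KEYS:
        mods.add("shift")
        key = SHIFTED_KEYS[key]
    return frozenset(mods), key


def dedupe(chords):
    """Keep the first chord of each group that is physically the same key press."""
    seen = set()
    unique = []
    for modifiers, key in chords:
        form = canonical(modifiers, key)
        if form not in seen:
            seen.add(form)
            unique.append((modifiers, key))
    return unique


candidates = [
    (("ctrl", "shift"), "8"),
    (("ctrl",), "*"),          # same physical press as Ctrl+Shift+8
    (("ctrl",), "&"),
    (("ctrl", "shift"), "7"),  # same physical press as Ctrl+&
    (("ctrl",), "8"),
]
for modifiers, key in dedupe(candidates):
    print("+".join(modifiers + (key,)))
```

Removing keys that other grammars already cover would be a separate filtering step applied to the same chord list.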

kendonB commented 3 years ago

closing in favour of #859