Closed — alexboche closed this issue 4 years ago
That sounds useful. However things are about to change with how we do CCRMerging.
LexiconCode mentioned that it is possible to make separate CCR grammars that are not merged with each other. So there could be a separate CCR grammar that has the dictation command just mentioned and a small number of other commands for moving the cursor and so on.
To clarify, this would be a CCR grammar that's not merged into the global grammar? Are you thinking about an app-specific CCR?
I think what @alexboche is talking about is, essentially, being able to mix dictation and CCR commands. AFAIK, this is strictly not possible with Dragon, at least not with the way that Dragonfly grammars currently work. The only way to do it would be if Caster were to not use Dragon's command mode at all, capture ALL dictation output from Dragon, and then parse it to try to figure out which words are commands and which are dictation. This is how Nils Klarlund's ShortTalk worked. It can work, but it comes with a lot of challenges, both technical and behavioral (such as needing command words that are completely foreign, not just obscure, like Ross, lease, sauce, dunce, etc.). As much as I would love to be able to mix commands in with dictation, I feel like the amount of work it would require just isn't worth it.
To clarify, this would be a CCR grammar that's not merged into the global grammar? Are you thinking about an app-specific CCR?
I was thinking of a separate CCR grammar that is not merged into the main global CCR grammar. The separate CCR grammar could be global or context-specific (definitely something that could be enabled and disabled). This separate CCR grammar would have just a small number of commands in it, ideally with words that are not in the Dragon vocabulary. The commands would basically be for text navigation, e.g. arrow keys (with modifiers), and maybe the clipboard-based text navigation commands. It might be possible to use a mimic command to get the native Dragon text manipulation commands to work in CCR with respect to words preceding them (though not words following them).
Ryan seems to have made some headway on this in Talon, but it seems inconsistent so far. I'm not sure how his system works; it may work at a somewhat lower level than just forcing dictation through a command. They also have the ability to force words to be dictation by saying "word
Might not be worth the effort. Just something to think about. If you feel like this is a no go, feel free to close the issue.
Alternatively, it might be good to make some more commands like caster's "format
@alexboche Thanks for clarifying the scenario with CCR. @chilimangoes Interesting I wonder if @quintijn would have some knowledge with Natlink that could be leveraged to make this happen in DNS without postprocessing free dictation?
@alexboche
One way to do this is to filter all commands through a CCR command using a text action like `"<dictation>": Text("%(dictation)s")`.
Could you include a demo grammar implementation with a few other commands mixed in?
I may be misunderstanding, but isn't this what the `brunt` keyword is for in:

```python
"(<capitalization> <spacing> | <capitalization> | <spacing>) [(bow|bowel)] <textnv> [brunt]":
    R(Function(textformat.master_format_text)),
```
Ending the dictation and transitioning back to commands, e.g.
laws bow this is a test brunt queue four deli
@LexiconCode I seem to remember reading somewhere that this is a fundamental limitation of the functionality that is exposed to Natlink by Dragon, although I could be wrong.
@mrob95 the way I understood the proposal was, he would like to be able to say something like "Mary had a little lamb comma whose fleece was blue queue lease white as snow" as a single utterance, and without the initial "format" keyword that makes Dragon recognize it as a command, and have the following output:
Mary had a little lamb comma whose fleece was blue
Followed by selecting the word blue and replacing it with "white as snow" to get the following:
Mary had a little lamb comma whose fleece was white as snow
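The intended correction step can be sketched in plain Python (a toy illustration only: the helper name and the string-search approach are invented here; a real implementation would select the word through the speech engine, not by string manipulation):

```python
def replace_last(text, target, replacement):
    """Replace the last occurrence of `target` in `text`, mimicking a
    select-and-say correction applied to the most recent match."""
    idx = text.rfind(target)
    if idx == -1:
        return text
    return text[:idx] + replacement + text[idx + len(target):]

dictated = "Mary had a little lamb comma whose fleece was blue"
print(replace_last(dictated, "blue", "white as snow"))
# -> Mary had a little lamb comma whose fleece was white as snow
```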
After thinking about this overnight, I might be warming to the idea, if it's not too buggy. Dragon would essentially be in permanent command mode at that point, since you would have a catch-all
However, in order to more intelligently handle spacing and capitalization when doing standard dictation, we would want to use the new accessibility based text manipulation in Dragonfly rather than the blind output Caster currently does. Although, TBH, I think this last bit is the right direction anyway.
That is the future and will drastically simplify the process of manipulating existing text. This pull request gives people the chance to utilize it with Caster. However, I closed it because there wasn't a simple way to integrate the dependencies.
I was thinking of a separate CCR grammar that is not merged into the main global CCR grammar.
@alexboche I have decoupled the new CCRMerger from the rules it produces. So shortly, there will be nothing to stop you from making another CCRMerger instance and making more "global" type rules.
So here's a (edit: new and simpler) sample grammar provided by David Zurow. This will work if you just put it right into a new file in Caster (or anywhere in the macro system, I think). This would probably not be merged with other grammars.
```python
from dragonfly import *

class CommandRule(MappingRule):
    mapping = {
        "splat [<n>]": Key("c-backspace:%(n)s"),
        "fly lease [<n>]": Key("c-left:%(n)s"),
        "fly ross [<n>]": Key("c-right:%(n)s"),
        "lease [<n>]": Key("left:%(n)s"),
        "ross [<n>]": Key("right:%(n)s"),
    }
    extras = [IntegerRef("n", 1, 10)]
    defaults = {"n": 1}

command_rule = CommandRule()

class DictationRule(MappingRule):
    mapping = {
        "<dictation>": Text("%(dictation)s "),  # adding a trailing space
    }
    extras = [Dictation("dictation")]

dictation_rule = DictationRule()

dict_cmd_sequence = Repetition(
    Alternative([RuleRef(dictation_rule), RuleRef(command_rule)]),
    min=1, max=10, name="dict_cmd_sequence")

class SequenceRule(CompoundRule):
    spec = "<dict_cmd_sequence>"
    extras = [dict_cmd_sequence]

    def _process_recognition(self, node, extras):
        for action in extras["dict_cmd_sequence"]:
            action.execute()

grammar = Grammar("zurow")
sequence_rule = SequenceRule()
grammar.add_rule(sequence_rule)
grammar.load()
```
I'm going to be testing this out in the wild to see how it goes.
A first issue is that when this grammar is active, all dictation passes through it, which I think means that it does not have lower priority than other commands (unless we tweak the priorities ourselves). This contrasts with the usual situation in Dragon, where commands are prioritized over dictation. We should probably figure out how to manually adjust the priorities of different grammars, because e.g. we might want to turn down the priority of this grammar so that commands do not accidentally get interpreted as dictation. This doesn't seem to be much of a problem so far, though (I will have to monitor it).
A second issue is spacing (and perhaps other formatting). With this approach, because Dragon's dictation passes through a Text action, some of the formatting in the Dragon vocabulary seems to sometimes, but not always, get lost in translation. I don't quite understand how this works yet. Here are two ways the spacing issue manifests.

A) Dragon seems to somehow leave an invisible space at the end of the last word of each dictation utterance, but that does not carry over with this Text action approach, which is why I added the space at the end of the Text action. I don't quite understand how Dragon does this, but I did a little fiddling and noticed that although dictating a phrase, pausing, and then dictating another phrase puts the proper space between the two phrases, if after the first phrase you click on a different part of the text and then click back to the end of the first phrase before dictating the second one, no space is placed between the phrases. Having the space at the end of the Text action should mostly mitigate this issue, though eventually a better solution would be good.

B) In the properties section of any word in the Dragon vocabulary, you can adjust the spacing inserted before and after the word. For example, the word "dot" has no spaces before or after it, whereas "period" has no space before it but one (or two) spaces after it. Similarly, "open paren" has one space before and zero spaces after. These all work fine with this Text action approach, but not so with my custom Dragon vocabulary words; "slash" seems to be an exception – it does not work properly. Printed forms are also ignored by the Text action (at least for custom words).
A third issue is that the very useful command "scratch that" and commands built on top of it such as "make that
Setting priorities for grammars/rules could be possible for Kaldi & WSR. It wouldn't be hard for Kaldi; for the WSR backend, however, it would require a large reworking. I don't know about natlink.
I think setting priorities of command grammars in Dragon/Natlink is not possible. I am also not in favour of it. I think duplicate recognitions should be avoided.
The spacing/capitalisation issue, as normally handled by Dragon in "supported windows", is simulated in the natlink module nsformat.py. This behaviour is also used for utterances in the same "unsupported" window.
The trick is, that the state of the previous utterance is kept for the next. User grammars can use this same mechanism. https://qh.antenna.nl/unimacro/installation/technicaldetails/nsformat.html
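The state-threading mechanism can be sketched in plain Python (a toy simulation only: the flag names and the tiny word-property table below are invented for illustration and are not nsformat's actual data structures):

```python
# Toy simulation of nsformat-style state threading: each call formats one
# utterance and returns a state that the next call consumes, so spacing and
# capitalisation survive across utterances. All names here are invented.

WORD_PROPS = {
    # Dragon-style "written\spoken" words: (space_before, space_after, cap_next)
    ".\\period": (False, True, True),
    ",\\comma": (False, True, False),
}

def format_words(words, state=None):
    if state is None:
        state = {"no_space": True, "cap": True}  # start of document
    pieces = []
    for word in words:
        space_before, space_after, cap_next = WORD_PROPS.get(word, (True, True, False))
        if not state["no_space"] and space_before:
            pieces.append(" ")
        written = word.split("\\")[0]  # written form precedes the backslash
        if state["cap"]:
            written = written[:1].upper() + written[1:]
        pieces.append(written)
        state = {"no_space": not space_after, "cap": cap_next}
    return "".join(pieces), state

text1, st = format_words(["hello", "world", ".\\period"])
text2, st = format_words(["next", "sentence"], state=st)
print(repr(text1 + text2))  # -> 'Hello world. Next sentence'
```

Because the returned state records that the period wants a trailing space and a capitalized next word, the second utterance comes out as " Next sentence" even though it was formatted in a separate call.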
(Edited in response to daanzu's helpful suggestion.)
@quintijn Thank you for this! The file nsformat.py to which Quintijn is referring is here. @daanzu it might be of interest to you if you plan to develop spacing/capitalization and other formatting techniques for dictation in Kaldi.
Quintijn can you guide me on how to use this within daanzu's code above? (I know you may not be so familiar with dragonfly, but I bet you can figure out what is going on just based on your knowledge of nsformat.py)
I have attempted this as indicated below, without success. My errors are probably very simple, since I'm not well versed in this. One problem is that I'm not familiar with how to manage state within a Python program, so I'm not sure I'm properly dealing with the state here. But the more immediate problem is that even if I just pass in the state as `None`, when I try to execute the text action `Text()` via a function `format_dictation`, it types out the text with spaces between virtually every letter, three spaces between words, and no space between utterances. Why is this? I was expecting the value `formatted_output` returned by `nsformat.formatWords()` to be the properly formatted dictation string. I call the function `format_dictation` below using the dragonfly `Function` action like this:

```python
"<dictation>": Function(format_dictation, input_state=formatting_state)
```
```python
import nsformat

formatting_state = None

def format_dictation(dictation, input_state):
    formatted_output, output_state = nsformat.formatWords(str(dictation), state=input_state)
    formatted_output = str(formatted_output)  # not sure if this str is helpful?
    Text(formatted_output).execute()
    global formatting_state
    formatting_state = output_state
```
Full code is here
I'm not discouraging anyone from working with grammar prioritization weights. But as an alternative to modifying priorities for grammars, which may not be compatible with all current or future backends, consider the following.

Included as a command. This is great for short dictation:

```python
"say <dictation>": Text("%(dictation)s "),
```

A dictation mode which can be enabled and disabled as CCR (we can toggle off other grammars selectively). Great for long-form dictation:

```python
"<dictation>": Text("%(dictation)s "),
```

Based on the running backend, we can enable enhanced functionality, such as dictation based on grammar prioritization, where supported.
@quintijn @alexboche Thanks for the info about natlink formatting and the gist with the code.
Regarding setting priorities for grammars/rules, it would affect not only conflicting/ambiguous rules with duplicate specs, but also ones with differing specs. In the latter case, it would allow you to combat instances where certain phrases are recognized more often than they should be, but you don't want to change their spec or disable them entirely. It simply makes the engine less likely to "take that path" during recognition.
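The effect of a per-rule weight can be illustrated with a toy score calculation (the numbers and dictionaries are invented for illustration; real engines combine acoustic and language-model scores in far more sophisticated ways):

```python
# Toy illustration: a per-rule weight below 1.0 makes that rule's phrases
# lose close calls they would otherwise win. Scores are invented.
hypotheses = {"lease": 0.6, "least": 0.4}  # raw recognition scores
weights = {"lease": 0.5, "least": 1.0}     # down-weight the command word

unweighted = max(hypotheses, key=hypotheses.get)
weighted = max(hypotheses, key=lambda h: hypotheses[h] * weights[h])
print(unweighted, weighted)  # -> lease least
```

Without the weight, the command word "lease" wins; with the weight applied, the engine "takes that path" less often and the ordinary word "least" wins instead.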
@alexboche I think you may want `Text(formatted_output).execute()`.
@alexboche May I assume you got the nsformat thing to work?
In the voicecode project (yes, latest update from 2006), grammars were completely based on catching dictation. If users really want to delve into this, it should be worth the trouble of reviving this code.
Also, extensive select-and-say grammars were designed; I worked on them at the time. Results were spectacular!
But... it is quite a challenge to dive into this, though it is still all there!! The implementation is for emacs.
Go to sourceforge.net, search for voicecode, and download the latest release.
@quintijn No I did not get the nsformat thing to work. Nsformat is returning the text with spaces in between virtually every letter, three spaces between words, and no space between utterances. I was expecting that the value returned by nsformat.formatWords() would be the properly formatted dictation string. For more details of the code I tried, please see my post above linked here
I may explore the voicecode thing (thank you for the tip), but if we can get this working without it, that would probably be easier.
@alexboche do you pass in the LIST of words that were recognized by the rule? I.e. `["a", "list", "of", "words"]` instead of `"a list of words"`?
Quintijn is correct, thank you. nsformat.formatWords() must be given a list of words, not a string of the words joined together. The list of words can be accessed via the `words` attribute of the dragonfly Dictation object. So the working code looks like this:
```python
from dragonfly import *
import nsformat

dictation_length = 0  # for "scratch that" purposes
input_state = None

# Note: if you pass the input state into the function using the Function action,
# rather than accessing it from within the function via global scope, the input
# state will not be updated when you run the dictation command multiple times.
def format_dictation(dictation):
    global dictation_length, input_state
    print("input_state: ", input_state)
    formatted_output, output_state = nsformat.formatWords(dictation.words, state=input_state)
    print("output_state: ", output_state)
    formatted_output = str(formatted_output)
    Text(formatted_output).execute()
    dictation_length = len(formatted_output)
    input_state = output_state

def scratch_that():
    # Key("backspace:%(dictation_length)s").execute()  # this isn't substituting properly
    for i in range(dictation_length):
        Key("backspace").execute()

class CommandRule(MappingRule):
    mapping = {"strike": Function(scratch_that), "ross": Key("right")}

command_rule = CommandRule()

class DictationRule(MappingRule):
    mapping = {"<dictation>": Function(format_dictation)}
    extras = [Dictation("dictation")]
```
Full working code here:
Even though you can transition from commands to dictation by using a command like `"format <dictation>"`, there is currently no way to transition from dictation to commands without pausing. It would be nice if you didn't have to pause. Allowing too many commands to be active while dictating would cause misfires, but it would be good to be able to say a few frequently used commands while dictating; an example would be moving left/right by n words. Such commands would probably need names that are not similar to any commonly used words. One way to do this is to filter all commands through a CCR command using a text action like `"<dictation>": Text("%(dictation)s")`. (There is probably a better way, but I have tested this briefly and it works.) LexiconCode mentioned that it is possible to make separate CCR grammars that are not merged with each other. So there could be a separate CCR grammar that has the dictation command just mentioned and a small number of other commands for moving the cursor and so on. (I have not attempted to test this part.)