Closed — alexboche closed this issue 4 years ago
That sounds useful. However things are about to change with how we do CCRMerging.
LexiconCode mentioned that it is possible to make separate CCR grammars that are not merged with each other. So there could be a separate CCR grammar that has the dictation command just mentioned and a small number of other commands for moving the cursor and so on.
To clarify, this would be a CCR grammar that's not merged into the global grammar? Are you thinking about an app-specific CCR?
I think what @alexboche is talking about is, essentially, being able to mix dictation and CCR commands. AFAIK, this is strictly not possible with Dragon, at least not with the way that Dragonfly grammars currently work. The only way to do it would be if Caster were to not use Dragon's command mode at all, capture ALL dictation output from Dragon, and then parse it to try to figure out which words are commands and which are dictation. This is how Nils Klarlund's ShortTalk worked. It can work, but it comes with a lot of challenges, both technical and behavioral (such as needing command words that are completely foreign, not just obscure, like Ross, lease, sauce, dunce, etc.). As much as I would love to be able to mix commands in with dictation, I feel like the amount of work it would require just isn't worth it.
To clarify, this would be a CCR grammar that's not merged into the global grammar? Are you thinking about an app-specific CCR?
I was thinking of a separate CCR grammar that is not merged into the main global CCR grammar. The separate CCR grammar could be global or context-specific (definitely something that could be enabled and disabled). This separate CCR grammar would have just a small number of commands in it, ideally with words that are not in the Dragon vocabulary. The commands would basically be for text navigation, e.g. arrow keys (with modifiers), and maybe the clipboard-based text navigation commands. It might be possible to use a mimic command to get the native Dragon text manipulation commands to work in CCR with respect to words preceding them (though not words following them).
Ryan seems to have made some headway on this in Talon, but it seems inconsistent so far. I'm not sure how his system works; it may work at a somewhat lower level than just forcing dictation through a command. They also have the ability to force words to be dictation by saying "word
Might not be worth the effort. Just something to think about. If you feel like this is a no go, feel free to close the issue.
Alternatively, it might be good to make some more commands like caster's "format
@alexboche Thanks for clarifying the scenario with CCR. @chilimangoes Interesting I wonder if @quintijn would have some knowledge with Natlink that could be leveraged to make this happen in DNS without postprocessing free dictation?
@alexboche
One way to do this is to filter all commands through a CCR command using a text action like `"<dictation>": Text("%(dictation)s")`.
Could you include a demo grammar implementation with a few other commands mixed in?
I may be misunderstanding, but isn't this what the `brunt` keyword is for in:

```python
"(<capitalization> <spacing> | <capitalization> | <spacing>) [(bow|bowel)] <textnv> [brunt]":
    R(Function(textformat.master_format_text)),
```
Ending the dictation and transitioning back to commands, e.g.
laws bow this is a test brunt queue four deli
@LexiconCode I seem to remember reading somewhere that this is a fundamental limitation of the functionality that is exposed to Natlink by Dragon, although I could be wrong.
@mrob95 the way I understood the proposal was, he would like to be able to say something like "Mary had a little lamb comma whose fleece was blue queue lease white as snow" as a single utterance, and without the initial "format" keyword that makes Dragon recognize it as a command, and have the following output:
Mary had a little lamb comma whose fleece was blue
Followed by selecting the word blue and replacing it with "white as snow" to get the following:
Mary had a little lamb comma whose fleece was white as snow
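The intended correction step can be sketched in plain Python (a toy illustration only: the helper name and the string-search approach are invented here; a real implementation would select the word through the speech engine, not by string manipulation):

```python
def replace_last(text, target, replacement):
    """Replace the last occurrence of `target` in `text`, mimicking a
    select-and-say correction applied to the most recent match."""
    idx = text.rfind(target)
    if idx == -1:
        return text
    return text[:idx] + replacement + text[idx + len(target):]

dictated = "Mary had a little lamb comma whose fleece was blue"
print(replace_last(dictated, "blue", "white as snow"))
# -> Mary had a little lamb comma whose fleece was white as snow
```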
After thinking about this overnight, I might be warming to the idea, if it's not too buggy. Dragon would essentially be in permanent command mode at that point, since you would have a catch-all
However, in order to more intelligently handle spacing and capitalization when doing standard dictation, we would want to use the new accessibility based text manipulation in Dragonfly rather than the blind output Caster currently does. Although, TBH, I think this last bit is the right direction anyway.
That is the future and will drastically simplify the process of manipulating existing text. This pull request gives people the chance to utilize it with Caster. However, I closed it because there wasn't a simple way to integrate the dependencies.
I was thinking of a separate CCR grammar that is not merged into the main global CCR grammar.
@alexboche I have decoupled the new CCRMerger from the rules it produces. So shortly, there will be nothing to stop you from making another CCRMerger instance and making more "global" type rules.
So here's a (edit: new and simpler) sample grammar provided by David Zurow. This will work if you just put it right into a new file in Caster (or anywhere in the macro system, I think). This would probably not be merged with other grammars.
```python
from dragonfly import *

class CommandRule(MappingRule):
    mapping = {
        "splat [<n>]": Key("c-backspace:%(n)s"),
        "fly lease [<n>]": Key("c-left:%(n)s"),
        "fly ross [<n>]": Key("c-right:%(n)s"),
        "lease [<n>]": Key("left:%(n)s"),
        "ross [<n>]": Key("right:%(n)s"),
    }
    extras = [IntegerRef("n", 1, 10)]
    defaults = {"n": 1}

command_rule = CommandRule()

class DictationRule(MappingRule):
    mapping = {
        "<dictation>": Text("%(dictation)s "),  # adding a trailing space
    }
    extras = [Dictation("dictation")]

dictation_rule = DictationRule()

dict_cmd_sequence = Repetition(
    Alternative([RuleRef(dictation_rule), RuleRef(command_rule)]),
    min=1, max=10, name="dict_cmd_sequence")

class SequenceRule(CompoundRule):
    spec = "<dict_cmd_sequence>"
    extras = [dict_cmd_sequence]

    def _process_recognition(self, node, extras):
        for action in extras["dict_cmd_sequence"]:
            action.execute()

grammar = Grammar("zurow")
sequence_rule = SequenceRule()
grammar.add_rule(sequence_rule)
grammar.load()
```
I'm going to be testing this out in the wild to see how it goes.
A first issue is that when this grammar is active, all dictation passes through it, which I think means that it does not have lower priority than other commands (unless we tweak the priorities ourselves). This contrasts with the usual situation in Dragon, where commands are prioritized over dictation. We should probably figure out how to manually adjust the priorities of different grammars, because e.g. we might want to turn down the priority of this grammar so that commands do not accidentally get interpreted as dictation. This doesn't seem to be much of a problem so far, though (I will have to monitor it).
A second issue is spacing (and perhaps other formatting). With this approach, because Dragon's dictation passes through a Text action, some of the formatting in the Dragon vocabulary seems to sometimes, but not always, get lost in translation. I don't quite understand how this works yet. Here are two ways the spacing issue manifests.

A) Dragon seems to somehow leave an invisible space at the end of the last word of each dictation utterance, but that does not carry over with this Text action approach, which is why I added the space at the end of the Text action. I don't quite understand how Dragon does this, but I did a little fiddling and noticed that although dictating a phrase, pausing, and then dictating another phrase puts the proper space between the two phrases, if after the first phrase you click on a different part of the text and then click back to the end of the first phrase before dictating the second one, no space is placed between the phrases. Having the space at the end of the Text action should mostly mitigate this issue, though eventually a better solution would be good.

B) In the properties section of any word in the Dragon vocabulary, you can adjust the spacing inserted before and after the word. For example, the word "dot" has no spaces before or after it, whereas "period" has no space before it but one (or two) spaces after it. Similarly, "open paren" has one space before and zero spaces after. These all work fine with this Text action approach, but not so with my custom Dragon vocabulary words; "slash" seems to be an exception – it does not work properly. Printed forms are also ignored by the Text action (at least for custom words).
A third issue is that the very useful command "scratch that" and commands built on top of it such as "make that
Setting priorities for grammars/rules could be possible for Kaldi & WSR. It wouldn't be hard for Kaldi; for the WSR backend, however, it would require a large reworking. I don't know about natlink.
I think setting priorities of command grammars in Dragon/Natlink is not possible. I am also not in favour of it. I think duplicate recognitions should be avoided.
The spacing/capitalisation issue, as normally handled by Dragon in "supported windows", is simulated in the natlink module nsformat.py. This behaviour is also used for utterances in the same "unsupported" window.
The trick is, that the state of the previous utterance is kept for the next. User grammars can use this same mechanism. https://qh.antenna.nl/unimacro/installation/technicaldetails/nsformat.html
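The state-threading mechanism can be sketched in plain Python (a toy simulation only: the flag names and the tiny word-property table below are invented for illustration and are not nsformat's actual data structures):

```python
# Toy simulation of nsformat-style state threading: each call formats one
# utterance and returns a state that the next call consumes, so spacing and
# capitalisation survive across utterances. All names here are invented.

WORD_PROPS = {
    # Dragon-style "written\spoken" words: (space_before, space_after, cap_next)
    ".\\period": (False, True, True),
    ",\\comma": (False, True, False),
}

def format_words(words, state=None):
    if state is None:
        state = {"no_space": True, "cap": True}  # start of document
    pieces = []
    for word in words:
        space_before, space_after, cap_next = WORD_PROPS.get(word, (True, True, False))
        if not state["no_space"] and space_before:
            pieces.append(" ")
        written = word.split("\\")[0]  # written form precedes the backslash
        if state["cap"]:
            written = written[:1].upper() + written[1:]
        pieces.append(written)
        state = {"no_space": not space_after, "cap": cap_next}
    return "".join(pieces), state

text1, st = format_words(["hello", "world", ".\\period"])
text2, st = format_words(["next", "sentence"], state=st)
print(repr(text1 + text2))  # -> 'Hello world. Next sentence'
```

Because the returned state records that the period wants a trailing space and a capitalized next word, the second utterance comes out as " Next sentence" even though it was formatted in a separate call.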
(Edited in response to daanzu's helpful suggestion.)
@quintijn Thank you for this! The file nsformat.py to which Quintijn is referring is here. @daanzu it might be of interest to you if you plan to develop spacing/capitalization and other formatting techniques for dictation in Kaldi.
Quintijn can you guide me on how to use this within daanzu's code above? (I know you may not be so familiar with dragonfly, but I bet you can figure out what is going on just based on your knowledge of nsformat.py)
I have attempted this as indicated below, without success. My errors are probably very simple, since I'm not well versed in this. One problem is that I'm not familiar with how to manage state within a Python program, so I'm not sure I'm properly dealing with the state here. But the more immediate problem is that even if I just pass in the state as `None`, when I try to execute the text action `Text()` via a function `format_dictation`, it types out the text with spaces between virtually every letter, three spaces between words, and no space between utterances. Why is this? I was expecting the value `formatted_output` returned by `nsformat.formatWords()` to be the properly formatted dictation string. I call the function `format_dictation` below using the dragonfly `Function` action like this:

```python
"<dictation>": Function(format_dictation, input_state=formatting_state)
```
```python
import nsformat

formatting_state = None

def format_dictation(dictation, input_state):
    formatted_output, output_state = nsformat.formatWords(str(dictation), state=input_state)
    formatted_output = str(formatted_output)  # not sure if this str is helpful?
    Text(formatted_output).execute()
    global formatting_state
    formatting_state = output_state
```
Full code is here
I'm not discouraging anyone from working with grammar prioritization weights. But as an alternative to modifying priorities for grammars, which may not be compatible with all current or future backends, consider the following.

Included as a command. This is great for short dictation:

```python
"say <dictation>": Text("%(dictation)s "),
```

A dictation mode which can be enabled and disabled as CCR (we can toggle off other grammars selectively). Great for long-form dictation:

```python
"<dictation>": Text("%(dictation)s "),
```

Based on the running backend, we can enable enhanced functionality, such as dictation based on grammar prioritization, where supported.
@quintijn @alexboche Thanks for the info about natlink formatting and the gist with the code.
Regarding setting priorities for grammars/rules, it would affect not only conflicting/ambiguous rules with duplicate specs, but also ones with differing specs. In the latter case, it would allow you to combat instances where certain phrases are recognized more often than they should be, but you don't want to change their spec or disable them entirely. It simply makes the engine less likely to "take that path" during recognition.
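The effect of a per-rule weight can be illustrated with a toy score calculation (the numbers and dictionaries are invented for illustration; real engines combine acoustic and language-model scores in far more sophisticated ways):

```python
# Toy illustration: a per-rule weight below 1.0 makes that rule's phrases
# lose close calls they would otherwise win. Scores are invented.
hypotheses = {"lease": 0.6, "least": 0.4}  # raw recognition scores
weights = {"lease": 0.5, "least": 1.0}     # down-weight the command word

unweighted = max(hypotheses, key=hypotheses.get)
weighted = max(hypotheses, key=lambda h: hypotheses[h] * weights[h])
print(unweighted, weighted)  # -> lease least
```

Without the weight, the command word "lease" wins; with the weight applied, the engine "takes that path" less often and the ordinary word "least" wins instead.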
@alexboche I think you may want `Text(formatted_output).execute()`.
@alexboche May I assume you got the nsformat thing to work?
In the voicecode project (yes, latest update from 2006), grammars were completely based on catching dictation. If users really want to delve into this, it should be worth the trouble of reviving this code.
Also, extensive select-and-say grammars were designed; I worked on them at the time. Results were spectacular!
But... it is quite a challenge to dive into this, though it is still all there!! The implementation is for emacs.
Go to sourceforge.net, search for voicecode, and download the latest release.
@quintijn No I did not get the nsformat thing to work. Nsformat is returning the text with spaces in between virtually every letter, three spaces between words, and no space between utterances. I was expecting that the value returned by nsformat.formatWords() would be the properly formatted dictation string. For more details of the code I tried, please see my post above linked here
I may explore the voicecode thing (thank you for the tip), but if we can get this working without it, that would probably be easier.
@alexboche do you pass in the LIST of words that were recognized by the rule? I.e. `["a", "list", "of", "words"]` instead of `"a list of words"`?
Quintijn is correct, thank you. nsformat.formatWords() must be given a list of words, not a string of the words joined together. The list of words can be accessed via the `words` attribute of the dragonfly Dictation object. So the working code looks like this:
```python
from dragonfly import *
import nsformat

dictation_length = 0  # for "scratch that" purposes
input_state = None

# Note: if you pass the input state into the function using the Function action,
# rather than accessing it from within the function via global scope, the input
# state will not be updated when you run the dictation command multiple times.
def format_dictation(dictation):
    global dictation_length, input_state
    print("input_state: ", input_state)
    formatted_output, output_state = nsformat.formatWords(dictation.words, state=input_state)
    print("output_state: ", output_state)
    formatted_output = str(formatted_output)
    Text(formatted_output).execute()
    dictation_length = len(formatted_output)
    input_state = output_state

def scratch_that():
    # Key("backspace:%(dictation_length)s").execute()  # this isn't substituting properly
    for i in range(dictation_length):
        Key("backspace").execute()

class CommandRule(MappingRule):
    mapping = {"strike": Function(scratch_that), "ross": Key("right")}

command_rule = CommandRule()

class DictationRule(MappingRule):
    mapping = {"<dictation>": Function(format_dictation)}
    extras = [Dictation("dictation")]
```
Full working code here:
Even though you can transition from commands to dictation by using a command like `"format <dictation>"`, there is currently no way to transition from dictation to commands without pausing. It would be nice if you didn't have to pause. Allowing too many commands to be active while dictating would cause misfires, but it would be good to be able to say a few frequently used commands while dictating; an example would be moving left/right by n words. Such commands would probably need names that are not similar to any commonly used words. One way to do this is to filter all commands through a CCR command using a text action like `"<dictation>": Text("%(dictation)s")`. (There is probably a better way, but I have tested this briefly and it works.) LexiconCode mentioned that it is possible to make separate CCR grammars that are not merged with each other. So there could be a separate CCR grammar that has the dictation command just mentioned and a small number of other commands for moving the cursor and so on. (I have not attempted to test this part.)