dictation-toolbox / dragonfly

Speech recognition framework allowing powerful Python-based scripting and extension of Dragon NaturallySpeaking (DNS), Windows Speech Recognition (WSR), Kaldi and CMU Pocket Sphinx
GNU Lesser General Public License v3.0

Autoformatting with dictation #110

Open alexboche opened 4 years ago

alexboche commented 4 years ago

(Editing this to make it more concise.) Dragon autoformats dictation, handling spacing and capitalization. For example, after you say "period", it capitalizes the next word. Regarding spacing, it is usually good to insert a space at the beginning of each dictated utterance unless, e.g., you just switched windows or the last two characters were \n\n (or possibly just a single \n).

Dragonfly already does something like this in dictation_format.py, but it only formats dictation based on the other words said in the same utterance. It does not use information about what was said in the previous utterance (according to my testing). So, e.g., if you say "period" in the middle of an utterance, the next word is capitalized, but if you end an utterance with a period and then start a new utterance, the first word of the new utterance is not capitalized.

I think such cross-utterance state-keeping behavior should perhaps be included in dragonfly. That said, it might not be appropriate when the user is just dictating a few phrases occasionally with mid-utterance dictation commands, e.g. "say <dictation>", rather than working with a dictation grammar where all dictation is filtered through a Text action. Perhaps Dictation could have a special parameter for choosing whether to output:

  1. the text formatted using the state of the previous utterance, versus
  2. the text formatted only using the state of the current utterance (dragonfly's current behavior), versus
  3. the raw text, not formatted at all.

Something simple like this has been implemented in Talon here (see the class AutoFormat) and in Natlink here (see the function formatWords on line 117, though I haven't looked closely) (edit: using nsformat.py in dragonfly is implemented in the post below). I also made a little toy example from scratch for capitalization after periods in my own dragonfly grammar here.
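A minimal sketch of the three options, in plain Python so it runs without a speech engine. Everything here (`SentenceState`, `format_words`, `render` and the mode names) is hypothetical and not part of dragonfly; only capitalization after `.`, `?` and `!` is modelled, and spacing between utterances is ignored. The point is where the state object lives: a fresh one per utterance versus one shared across utterances.

```python
class SentenceState:
    """Formatting state: should the next word be capitalized?"""

    def __init__(self, capitalize_next=False):
        # Default to "mid-sentence", matching the per-utterance behaviour
        # described above, where a new utterance is not capitalized.
        self.capitalize_next = capitalize_next


def format_words(words, state):
    out = []
    for word in words:
        if word in (".", "?", "!"):
            if out:
                out[-1] += word  # attach punctuation to the previous word
            else:
                out.append(word)
            state.capitalize_next = True
            continue
        if state.capitalize_next:
            word = word[:1].upper() + word[1:]
            state.capitalize_next = False
        out.append(word)
    return " ".join(out)


def render(words, mode, shared_state):
    if mode == "raw":            # option 3: no formatting at all
        return " ".join(words)
    if mode == "per_utterance":  # option 2: fresh state every utterance
        return format_words(words, SentenceState())
    if mode == "persistent":     # option 1: state carried across utterances
        return format_words(words, shared_state)
    raise ValueError("unknown mode: %r" % (mode,))


shared = SentenceState()
print(render(["hello", "world", "."], "persistent", shared))  # hello world.
print(render(["goodbye"], "persistent", shared))              # Goodbye
print(render(["goodbye"], "per_utterance", shared))           # goodbye
```

Within a single utterance all formatting modes except "raw" behave the same; they only differ at the utterance boundary.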

If this is not to be implemented in dragonfly itself, I would want to implement it in my own grammar. For doing that it might be useful for my commands to have access to the raw dictation list outputted by Dragon i.e. the list of words before it gets formatted in dictation_format.py and before it gets joined into a string. How do I access that?

I have been experimenting with CCR between dictation and commands (with @daanzu's help), basically using Repetition(Alternative(RuleRef(dictation_rule), RuleRef(command_rule))), where dictation_rule filters dictation through a Text action: "<dictation>": Text(dictation) (full working grammar here, based on what daanzu gave me). I would like Dragon's autoformatting to work when I'm filtering dictation through the Text action.

Also a way to use the "scratch that" command would be cool since that currently doesn't work when you filter dictation through a text action (edit: implemented below).

drmfinlay commented 4 years ago

Thanks for this. I think adding an auto-formatting parameter for Dictation elements could work. There is an autofmt parameter for the Text action already (see action_text.py), but I'm not sure how well it works. There are definitely some use cases for cross-utterance state-keeping behaviour. I agree that it should not be active by default.

It looks like nsformat could be used in dragonfly via import nsformat if that module does allow for this sort of behaviour.

It should be noted that an additional parameter like this wouldn't really work with the recent formatting additions to the Dictation and DictationContainer classes. @mrob95 I would be interested in your thoughts on this issue.

You can access the raw dictation list by using the words property on DictationContainer objects:


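from dragonfly import Dictation, Function, MappingRule

def process_text(text):
    # text is a DictationContainer; .words is the raw list of recognized words
    print(text.words)

mapping = {"<text>": Function(process_text)}
extras = [Dictation("text")]
rule = MappingRule(mapping=mapping, extras=extras)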

I could be wrong, but I don't think you can catch dictation like this and still have commands like "scratch that" work correctly.

quintijn commented 4 years ago

@alexboche If issues arise when using nsformat, please notify. It is a function that is from quite a while ago, and not very intensively used, I think.

alexboche commented 4 years ago

I got the nsformat.py approach to work with a tip from @Quintijn. For "scratch that", we can just store the length of the last dictation string typed out and then press backspace that many times (using ctrl+backspace, or repeated ctrl+shift+left followed by backspace, would be faster but less reliable). See the code below for a working example (full code here).

The behavior of nsformat can easily be adjusted. For example, nsformat.py puts double spaces after periods (as well as after ? and !). To change this to a single space, simply drop the double-space flag on line 56, i.e. change flags_like_period = (9, 4, 21, 17) to flags_like_period = (4, 21, 17).

Note that it appears to be essential not to pass the nsformat state variable (or, similarly, dictation_lengths) into the function via the Function action, because if you do, it is not properly updated after each utterance. Instead, the function needs to access the state variable as a module-level global. (Strictly speaking, declaring the variable global is only necessary for assigning to it inside the function, which I do anyway.) If this is unclear, look at how the variable formatting_state is managed in the following link, where the state is not properly updated after each utterance, and compare it with the code below.
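The behaviour described above matches ordinary Python binding rules, assuming the state was handed to the action as a keyword default when the rule was built (dragonfly's Function action accepts such defaults). A value bound at definition time is captured once, whereas a global read inside the function is looked up on every call. This sketch uses `functools.partial` as a stand-in for the action; `format_with_state` and the state strings are illustrative only.

```python
import functools

input_state = None  # module-level formatting state


def format_with_state(words, state):
    # Stand-in for nsformat.formatWords(): returns (text, state it was given).
    return " ".join(words), state


# Bound once, right now: later changes to input_state are never seen.
bound_action = functools.partial(format_with_state, state=input_state)


def global_action(words):
    # Looks up the module-level input_state at call time.
    return format_with_state(words, input_state)


input_state = "after-period"  # e.g. updated after the previous utterance

print(bound_action(["hi"]))   # ('hi', None)
print(global_action(["hi"]))  # ('hi', 'after-period')
```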

The state-keeping behavior can sometimes insert a space when you don't want one. For example, consider the following sequence of utterances (where | marks the cursor position after each utterance): 1) dictate "hello world|"; 2) say a command to move the cursor left five characters (so that you then have "hello |world"); 3) dictate the word "there". The result has two spaces between "hello" and "there", one space too many. One might think this problem could be avoided by resetting the nsformat state variable to a "no-space" state (e.g. -1) every time a command other than the dictation command is used. But that would cause the reverse problem when dictating (e.g. the word "there") from the position "hello| world": the result would be "hellothere world".

As I see it, the only foolproof spacing solution would be to use information about the cursor position and the text on the page. This can be done through accessibility APIs and I think that is what Dragon itself does. (In principle, the clipboard could be used but I don't think that is worth pursuing.)
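A sketch of a cursor-aware spacing rule, assuming the text immediately before and after the cursor can be read (for example through an accessibility API; how to obtain it is not shown). `spacing_for` and `insert_with_spacing` are hypothetical helpers. With this rule, dictating "there" at both "hello |world" and "hello| world" gives "hello there world".

```python
PUNCTUATION = ".,;:!?)"


def spacing_for(before, after, words):
    """Return `words` padded with the spaces the surrounding text needs."""
    text = words
    # Leading space unless at the start of the text or already after
    # whitespace (this also covers "\n" and "\n\n").
    if before and not before[-1].isspace():
        text = " " + text
    # Trailing space if the next character starts another word.
    if after and not after[0].isspace() and after[0] not in PUNCTUATION:
        text = text + " "
    return text


def insert_with_spacing(document, cursor, words):
    before, after = document[:cursor], document[cursor:]
    return before + spacing_for(before, after, words) + after


print(repr(insert_with_spacing("hello world", 6, "there")))  # 'hello there world'
print(repr(insert_with_spacing("hello world", 5, "there")))  # 'hello there world'
```

Because the decision depends only on the neighbouring characters, it stays correct after cursor movement or typing that dragonfly never saw.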

In the meantime, it is much better to have an extra space than to be missing a space where one is needed. To avoid the extra space, one can prefix words with "no-space" or just use a command like "say <dictation>". If the change proposed in pull request #111 is merged, the "no-space" prefix can be shortened by adding a custom entry (perhaps called "nace") to the property_map dictionary in dictation_format.py, using the "no_space_after" formatting flag as follows: "nace": WordFlags("no_cap_reset", "no_space_after"),. "nace" would also have to be added to the user's Dragon vocabulary for this to work.

from dragonfly import (Dictation, Function, Grammar, IntegerRef, Key,
                       MappingRule, Repeat, Text)
import nsformat  # Natlink's nsformat module

dictation_lengths = []  # for "scratch that" purposes
input_state = None  # nsformat state carried over from the previous utterance

def format_dictation(dictation):
    global dictation_lengths, input_state
    print("input_state: ", input_state)
    formatted_output, output_state = nsformat.formatWords(dictation.words, state=input_state)
    print("output_state: ", output_state)
    formatted_output = str(formatted_output)
    Text(formatted_output).execute()
    # for "scratch that" purposes
    dictation_lengths.append(len(formatted_output))
    input_state = output_state

def scratch_that():
    try:
        for i in range(dictation_lengths[-1]):
            Key("backspace").execute()
        dictation_lengths.pop()
    except IndexError:
        print("to use the command 'strike' you must dictate first")

class CommandRule(MappingRule):
    mapping = {
        "strike [<n>]": Function(scratch_that) * Repeat(extra="n"),
        "ross": Key("right"),
    }
    extras = [IntegerRef("n", 1, 10)]  # Repeat(extra="n") needs this extra
    defaults = {"n": 1}

class DictationRule(MappingRule):
    mapping = {"<dictation>": Function(format_dictation)}
    extras = [Dictation("dictation")]  # passed to format_dictation()

grammar = Grammar("nsformat dictation")
grammar.add_rule(CommandRule())
grammar.add_rule(DictationRule())
grammar.load()
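The length-tracking idea behind scratch_that() above can be exercised without Dragon. In this sketch a plain string stands in for the target window and `backspace()` deletes characters from its end; `FakeWindow`, `output_dictation` and `scratch_last` are hypothetical names. Like the real command, it only works if nothing else has changed the text or moved the cursor since the dictation was typed.

```python
class FakeWindow:
    """Stand-in for a text field: typing appends, backspace deletes."""

    def __init__(self):
        self.text = ""

    def type_text(self, s):
        self.text += s

    def backspace(self, count):
        if count > 0:  # guard: s[:-0] would wipe the whole string
            self.text = self.text[:-count]


dictation_lengths = []  # one entry per dictated utterance


def output_dictation(window, formatted):
    window.type_text(formatted)
    dictation_lengths.append(len(formatted))


def scratch_last(window):
    if not dictation_lengths:
        print("dictate something first")
        return
    window.backspace(dictation_lengths.pop())


w = FakeWindow()
output_dictation(w, "Hello world.")
output_dictation(w, "  Testing")  # nsformat-style double space after "."
scratch_last(w)
print(repr(w.text))  # 'Hello world.'
```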

Resetting the state on switching windows is something that might be useful and that nsformat does not do.

Another issue is that the Text action is a little slow when typing out long dictation utterances. The text can be put on the clipboard and pasted to speed up output, though it would be better if the Text action itself were faster. Relatedly, keypresses are sometimes a bit too slow, including the backspaces sent by the "scratch that"-style command shown above. Edit: using a Mimic or Playback action might be a way to speed up text output. I have not yet had time to look into how formatting would work with that (maybe Dragon would take care of it automatically).

Lastly, a grammar with a command like <dictation>: Text("%(dictation)s") should be deprioritized (just as Dragon's native dictation grammar is deprioritized relative to command grammars). I know Draconity makes this possible (possibly Natlink can also do this).

drmfinlay commented 4 years ago

Thanks again for looking into this. It's good to see there are different ways to implement "scratch that", although I was thinking about it and realised you could probably use "scratch that" and other commands normally by deactivating the <dictation> rule during mimic() calls rather than reinventing the wheel! ;-)

As to how you would do that, a FuncContext could be used as the context for the dictation rule (not the grammar).
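A sketch of the FuncContext idea, assuming the context function is consulted when the mimicked recognition starts. Only the gating logic is runnable here; the dragonfly wiring is left in comments, and `mimicking` and `dictation_rule_enabled` are hypothetical names. It is worth checking the FuncContext documentation for exactly which arguments it passes to the function.

```python
from contextlib import contextmanager

_mimicking = False


def dictation_rule_enabled(**kwargs):
    """Context function: the <dictation> rule is off while mimicking.

    **kwargs absorbs any window information the context may pass in.
    """
    return not _mimicking


@contextmanager
def mimicking():
    global _mimicking
    _mimicking = True
    try:
        yield
    finally:
        _mimicking = False  # re-enable even if the mimic raises

# Assumed dragonfly wiring (not executed here):
#   dictation_rule = DictationRule(context=FuncContext(dictation_rule_enabled))
#   with mimicking():
#       Mimic("scratch", "that").execute()
```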

Your approach could still be useful for every other engine backend though.

wolfmanstout commented 4 years ago

I would vote to keep this kind of autoformatting functionality (as an optional feature) in Text, not Dictation. Dictation just captures what you are saying, whereas this is a property of the action's behavior. I do think that the accessibility API functionality I added would provide a clean way to implement this, because it integrates smoothly with text changes coming from outside Dragonfly (e.g. physical keypresses). This is currently a little awkward because the accessibility API depends on the Text action. Here are my initial thoughts on the cleanest way to integrate this: 1) modify the accessibility API to depend directly on a lower-level layer than Text; 2) add this functionality as a new method in the accessibility API (exposed in the controller, like everything else); 3) expose this functionality to the user as an option in Text.

I think this is nice because it provides a layered stack: device control at the lowest layer, accessibility API atop that, and actions atop that.
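A rough sketch of that layering in plain Python, with every class hypothetical (none of them are dragonfly's): a keyboard layer that only emits strings, an accessibility layer that knows the text before the cursor (stubbed as a string) and owns the formatting decision, and a Text-like action on top that opts into it. The spacing rule is deliberately minimal.

```python
class Keyboard:
    """Device layer: emits strings (recorded here instead of typed)."""

    def __init__(self):
        self.emitted = []

    def type_string(self, s):
        self.emitted.append(s)


class AccessibilityController:
    """Middle layer: knows the surrounding text and decides spacing."""

    def __init__(self, keyboard, text_before_cursor=""):
        self.keyboard = keyboard
        # A real implementation would query the focused control instead
        # of tracking the text itself.
        self.before = text_before_cursor

    def insert_autoformatted(self, text):
        if self.before and not self.before[-1].isspace():
            text = " " + text
        self.keyboard.type_string(text)
        self.before += text


class TextAction:
    """Action layer: what a grammar uses; autoformatting is opt-in."""

    def __init__(self, text, autoformat=False):
        self.text = text
        self.autoformat = autoformat

    def execute(self, controller):
        if self.autoformat:
            controller.insert_autoformatted(self.text)
        else:
            controller.keyboard.type_string(self.text)


keyboard = Keyboard()
controller = AccessibilityController(keyboard, text_before_cursor="hello")
TextAction("world", autoformat=True).execute(controller)
print(keyboard.emitted)  # [' world']
```

Each layer only calls the one directly beneath it, so the formatting logic never has to know how keys are sent.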

The quick/hacky way to get something up and running would be to target (2) first using Text within the accessibility controller, like I currently do.

I'm working on another project right now so I'm not planning on doing this anytime soon, but I'd be supportive if someone else wants to take it on.

drmfinlay commented 4 years ago

Thanks for your thoughts on this @wolfmanstout and sorry for the (very) late response! Implementing optional autoformatting functionality like this would be nice, although I think it would get pretty complicated even with the abstraction you mentioned.

I wanted to touch on the points in your previous posts, @alexboche. Having moved to mostly using the Kaldi Dragonfly engine backend, I figured I needed a way to format dictation output. I discovered that the WordFormatter class inside dictation_format.py can already save dictation state between utterances; the Natlink dictation container class just uses a new instance every time. I think the current behaviour is correct, though, as you normally wouldn't want your previous utterances to affect dictation output for tasks like typing a variable name.

Anyway, I have adapted the Natlink formatting classes to work with the other engines and created a context-aware grammar that keeps track of formatting flags for each window I dictate into, going a step further than the nsformat grammar you posted above. The grammar should work with any engine really. I'm planning to upload it and the modified dictation_format.py file somewhere soon. I would like to include the dictation_format.py changes somewhere in Dragonfly.
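The per-window idea can be sketched as a dictionary of formatting states keyed by window. This is not drmfinlay's grammar, just an illustration under simple assumptions: only capitalization after "." is tracked, and the window key is any hashable id (in dragonfly it could presumably come from the foreground window's handle). `PerWindowState` and `dictate` are hypothetical names.

```python
class PerWindowState:
    """Formatting state stored separately for each window."""

    def __init__(self, initial=False):
        self.initial = initial  # state for windows not seen before
        self.states = {}

    def get(self, window_id):
        return self.states.get(window_id, self.initial)

    def set(self, window_id, capitalize_next):
        self.states[window_id] = capitalize_next


def dictate(store, window_id, words):
    capitalize_next = store.get(window_id)
    out = []
    for word in words:
        if word == ".":
            if out:
                out[-1] += "."
            else:
                out.append(".")
            capitalize_next = True
            continue
        if capitalize_next:
            word = word[:1].upper() + word[1:]
            capitalize_next = False
        out.append(word)
    store.set(window_id, capitalize_next)  # remember for the next utterance
    return " ".join(out)


store = PerWindowState()
print(dictate(store, "editor", ["hello", "."]))  # hello.
print(dictate(store, "browser", ["search"]))     # search
print(dictate(store, "editor", ["next"]))        # Next
```

Dictating into the browser in between does not disturb the editor's pending capitalization.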

daanzu commented 4 years ago

@Danesprite This sounds great! I am looking forward to some nice formatting. I threw together a hacky version of what you are describing, but haven't gotten around to cleaning it up to be generally usable. I would love to try yours out.

drmfinlay commented 4 years ago

@daanzu Yeah, I'm finding it really useful! I'll try to upload it somewhere today so you can try it out :)

drmfinlay commented 4 years ago

@daanzu Here is a Gist with the command module and the text formatting code: https://gist.github.com/Danesprite/413895d62a4a699f14a48796f9fda7e7

Place both files with your other command modules and you should be good to go. It is an exclusive grammar with enable/disable commands. Say "enable dictation" to turn it on and "disable dictation" to turn it off. The grammar keeps track of the active state via a .dictation-grammar-status.txt text file containing 0 or 1, so the dictation mode will persist between restarts. This also lets you toggle the state externally.
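The persistence mechanism described above can be sketched as a pair of helpers that read and write the 0/1 flag. The file name comes from the comment; `read_status` and `write_status` are hypothetical, and treating a missing or unreadable file as "disabled" is an assumption, not necessarily what the gist does.

```python
STATUS_FILE = ".dictation-grammar-status.txt"


def read_status(path=STATUS_FILE):
    """Return True if the file says dictation mode is enabled ("1")."""
    try:
        with open(path) as f:
            return f.read().strip() == "1"
    except (IOError, OSError):
        return False  # assumption: no file means disabled


def write_status(enabled, path=STATUS_FILE):
    with open(path, "w") as f:
        f.write("1" if enabled else "0")
```

An external script could toggle the mode by calling write_status(True), assuming the grammar re-reads the file before deciding whether to activate.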

I've tested it with WSR on Windows and Kaldi on Linux. It works pretty well with both. Because I've re-used the Natlink formatting code, it is tuned to things you would normally say when using DNS. For example:

Utterances:
- "hello world full stop new line"
- "testing"
- "this is the same sentence"
- "new paragraph"
- "this is a new sentence and paragraph period"

Output:
Hello world.
Testing this is the same sentence

This is a new sentence and paragraph.
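To show how state carries across those five utterances, here is a toy formatter that reproduces the output above. It is not the Natlink formatting code: `SPOKEN_FORMS` and `Formatter` are hypothetical, handle only the phrases used in this example, and put a single space after a full stop.

```python
SPOKEN_FORMS = {
    ("full", "stop"): ".",
    ("period",): ".",
    ("new", "line"): "\n",
    ("new", "paragraph"): "\n\n",
}


class Formatter:
    """Keeps capitalization and spacing state between utterances."""

    def __init__(self):
        self.capitalize_next = True  # capitalize the very first word
        self.space_next = False      # no space before the very first word

    def format(self, utterance):
        words = utterance.split()
        out = []
        i = 0
        while i < len(words):
            symbol = None
            for phrase, sym in SPOKEN_FORMS.items():
                if tuple(words[i:i + len(phrase)]) == phrase:
                    symbol, i = sym, i + len(phrase)
                    break
            if symbol is None:  # ordinary word
                word = words[i]
                i += 1
                if self.capitalize_next:
                    word = word[:1].upper() + word[1:]
                out.append((" " if self.space_next else "") + word)
                self.capitalize_next, self.space_next = False, True
            elif symbol == ".":  # attach to the previous word
                out.append(symbol)
                self.capitalize_next, self.space_next = True, True
            else:  # line or paragraph break
                out.append(symbol)
                self.capitalize_next, self.space_next = True, False
        return "".join(out)


utterances = [
    "hello world full stop new line",
    "testing",
    "this is the same sentence",
    "new paragraph",
    "this is a new sentence and paragraph period",
]
formatter = Formatter()
print("".join(formatter.format(u) for u in utterances))
```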

As you can see, there are special phrases that get translated into characters, such as "full stop". I'm hoping to clean up the text_dictation_formatting.py file and include it under dragonfly/engines/base somewhere. I am concerned about how to make this configurable for other spoken languages though.

Anyway, hope you find this useful! :)

daanzu commented 4 years ago

@Danesprite Thanks, I'll try it out!

drmfinlay commented 4 years ago

@alexboche It would be trivial to add an optional parameter to Dictation that chooses between text formatted using the state of the previous utterance and text formatted only using the state of the current utterance (the current behaviour). Perhaps persistent_formatting=False?

The format parameter of the Dictation class, when set to False, covers your third option:

the raw text not formatted at all.

I only implemented that parameter recently. It is documented here. I must point out, though, that the formatting output received by the nsformat.py and dictation_format.py files will be somewhat unusable without some text processing.

drmfinlay commented 1 month ago

I had forgotten about this issue. I still agree with James Stout that it should be optional and implemented in grammars. I'm hoping to add a dictation_format module for the other engines, used optionally. It will probably only support English.

I'll add a question and answer to Dragonfly's FAQ on solutions to (inter-utterance) dictation formatting, using Dragon or another engine. My dictation mode grammar should be enough to get the interested user started.