dictation-toolbox / Caster

Dragonfly-Based Voice Programming and Accessibility Toolkit

add a kaldi rule (at least for sleep/wake) #797

Open kendonB opened 4 years ago

kendonB commented 4 years ago

Is your feature request related to a problem? Please describe.
Dragon has a bunch of built-in commands that help with use. It would be nice for switchers to have a Kaldi rule loosely based on the functionality available in base Dragon.

Describe the solution you'd like
A Kaldi grammar with the following features:

To add sleep/wake we need to translate the grammar in this file into caster.

@daanzu if you have any pointers or know of someone who has done this please let us know.

LexiconCode commented 4 years ago

This isn't too hard to implement, and it would be relevant to all engines, not just Kaldi. The way Kaldi makes this work is set_exclusiveness(). An exclusive grammar takes precedence over all other active grammars: while any rules/grammars are exclusive, only those are available for recognition.

Note that setting a grammar to exclusive overrides DNS's built-in sleep/wake function. When using DNS, natlink.setMicState("sleeping") can be set to match the state of the grammar.

lexxish commented 4 years ago

The FunkContext with a mapping rule.

Overriding the DNS default commands for sleep/wake would be nice because you can use shorter commands such as "snore" to put the mic to sleep. Only downside is I'm not sure if we could get the taskbar icon toggling from green to blue. I think the Kaldi implementation is more important regardless.

LexiconCode commented 4 years ago

Only downside is I'm not sure if we could get the taskbar icon toggling from green to blue.

Fortunately, I believe this can be handled by natlink.setMicState(state). Reading the documentation, it controls the mic, where state is 'on', 'off' or 'sleeping'; natlink.getMicState() returns the current state. Therefore the DNS icon could be kept in sync with the exclusive grammar state.
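A minimal sketch of that sync, assuming only the two natlink calls and the three state names quoted above. The helper names desired_mic_state and sync_mic_state are hypothetical, and the natlink module is passed in as a parameter so the logic can be tried without Dragon installed. Whether an exclusive grammar still receives recognitions while the mic state is 'sleeping' is not confirmed here and would need testing with DNS.

```python
def desired_mic_state(asleep):
    """Map the sleep grammar's state to a natlink mic state name."""
    return "sleeping" if asleep else "on"


def sync_mic_state(natlink_module, asleep):
    """Make the DNS mic state match the sleep grammar, if it differs.

    `natlink_module` must provide getMicState() and setMicState(state).
    Returns True if the mic state was changed.
    """
    current = natlink_module.getMicState()
    if current == "off":
        # Assumption: a mic the user switched off should be left alone.
        return False
    target = desired_mic_state(asleep)
    if current == target:
        return False
    natlink_module.setMicState(target)
    return True
```

With DNS this would be called as sync_mic_state(natlink, asleep=True) right after the sleep grammar is made exclusive, and with asleep=False after waking.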

I agree, though, that the implementation is more important for WSR/Kaldi.

lexxish commented 4 years ago

I agree, though, that the implementation is more important for WSR/Kaldi.

Regarding Kaldi, would the implementation involve changing content_loader.py, or does this rule operate independently of how you load the other rules?

LexiconCode commented 4 years ago

Fortunately we don't have to change anything in Caster to make grammars exclusive. It's a simple bool, and it works on any rule. The rule must already be loaded into the engine before it's set to be exclusive. Once set, no commands will be recognized except those in the exclusive rule or rules.
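To illustrate the behaviour described here, a toy model in plain Python. This is not how dragonfly or any engine implements exclusiveness; ToyGrammar and recognizable_commands are hypothetical names, and the commands other than "snore" are made up for the example. It only shows which commands stay available once at least one loaded grammar is exclusive.

```python
class ToyGrammar:
    """Stand-in for a loaded grammar: a name, its commands, and a flag."""

    def __init__(self, name, commands):
        self.name = name
        self.commands = list(commands)
        self.exclusive = False

    def set_exclusiveness(self, exclusive):
        # Same call name as in the thread: a truthy value means exclusive.
        self.exclusive = bool(exclusive)


def recognizable_commands(grammars):
    """Return the commands that can currently be recognized.

    If any grammar is exclusive, only exclusive grammars contribute;
    otherwise every loaded grammar does.
    """
    exclusive = [g for g in grammars if g.exclusive]
    active = exclusive if exclusive else grammars
    return [command for g in active for command in g.commands]


sleep = ToyGrammar("sleep", ["snore", "wake up"])
editing = ToyGrammar("editing", ["copy that", "paste that"])
loaded = [sleep, editing]

print(recognizable_commands(loaded))  # commands from both grammars
sleep.set_exclusiveness(1)
print(recognizable_commands(loaded))  # only the sleep grammar's commands
```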

kendonB commented 4 years ago

@lexxish did you ever figure out getting sleep to work?

LexiconCode commented 4 years ago

@lexxish

With straight dragonfly this would be pretty easy. With Caster it's a bit different, because we don't know the name of the grammar being used: it's different on every boot. I've been working on programmatically switching DNS modes in preparation for creating a unified mode manager for all engines. The following could be used in the sleep grammar.

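from dragonfly import get_engine

grammar_cache = None  # set on the first successful lookup, reused afterwards

def find_grammar_name():
    """Return the loaded grammar that contains the exported "Mode Rules" rule."""
    global grammar_cache
    if grammar_cache is None:
        # Caster's grammar names change every boot, so search by rule name.
        for grammar in get_engine().grammars:
            for rule in grammar.rules:
                if rule.exported and rule.name == "Mode Rules":  # Rule name
                    grammar_cache = grammar
                    return grammar_cache
    return grammar_cache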

In another function you can then call set_exclusiveness(1) or set_exclusiveness(0) on the returned grammar (grammar_cache) to toggle exclusiveness.
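One way such a toggle could look, as a sketch. It assumes only that the grammar object has the set_exclusiveness() method mentioned above; SleepMode is a hypothetical wrapper, and the grammar is passed in so the class does not care how it was found.

```python
class SleepMode:
    """Track whether the sleep grammar is exclusive and toggle it."""

    def __init__(self, grammar):
        self.grammar = grammar
        self.asleep = False

    def sleep(self):
        # Only the sleep grammar's commands stay recognizable.
        self.grammar.set_exclusiveness(1)
        self.asleep = True

    def wake(self):
        # Give every other loaded grammar its commands back.
        self.grammar.set_exclusiveness(0)
        self.asleep = False

    def toggle(self):
        """Switch state and return True if now asleep."""
        if self.asleep:
            self.wake()
        else:
            self.sleep()
        return self.asleep
```

With the lookup function above, this would be constructed as SleepMode(find_grammar_name()) once the "Mode Rules" rule has been loaded, with the sleep and wake commands calling sleep() and wake().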

You can also check the running engine type if there are differences that need to be handled based on the engine implementation. For example, with DNS:

if get_engine()._name == 'natlink':
    import natlink 
    # Do something

LexiconCode commented 4 years ago

* A "maximize window" command that works correctly.

What's wrong with the current behavior @kendonB?

An automatic program switcher. In Dragon, you say "switch to ".

Besides creating a GUI, the backend information could be obtained by tweaking the following function to use get_all_windows(), returning a list covering all windows instead of only Window.get_foreground().

import re
from pathlib import Path
from dragonfly import Window

def get_active_window_info():
    '''Returns foreground window executable_file, executable_path, title, handle, classname'''
    # Captures the "name.ext" part that follows a slash or backslash.
    FILENAME_PATTERN = re.compile(r"[/\\]([\w_ ]+\.[\w]+)")
    window = Window.get_foreground()
    # get_active_window_path() is another helper from Caster's utilities.py (not shown).
    executable_path = str(Path(get_active_window_path()))
    match_object = FILENAME_PATTERN.findall(window.executable)
    executable_file = None
    if len(match_object) > 0:
        executable_file = match_object[0]
    return [executable_file, executable_path, window.title, window.handle, window.classname]
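A sketch of what the tweaked version might look like. It assumes the objects returned by dragonfly's Window.get_all_windows() expose the same attributes used above (executable, title, handle, classname). The name get_windows_info is hypothetical, and the windows are passed in as a parameter so the function can be tried without dragonfly. Skipping untitled windows is my own choice, not something from Caster.

```python
import re

# Same pattern as above: the "name.ext" part after a slash or backslash.
FILENAME_PATTERN = re.compile(r"[/\\]([\w_ ]+\.[\w]+)")


def get_windows_info(windows):
    """Return [executable_file, executable_path, title, handle, classname]
    for every window that has a title."""
    results = []
    for window in windows:
        if not window.title:
            continue  # skip untitled helper windows
        match_object = FILENAME_PATTERN.findall(window.executable)
        executable_file = match_object[0] if match_object else None
        results.append([executable_file, window.executable, window.title,
                        window.handle, window.classname])
    return results
```

With dragonfly available, this would be called as get_windows_info(Window.get_all_windows()), and the resulting titles or executable names could feed a "switch to" command.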

lexxish commented 4 years ago

@lexxish did you ever figure out getting sleep to work?

I have not tried yet. Will update you all if I do.

I do have some "switch to"-like code I can post if anyone wants it. I use a phonetic distance library to choose the best match based on what is currently running. I also have an "open"-like command that searches a couple of directories (e.g. the desktop)... it's not perfect, and I think the way "bring" allows you to specify programs is also nice for things you use a lot.
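For illustration only, here is a rough sketch of best-match selection. It is not the code mentioned above and it does not use a phonetic distance library: it swaps in plain string similarity from the standard library's difflib. The function name best_window_match and the 0.4 cutoff are assumptions made for the example.

```python
import difflib


def best_window_match(spoken, titles, cutoff=0.4):
    """Return the title most similar to the spoken phrase, or None
    if nothing scores above the cutoff."""
    spoken = spoken.lower()
    best_title, best_score = None, cutoff
    for title in titles:
        # ratio() is 1.0 for identical strings and 0.0 for no overlap.
        score = difflib.SequenceMatcher(None, spoken, title.lower()).ratio()
        if score > best_score:
            best_title, best_score = title, score
    return best_title


running = ["Firefox", "Notepad", "Visual Studio Code"]
print(best_window_match("fire fox", running))
```

A phonetic measure would likely cope better with recognition errors that sound alike but are spelled differently; the selection loop would stay the same with a different scoring function.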

Another item that would be nice would be the ability to use Kaldi for commands but DNS for dictation, similar to how I believe Kaldi can be used with Google Speech Recognition.

The last item that would be nice to have (but deserves its own issue number) is integration with accessibility APIs like DNS has, so you can say things like "Click X" when X is a button in a browser.

lexxish commented 4 years ago

I could be wrong, but I think Caster's default maximize uses "alt+SPACE, x" to maximize rather than sending the foreground window a maximize message (https://docs.microsoft.com/en-us/windows/win32/learnwin32/window-messages). I don't think "alt+SPACE, x" works for every application, but I can't think of a specific one right now. I believe the same kind of scenario exists for closing windows in Caster, where we could send SIGTERM and/or SIGKILL message equivalents (probably as two different voice commands) instead of using keyboard shortcuts, and it would (hopefully) work more consistently.
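As a rough sketch of the message-based approach (Windows only, and not Caster's code): the Win32 functions ShowWindow and PostMessageW can be reached through ctypes. The function names maximize_window and request_close are hypothetical, and hwnd is assumed to be a window handle such as the one dragonfly exposes as window.handle. WM_CLOSE is only the polite, SIGTERM-like request; a forced kill would need something else and is not shown.

```python
import ctypes
import sys

SW_MAXIMIZE = 3     # ShowWindow command: maximize the window
WM_CLOSE = 0x0010   # message asking a window to close itself


def _user32():
    """Return user32.dll, or raise on platforms without Win32."""
    if sys.platform != "win32":
        raise OSError("Win32 window messages are only available on Windows")
    return ctypes.windll.user32


def maximize_window(hwnd):
    """Ask Windows to maximize the window with handle `hwnd`."""
    _user32().ShowWindow(ctypes.c_void_p(hwnd), SW_MAXIMIZE)


def request_close(hwnd):
    """Post WM_CLOSE so the application can close the window itself.

    Returns True if the message was posted.
    """
    return bool(_user32().PostMessageW(ctypes.c_void_p(hwnd), WM_CLOSE, 0, 0))
```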

LexiconCode commented 4 years ago

I could be wrong, but I think Caster's default maximize uses "alt+SPACE, x" to maximize rather than sending the foreground window a maximize message

Back when implementing Kaldi support, I switched it from "alt+SPACE, x" to dragonfly's cross-platform implementation, which uses Win32 on Windows. If something's not behaving correctly with those minimize/maximize commands, let me know.

https://github.com/dictation-toolbox/Caster/blob/7d3834eed076d39db1f163582d4e457ab71ee5f4/castervoice/rules/core/navigation_rules/window_mgmt_rule.py#L13

https://github.com/dictation-toolbox/Caster/blob/7d3834eed076d39db1f163582d4e457ab71ee5f4/castervoice/lib/utilities.py#L77

LexiconCode commented 4 years ago

The last item that would be nice to have (but deserves its own issue number) is integration with accessibility APIs like DNS has, so you can say things like "Click X" when X is a button in a browser.

I will open up a new issue. Done https://github.com/dictation-toolbox/Caster/issues/814

daanzu commented 4 years ago

Another item that would be nice would be the ability to use Kaldi for commands but DNS for dictation, similar to how I believe Kaldi can be used with Google Speech Recognition.

I don't have experience with Natlink, and don't currently have Dragon installed, but I'd be happy to help implementing this. Is there a way with Natlink to just get straight dictation recognition text from audio data passed to it? https://github.com/daanzu/kaldi-active-grammar/issues/23

LexiconCode commented 4 years ago

Perhaps there should be an issue in KaldiAG for working on this?

Agreed

LexiconCode commented 4 years ago

@lexxish and @kendonB I will attempt to implement the sleeping grammar and modes for all engines. These modes will override DNS's built-in modes but will be kept in sync with the DNS GUI.

LexiconCode commented 3 years ago

https://github.com/dictation-toolbox/Caster/pull/881 addresses the following request.

An automatic program switcher. In Dragon, you say "switch to ".