dictation-toolbox / dragonfly

Speech recognition framework allowing powerful Python-based scripting and extension of Dragon NaturallySpeaking (DNS), Windows Speech Recognition (WSR), Kaldi and CMU Pocket Sphinx
GNU Lesser General Public License v3.0

FYI: dotool #374

johngebbie closed 1 year ago

johngebbie commented 1 year ago

Hi, just wanted to say you might like dotool for Linux instead of xdotool (I don't know about pynput). It uses uinput so it works for Wayland and virtual consoles as well as X11 and you only need one instance.

I've never used dragonfly but I know about it because I maintain numen, which I wrote dotool for.

All the best, John

drmfinlay commented 1 year ago

Hello John,

Thank you for opening this issue. I was not aware of numen or dotool. Your demo video of the former is quite impressive, I must say.

As to Wayland, Dragonfly support for that environment has been requested before. The problem is that its design makes a number of key Dragonfly features difficult or impossible to implement. The most important of these is activation and deactivation of voice commands based on attributes of the foreground window. That said, since many distributions have now leapt onto the Wayland bandwagon, it would be nice to at least offer keyboard and mouse control in that environment. Being able to use voice control in TTYs would also be nice. I'll look into adding keyboard and mouse implementations using your program.

I see on the sourcehut page for dotool that the program only requires users to be in the input group. This makes it more suitable for Dragonfly than other solutions I have investigated in the past. Many of these would have required running user code as root. If I may ask, is this lesser requirement the reason you wrote dotool instead of using an already existing solution?

Best regards, Dane

johngebbie commented 1 year ago

Hi Dane and thank you :)

Yes, not requiring root was a big reason I wrote dotool. Numen used to use a command called ydotool but it required root or patching and had a bunch of glaring issues from sockets to mouse acceleration. I thought it would be easier to write another tool and numen would be less flaky and easier to package.

One thing I'm still trying to find a solution for is how to have dotool work out of the box for Wayland users with non-US keyboard layouts. They currently have to set dotool's keyboard layout to us. EDIT: Sway now works out of the box with non-US keyboard layouts. EDIT 2: Now supports xkb keyboard layouts!

I've also recently been working on gadget which is just like dotool but sends input over USB (video).

I like what you're doing and feel we're on the same mission.

drmfinlay commented 1 year ago

Hi John,

My apologies for taking a while to respond.

I've tried to use ydotool before, but unfortunately could not get it to work. It is good that you decided to write a more sensible program for sending input via evdev as a normal user.

I'd like to write a keyboard/mouse implementation for Dragonfly using dotoolc and dotoold. I could do what these shell scripts do from Python, right? This would be good. Would only have to use one sub-process that way!

Keyboard layout issues are something I came across with other solutions, too. Good to hear that Sway works properly, at least.

Sending input over USB is an interesting idea. I remember similar projects that sent keystrokes via Bluetooth, though I suppose they might be slower. Your demo video for gadget — installing Void Linux by voice alone — is rather impressive! Is numen running on the Raspberry Pi in that video, then?

johngebbie commented 1 year ago

Hi Dane,

No worries, you didn't need to respond even.

You could do what dotoold and dotoolc do with a named pipe but you probably don't need to, and can just keep writing to the stdin of one instance of dotool instead:

import subprocess
import time

dotool = subprocess.Popen("dotool", stdin=subprocess.PIPE, text=True)
dotool.stdin.write("type hello!\n")
dotool.stdin.flush()
time.sleep(1)
dotool.stdin.write("type  bye!\n")
dotool.stdin.flush()
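
The same pattern can be wrapped in a small helper so that callers never forget the trailing newline or the flush. This is only a sketch, not part of dotool or Dragonfly: the class name `DotoolWriter` and its methods are hypothetical, and it assumes `dotool` is on PATH and reads one command per line from stdin, as in the snippet above.

```python
import subprocess


class DotoolWriter:
    """Feed commands to one long-running process over its stdin."""

    def __init__(self, command=("dotool",)):
        # Assumption: the default command, `dotool`, is on PATH and
        # reads one command per line from standard input.
        self._proc = subprocess.Popen(
            list(command), stdin=subprocess.PIPE, text=True
        )

    def send(self, line):
        # Write exactly one newline-terminated command, then flush so
        # the child process sees it immediately.
        self._proc.stdin.write(line.rstrip("\n") + "\n")
        self._proc.stdin.flush()

    def close(self):
        # Closing stdin signals end of input; wait for the child to
        # exit and return its exit status.
        self._proc.stdin.close()
        return self._proc.wait()
```

Usage would look like `writer = DotoolWriter()`, then `writer.send("type hello!")`, and finally `writer.close()`. The `command` parameter exists only so that a different program can be substituted when testing.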

Thank you and yes, numen is running on the Pi, you can switch between controlling the host or the Pi. I should look into Bluetooth.

drmfinlay commented 1 year ago

Hi John,

Thank you for the Python code. I figured it would be something like that. It's a pity that, except for the type command, xdotool wasn't designed to take input in this manner.

Running accurate speech recognition on a Raspberry Pi is impressive. The speech model you're using must be well optimised, then.

johngebbie commented 1 year ago

dotool can now simulate keycodes for different keyboard layouts :)

drmfinlay commented 1 year ago

That's good to hear. I am a Dvorak user. The environment variable solution mentioned in your project README file fixes the gobbledygook for me. It doesn't solve the issue if the layout or variant changes, of course. But perhaps you wish to avoid that extra complexity.

On an unrelated note, have you thought of including some of the --help text for dotool in a manual page? I don't know, maybe there are packaging issues with that. In any case, I think some users would find it useful.

johngebbie commented 1 year ago

Ah good. I'm just going to let the dust settle for now, but layout and variant commands are a possibility. It's never going to be automatic with how dotool is environment independent.

Honestly, though, I don't think there's anything more to say that would make a manpage useful.

drmfinlay commented 1 year ago

Yes, adding commands for changing the layout/variant sounds like a good idea to me. An interested user could then handle layout changes himself, without having to re-run the program. And indeed, it is better for this to be manual everywhere than automatic only under X11.

Well, I was just thinking it would be nice to have this information accessible in a manual page, too. No worries if that's a hassle.

drmfinlay commented 1 year ago

I've made some progress on Dragonfly keyboard and mouse implementations using dotool. They will be used only on Linux when an X11 session is not apparent. This will provide basic Wayland (#255) and TTY support. I assume it will permit mouse interaction in Linux TTYs, but I've never had much luck with that normally.
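
One way the "X11 session is not apparent" check could look is sketched below. This is not Dragonfly's actual logic: the function names and the backend labels (`"native"`, `"x11"`, `"dotool"`) are hypothetical. It assumes that `XDG_SESSION_TYPE`, when set, names the session type, and otherwise falls back to checking `DISPLAY`. The session type is checked first because `DISPLAY` is often also set under Wayland when XWayland is running.

```python
import os
import sys


def x11_session_apparent(environ=None):
    """Return True if the environment suggests an X11 session."""
    env = os.environ if environ is None else environ
    # Prefer the session type when the login manager has set it.
    session_type = env.get("XDG_SESSION_TYPE", "").lower()
    if session_type:
        return session_type == "x11"
    # Fallback heuristic: X11 clients locate the server via DISPLAY.
    return bool(env.get("DISPLAY"))


def choose_input_backend(platform=None, environ=None):
    """Pick a (hypothetical) keyboard/mouse backend name."""
    platform = sys.platform if platform is None else platform
    if not platform.startswith("linux"):
        return "native"
    return "x11" if x11_session_apparent(environ) else "dotool"
```

Passing `platform` and `environ` explicitly keeps the heuristic easy to exercise without changing the real environment.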

I'm not sure what to do about keyboard layouts other than standard US English. @JohnGebbie, do you have any plans to allow specifying another layout to use via uinput? I can see it is rather involved.

johngebbie commented 1 year ago

Nice, but I'm not sure what you mean by:

do you have any plans to allow specifying another layout to use via uinput?

as uinput has no concept of keyboard layouts. X11 keyboard layouts, TTY keymaps, and however a specific Wayland compositor chooses to go about it are all at a higher level. With the new dotool you can just tell it to simulate different keycodes, so you can match them up with whatever your environment is expecting.

With Numen, I've just said to set the environment variables: https://git.sr.ht/~geb/numen#keyboard-layouts Alternatively, maybe you could have something like dragonfly.set_keyboard_layout("fr") and set dotool's environment variable yourself.

drmfinlay commented 1 year ago

Yes, sorry, I was thinking about the Linux key codes sent by dotool through uinput. The program produces gobbledegook for me in TTYs because my keymap is the dvorak one. I figured out that dotool's k:N syntax for its key* commands solves my problem.

johngebbie commented 1 year ago

Totally okay, especially because, while checking it, I found something confusing/surprising.

This works as expected:

echo "type ',.pyf" | DOTOOL_XKB_LAYOUT=us DOTOOL_XKB_VARIANT=dvorak dotool

but this seems to just use us-qwerty:

echo "type ',.pyf" | DOTOOL_XKB_VARIANT=dvorak dotool

I will look into this more, but I will very likely make them both act like the top one, as I thought they would. Thank you!

drmfinlay commented 1 year ago

Ah, well I'm glad I could help! I realise now that the names of your environment variables were confusing me. I had thought they would only work with X, since XKB is an X extension. It seems that console key mapping configuration differs between distributions. Some use XKB options for console and X configuration. And some don't. It's not very consistent, but I guess that's Linux for you.

Anyhow, the variables work fine in TTYs, except in the case you mention. It will be sufficient to add options in Dragonfly that set these variables for the dotool subprocess.
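
Setting these variables for the subprocess could be sketched roughly as follows. The variable names `DOTOOL_XKB_LAYOUT` and `DOTOOL_XKB_VARIANT` come from the commands earlier in this thread; the helper names `dotool_environment` and `start_dotool` are hypothetical and not part of Dragonfly. The sketch rejects a variant given without a layout, since that combination was reported above to behave like us-qwerty.

```python
import os
import subprocess


def dotool_environment(layout=None, variant=None, base=None):
    """Return a copy of the environment with dotool's XKB variables set."""
    env = dict(os.environ if base is None else base)
    if variant and not layout:
        # A variant on its own was reported in this thread to be
        # ignored, so require the layout to be given explicitly.
        raise ValueError("a variant needs a layout, e.g. layout='us'")
    if layout:
        env["DOTOOL_XKB_LAYOUT"] = layout
    if variant:
        env["DOTOOL_XKB_VARIANT"] = variant
    return env


def start_dotool(layout=None, variant=None):
    # Assumption: `dotool` is on PATH. The child receives the modified
    # environment; the parent's own environment is left untouched.
    return subprocess.Popen(
        ["dotool"],
        stdin=subprocess.PIPE,
        text=True,
        env=dotool_environment(layout, variant),
    )
```

For a Dvorak user this would be `start_dotool(layout="us", variant="dvorak")`. Changing the layout later would mean starting a new subprocess.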

drmfinlay commented 1 year ago

Now that I think about it, I've changed my mind on adding implementations for this. It just doesn't make sense for Dragonfly to offer partial "support" in these cases. Your numen system, on the other hand, is clearly designed for more general contextless use on Linux. I'll recommend users who can't use X11 to use numen instead.

johngebbie commented 1 year ago

Fair enough, that does make sense. Hopefully Wayland compositors will make context stuff possible in time. All the best and recommending numen is very kind of you!