Cool! - GitHub Issues

johngebbie commented 1 year ago

Sorry this isn't an issue but this is really cool. I've been writing something kind of similar but for people who can't use their hands.

I'm not sure how you've simulated input, but I wrote a tool called dotool so mine could work in X11, Wayland and TTYs, which might be helpful.

aj3423 commented 1 year ago

@JohnGebbie Thanks, Numen seems so accurate and responsive. I also use VOSK, but I never get that short a response delay or that accuracy on phrases like "change serial port baud rate" and "oh no oh no oh dear". Is there really such a big difference between using native code and using a server? In my case it takes at least 1~2 seconds to return the result for most sentences.

About the input, I intended to support complex phrases like "do something A do something B do something C", so they can be used as a "macro" and bound to a short alias. Also, the "uinput" used in dotool is Linux-specific; why not use a cross-platform library like robotgo?
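
One way to picture the macro idea is a table that maps a short alias to a longer phrase, which is then split into the individual commands it contains. The sketch below is only an illustration: the command phrases, the `MACROS` table and the `expand` helper are all made up here and are not taken from joy-typing.

```python
from __future__ import annotations

# Hypothetical command phrases and aliases, invented for illustration.
COMMANDS = ["do something a", "do something b", "do something c"]
MACROS = {"abc": "do something a do something b do something c"}


def expand(spoken: str) -> list[str]:
    """Expand an alias (if any) and split the phrase into known commands."""
    phrase = MACROS.get(spoken, spoken)
    actions = []
    while phrase:
        # Prefer the longest command that matches at the current position.
        for command in sorted(COMMANDS, key=len, reverse=True):
            if phrase == command or phrase.startswith(command + " "):
                actions.append(command)
                phrase = phrase[len(command):].lstrip()
                break
        else:
            raise ValueError(f"unrecognised input: {phrase!r}")
    return actions
```

Saying the alias "abc" would then yield the three commands in order, while a plain command passes through unchanged.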

johngebbie commented 1 year ago

You're welcome, I'm not sure about the difference with a server, never tried it, but I remember it was much slower when I used the big language model, instead of vosk-model-small-en-us. Maybe try numen though, it does feel about a second.

robotgo looks cool but I don't want to be tied to a graphical session. I use numen in the virtual consoles a lot, even just to log in, and don't want something that depends on scraping applications. I've got a raspberry pi mode in the works, where the pi appears as an input device to the computer it's plugged into, which lets me use other computers and change operating system etc., and it wouldn't be possible otherwise. Numen is just like a simple voice keyboard.

aj3423 commented 1 year ago

Understood.

I tried numen with "change serial port baud rate" a couple of times. The results:

chanda serial port bought the right
changes siro pot border rate
changes zero pot border rate 
change siro bought bought a rate 
change serial port bought a rate 
change serial port bother it

Seems it's caused by my pronunciation :-)

How do you handle something like njson.MarshalIndent? I can't come up with a perfect solution. It's too slow to say it letter by letter, so I tried adding lots of Go-specific words to the dictionary, like "unmarshal" and "indent", but there are too many words to add. I thought about scanning the go/src directory and adding all the words to the dictionary, but that may introduce conflicts like "see" and "sea". How do you type long variable names like that?
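
The dictionary idea above could be sketched roughly like this: split each identifier into its component words, collect them, and flag any pair of words that sound alike. This is only a sketch under assumptions. The `HOMOPHONES` table is a tiny made-up stand-in for a real pronunciation dictionary, the helper names are hypothetical, and reading identifiers out of go/src is left out.

```python
from __future__ import annotations

import re

# Tiny made-up homophone table; a real one would need a pronunciation dictionary.
HOMOPHONES = {"see": "sea", "sea": "see"}


def split_identifier(name: str) -> list[str]:
    """Split a Go-style identifier such as MarshalIndent into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", name)
    return [p.lower() for p in parts]


def build_vocabulary(identifiers):
    """Collect words from identifiers and report words whose homophone is also present."""
    words = set()
    for name in identifiers:
        words.update(split_identifier(name))
    clashes = sorted(w for w in words if HOMOPHONES.get(w) in words)
    return words, clashes
```

For example, `split_identifier("njson.MarshalIndent")` gives the three words "njson", "marshal" and "indent", and a vocabulary built from identifiers containing both "See" and "Sea" would list both as clashes.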

johngebbie commented 1 year ago

Ah dang. It's beside the point, but the "change" phrase gives you a little menu of alternatives.

I just use a lot of autocompletion. In vim you can press Ctrl+p to autocomplete what you're typing to something you've already written, so I say "troll pit". Then there's Ctrl+x Ctrl+p that lets you keep autocompleting what was after the match, which I've rebound to Alt+p so I say "hype pit".

So it'd be something like:

"nerd jury" : nj "nerd jury troll pit" : njson "nerd jury troll pit hype pit" : njson.MarshalIndent

You can also repeat Ctrl+p to pick the next match further up the buffer, so sometimes I use my repeat phrases like "double" and "handful".

I sometimes autocomplete function calls and stuff as well with "hype pit handful again" and the like.
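
As a rough model of what Ctrl+p and the repeat phrases are doing, the sketch below walks backwards through the text already written and returns the nth distinct word that starts with what has been typed. It is a simplification and not vim's actual implementation: the `complete` helper is hypothetical, words are matched with `\w+`, and the Ctrl+x Ctrl+p continuation is not modelled.

```python
import re


def complete(buffer: str, prefix: str, nth: int = 1):
    """Return the nth distinct earlier word that starts with prefix.

    Roughly models pressing Ctrl+p nth times: matches are taken from the
    most recent one backwards. Returns None when there are not enough matches.
    """
    seen = []
    for word in reversed(re.findall(r"\w+", buffer)):
        if word.startswith(prefix) and word != prefix and word not in seen:
            seen.append(word)
            if len(seen) == nth:
                return word
    return None
```

With a buffer that already contains `njson.MarshalIndent`, typing "nj" and completing once gives "njson"; a higher `nth` plays the role of repeating the key to reach a match further up.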

Hope that kinda made sense.

aj3423 commented 1 year ago

Yeah, that's the best solution so far. But mapping "pit" to "p" slows down the input a bit and comes with a learning curve, which is why I didn't use TalonVoice in the first place. I know very little about AI, but it's so developed nowadays that it must somehow be capable of identifying just letters, maybe with a dedicated letters-only model, so the user can simply say "n j s o n" quickly. I tried to do that by limiting VOSK's word_phrase to a~z only, but some letters like "a" and "h" still conflict, especially when speaking fast. I still think an improved AI model is the final and only solution. Do you know any AI model or engine that is capable of this?
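
For reference, restricting VOSK to letters can be sketched with its Python bindings, where the recognizer accepts a JSON list of allowed phrases. The snippet only builds that list so it runs without VOSK installed; the commented-out lines show how it would be passed, assuming the `vosk` package and a local model directory named `vosk-model-small-en-us`. It illustrates the restriction itself and does nothing about the "a"/"h" confusion.

```python
import json
import string

# One entry per letter, plus "[unk]" so other sounds can be rejected
# instead of being forced onto a letter.
letters = list(string.ascii_lowercase)
grammar = json.dumps(letters + ["[unk]"])

# With the vosk package installed and a model downloaded, the grammar
# string would be passed as the third argument (assumed setup, not run here):
#   from vosk import Model, KaldiRecognizer
#   recognizer = KaldiRecognizer(Model("vosk-model-small-en-us"), 16000, grammar)
```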

johngebbie commented 1 year ago

No I don't know any. I only tried open source speech recognition and vosk was leagues better than pocketsphinx and julius. There's maybe doing kaldi stuff yourself or your own model, but I really don't know.

I don't see how "pit" slows down the input, "pit" and "p" are both a syllable?

I get you, I read this on a post about Talon: "It isn’t realistic to think that you will memorize every possible command" And noped out of there. But I'm happy with numen's phrases, there's about 130 and that's that, I don't expect you to add more.

I think the real complexity is Talon is trying to be your text editor/window manager/assistant with fancy grammars and application specific rules.

aj3423 commented 1 year ago

Maybe it's also down to my pronunciation that saying "p" 10 times is slightly quicker than saying "pit" 10 times. Never mind :-)

I also tried full joycon button typing, I mean using 26 buttons for all the letters of the alphabet, which can be done with 2~4 modes, each covering a group of letters. But that's also too idealistic: it's not that efficient and has some signal conflict problems.
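
To make the mode arithmetic concrete, here is a small sketch that spreads the 26 letters over a given number of modes and reports which mode and button each letter lands on. The `layout` helper and the simple alphabetical grouping are assumptions for illustration, not how joy-typing assigns buttons.

```python
import math
import string


def layout(modes: int):
    """Spread the 26 letters over the given number of modes.

    Returns a dict mapping letter -> (mode, button), both zero-based.
    """
    per_mode = math.ceil(26 / modes)  # buttons needed in each mode
    return {
        letter: divmod(index, per_mode)
        for index, letter in enumerate(string.ascii_lowercase)
    }
```

With 2 modes each mode needs 13 buttons, and with 4 modes it drops to 7, which is the trade-off between button count and mode switching.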

Talon and us, we're trying to solve the same problem in different ways and ended up at the same bottleneck. I'm trying something else: I got a wireless split keyboard, and now I'm trying to attach it to my trousers with velcro :-) Hopefully there will be a lightweight, soft, foldable split keyboard on the market someday, and eye-tracking glasses.

johngebbie commented 1 year ago

I'll close this but feel free to email me or whatever whenever. It was nice talking :)

aj3423 / joy-typing

Cool! #1