cboiangiu / decky-dictation

Allows speech to text input using Vosk and Nerd Dictation.

Customize hotkeys? #7

Open wallentx opened 5 months ago

wallentx commented 5 months ago

I'd love to use this, but L5 and R5 are bound to other actions in most of my games. Maybe Steam+L5 and Steam+R5 instead?
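One way to make the combo configurable rather than hard-coded is to store each action's hotkey as a set of buttons and fire only when the whole set becomes held. This is a sketch under assumptions: the button names ("STEAM", "L5", "R5") and the HotkeyMatcher class are hypothetical, and the caller is assumed to report the currently pressed buttons on every controller update. It is not the plugin's actual input handling.

```python
from collections.abc import Iterable

# Hypothetical button names and default combos (Steam+L5 / Steam+R5, as
# suggested in this issue). The real plugin reads controller state in its
# frontend; these identifiers are illustrative only.
DEFAULT_HOTKEYS = {
    "begin": {"STEAM", "L5"},
    "end": {"STEAM", "R5"},
}


class HotkeyMatcher:
    """Reports an action once each time its full button combo becomes held."""

    def __init__(self, hotkeys: dict[str, Iterable[str]]):
        self.hotkeys = {name: frozenset(combo) for name, combo in hotkeys.items()}
        self._held: set[str] = set()  # actions whose combo is currently held

    def update(self, pressed: Iterable[str]) -> list[str]:
        """Take the currently pressed buttons; return actions that just fired."""
        pressed_set = frozenset(pressed)
        fired = []
        for name, combo in self.hotkeys.items():
            if combo <= pressed_set:
                if name not in self._held:
                    fired.append(name)  # rising edge: combo just completed
                    self._held.add(name)
            else:
                self._held.discard(name)  # released, so it can fire again later
        return fired


matcher = HotkeyMatcher(DEFAULT_HOTKEYS)
print(matcher.update({"L5"}))           # -> [] (L5 alone stays with the game)
print(matcher.update({"STEAM", "L5"}))  # -> ['begin']
print(matcher.update({"STEAM", "L5"}))  # -> [] (still held, no repeat)
```

Because an action fires only on the transition into its full combo, holding Steam+L5 starts dictation once while a bare L5 press is left to the game. Overlapping combos (for example L5 and Steam+L5) would both fire, so a fuller version might prefer the largest matching combo.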

cboiangiu commented 5 months ago

This is currently on the Nice to have list, but it would require implementing persistent config settings first. Right now the main focus is getting the latest release into the Decky store (prod), but I am open to PRs in the meantime. I would also prioritize fixing #8 first.

wallentx commented 5 months ago

Rgr. I was able to set Steam + L5/R5 on a local branch (I cloned the repo on my Deck and installed the plugin from a zip), and have been trying to use it day to day on my Deck. I want to experiment with it more on my regular Arch desktop, where it's easier to dig into the code and understand its behavior. I actually think this could make a pretty cool desktop app if it stays lightweight and simple; I've previously used tools like https://github.com/Melvin-Abraham/Google-Assistant-Unofficial-Desktop-Client to leverage a fairly well-trained model for STT.

I've been testing this while playing BattleBit Remastered, and I've noticed that it works fine at first, but after playing for a while it just stops working. I've been SSH'd into my Deck tailing the journal logs to capture anything noteworthy, but nothing has jumped out at me. I'll keep looking for something I'm doing, like opening a menu or the Steam controller config page, that could be causing it to stop.

During the time that it was working, I did notice a few usability things that might be useful for feedback:

  1. When I hit the begin speech button, there seems to be a window in which I need to start speaking, or else STT capture quits, but I couldn't find this value when skimming the code. What is this timeout?
  2. Since there does seem to be a timeout, it sort of makes the end speech button redundant.
  3. An OSD indicator, say a microphone icon representing active STT capture, would communicate to the user that capture had started or that the STT timeout had happened.
  4. If the above is done, the end speech button could be repurposed to trigger some action, such as sending a Return key press.
  5. I'm not sure whether this is due to the Steam Deck notification chime or some internal button-press noise picked up by the microphone, but STT often detects the word "huh" even when nothing is being said.
  6. I'm not sure if it's possible, but lowering the output volume while STT is active might keep game audio from being picked up by the mic, unless the Deck already has some echo-cancellation mechanism in place.
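For the spurious "huh" in item 5, one possible mitigation is nerd-dictation's user configuration script. This sketch assumes that nerd-dictation loads ~/.config/nerd-dictation/nerd-dictation.py and passes recognized text through a nerd_dictation_process(text) function defined there, and that the plugin's nerd-dictation invocation picks that file up; both should be checked against the nerd-dictation README. NOISE_WORDS is a hypothetical name. The filter only drops utterances made up entirely of noise words, so "huh" inside a real sentence is kept.

```python
import re

# Hypothetical list of words treated as noise when they are the whole
# utterance. "huh" is the word reported in this issue.
NOISE_WORDS = {"huh"}


def nerd_dictation_process(text):
    # Assumed hook: nerd-dictation calls this with the recognized text and
    # types whatever it returns.
    words = re.findall(r"[a-z']+", text.lower())
    if words and all(word in NOISE_WORDS for word in words):
        return ""  # nothing but noise words: type nothing
    return text
```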

I'm curious what you think about any of these, and I'm happy to try and take a crack at some of the stuff if it seems to be of value.

cboiangiu commented 5 months ago

Not sure why it would stop working. Some logs (/tmp/decky-dictation.log, /tmp/decky-dictation-std-out.log, /tmp/decky-dictation-std-err.log) would help diagnose this issue, although I haven't encountered it while using the plugin. Maybe something goes wrong after long periods of time, or maybe nerd-dictation itself just stops working.
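A small helper for collecting the three logs listed above right after dictation stops working, for example over SSH on the Deck. It prints the last lines of each file so they can be pasted into the issue. The file paths come from this comment; the tail helper and its default of 50 lines are arbitrary choices for this sketch.

```python
from collections import deque
from pathlib import Path

# Log files the maintainer asked for in this issue.
LOG_FILES = [
    "/tmp/decky-dictation.log",
    "/tmp/decky-dictation-std-out.log",
    "/tmp/decky-dictation-std-err.log",
]


def tail(path, n=50):
    """Return the last n lines of a file, or None if it does not exist."""
    p = Path(path)
    if not p.is_file():
        return None
    # errors="replace" keeps one undecodable byte from aborting the dump.
    with p.open(errors="replace") as f:
        return list(deque(f, maxlen=n))


if __name__ == "__main__":
    for log in LOG_FILES:
        print(f"==> {log} <==")
        lines = tail(log)
        if lines is None:
            print("(file not found)")
        else:
            print("".join(lines).rstrip("\n"))
```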

I think implementing custom hotkeys would be the best solution. Changing the defaults would do no good, as everyone has their own preference. If you would like to take a stab at it, I would gladly help you along the way. Take a look at SettingsManager to handle settings persistence. I'm not sure whether this can be done directly in the frontend, but looking at other plugins should answer that question.
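A minimal sketch of persisting hotkey and timeout settings as JSON. JsonSettings is a hypothetical stand-in used so the example runs outside Decky; in the plugin, Decky's SettingsManager would take its place, and its exact method names should be checked against decky-loader. The setting keys and defaults are assumptions based on this thread (Steam+L5/R5 and the current 4-second timeout).

```python
import copy
import json
from pathlib import Path

# Assumed setting keys and defaults, based on this thread.
DEFAULT_SETTINGS = {
    "hotkey_begin": ["STEAM", "L5"],
    "hotkey_end": ["STEAM", "R5"],
    "timeout_enabled": True,
    "timeout_seconds": 4,
}


class JsonSettings:
    """Hypothetical JSON-file store standing in for Decky's SettingsManager."""

    def __init__(self, path):
        self.path = Path(path)
        self.data = {}

    def read(self):
        # A missing or malformed file falls back to the defaults.
        try:
            data = json.loads(self.path.read_text())
        except (FileNotFoundError, json.JSONDecodeError):
            data = {}
        self.data = data if isinstance(data, dict) else {}

    def get(self, key):
        # Copy so callers cannot mutate the stored value or the defaults.
        return copy.deepcopy(self.data.get(key, DEFAULT_SETTINGS[key]))

    def set(self, key, value):
        if key not in DEFAULT_SETTINGS:
            raise KeyError(f"unknown setting: {key}")
        self.data[key] = value
        self._commit()

    def _commit(self):
        # Write a temp file and rename it so a crash never leaves half a file.
        self.path.parent.mkdir(parents=True, exist_ok=True)
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(self.data, indent=2))
        tmp.replace(self.path)
```

Unknown keys are rejected so a typo in the frontend cannot silently create a setting that is never read.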

  1. --timeout 4 is set here. Take a look at nerd-dictation for the available args. The timeout was set in case the user forgets or misses the hotkey(s) (initially this was a combo). Communicating the activity status of STT more reliably might remove the need for it. (This would require some investigation into how nerd-dictation is used and how to communicate its activity back to the frontend to handle the toast notification(s); also see 3.) Adding it to the settings section (active + timeout_value) would also help.
  2. Not necessarily, since other sounds might keep STT active. In my opinion, a timeout-driven experience would lead to very unreliable results, although this could be added down the road.
  3. Maybe checking in the backend whether the nerd-dictation process ended without the user requesting it could serve as the trigger, with the frontend polling every 500 ms to update the STT status while listening? Switching to a single notification (dismissed on timeout or when the user ends) would also be nice.
  4. There was some talk about a single-button toggle. This is also on the Nice to have list.
  5. Could be. Haven't encountered this myself.
  6. As far as I know, the Steam Deck already does this by itself for the microphone.
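Points 1 and 3 could be sketched in the backend roughly as follows: build the nerd-dictation begin command from the timeout settings, remember whether the user asked to end, and expose a status the frontend could poll (for example every 500 ms). DictationProcess and its status strings are hypothetical; only the begin/end subcommands and the --timeout argument come from nerd-dictation and this thread.

```python
import subprocess
from typing import Optional, Sequence


class DictationProcess:
    """Hypothetical backend helper that tracks one nerd-dictation session."""

    def __init__(self, base_cmd: Sequence[str] = ("nerd-dictation",)):
        self.base_cmd = list(base_cmd)
        self.proc: Optional[subprocess.Popen] = None
        self.user_ended = False

    def build_begin_cmd(self, timeout_enabled: bool, timeout_seconds: int) -> list[str]:
        # Only pass --timeout when the (assumed) "active" setting is on.
        cmd = self.base_cmd + ["begin"]
        if timeout_enabled:
            cmd += ["--timeout", str(timeout_seconds)]
        return cmd

    def begin(self, timeout_enabled: bool = True, timeout_seconds: int = 4) -> None:
        self.user_ended = False
        self.proc = subprocess.Popen(self.build_begin_cmd(timeout_enabled, timeout_seconds))

    def end(self) -> None:
        # Record that the user asked to stop before telling nerd-dictation to end.
        self.user_ended = True
        subprocess.run(self.base_cmd + ["end"], check=False)

    def status(self) -> str:
        """Return "idle", "listening", "ended" (user stopped it) or "timed_out"."""
        if self.proc is None:
            return "idle"
        if self.proc.poll() is None:
            return "listening"
        # Exited without a user request: the --timeout case, but a crash also
        # lands here, which may help with the "stops working" report.
        return "ended" if self.user_ended else "timed_out"
```

If status() changes from "listening" to "timed_out", the frontend could swap its single toast for a "dictation stopped" notice; "ended" means the user stopped it, so no extra notice is needed.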