DestinyItemManager / DIM

Destiny Item Manager
https://destinyitemmanager.com
MIT License

Add voice interaction capability #8655

Closed mlsof21 closed 1 year ago

mlsof21 commented 2 years ago

Proposed change

I'd like DIM to have the ability to move items to/from my current character without using mouse and keyboard. This would be done with a keyboard shortcut that would allow DIM to use the user's microphone to interpret a command, then perform that command.

There's currently a Chrome extension, Voice DIM, that does this via a global shortcut and the Web Speech API (it only works in Chrome for now).

Since Chrome (and Edge, I think) are the only browsers that support Web Speech, another service would probably be needed to obtain transcriptions in other browsers. Early research shows that AWS Transcribe has a polyfill library to fall back on when there's no support for Web Speech.
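A minimal sketch of that fallback decision, assuming the only check needed is whether the browser exposes a recognizer constructor. `SpeechRecognition` and `webkitSpeechRecognition` are the real browser globals; `SpeechScope`, `RecognizerChoice`, and `chooseRecognizer` are hypothetical names used only for illustration, and the polyfill branch is a placeholder rather than any particular vendor's library.

```typescript
// Hypothetical helper: decide between the native Web Speech API and a polyfill.
// Only SpeechRecognition / webkitSpeechRecognition are real browser globals.
type RecognizerCtor = new () => unknown;

interface SpeechScope {
  SpeechRecognition?: RecognizerCtor;
  webkitSpeechRecognition?: RecognizerCtor;
}

type RecognizerChoice =
  | { kind: "native"; ctor: RecognizerCtor }
  | { kind: "polyfill" };

function chooseRecognizer(scope: SpeechScope): RecognizerChoice {
  // Prefer the unprefixed constructor, then the prefixed one Chrome exposes.
  const ctor = scope.SpeechRecognition ?? scope.webkitSpeechRecognition;
  // No constructor at all: the caller has to load a third-party polyfill.
  return ctor ? { kind: "native", ctor } : { kind: "polyfill" };
}
```

In a page this would be called with the global object (for example `chooseRecognizer(window as SpeechScope)`). Note that the presence of the constructor does not guarantee recognition works, so a real implementation would probably also need to handle runtime errors from the recognizer.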

How does this fit into your workflow?

I find it really useful to grab weapons for bounties (or a missing weapon during a raid encounter) without having to Alt-Tab out of the game.

bhollis commented 2 years ago

The Chrome extension is super cool. I think the hard part here is actually the global shortcut which AFAIK we can't do without an extension, which means the Voice DIM extension might actually be the right implementation here.

mlsof21 commented 2 years ago

Yeah, that's pretty much my determination as well... unless you want an always-listening mode, which is probably not advisable haha.

robojumper commented 2 years ago

Even if some external extension or software is needed to provide the global shortcut for voice interaction, there are still ways in which DIM could facilitate the implementation, and we're basically halfway there already: there's a Stream Deck integration where DIM's actions can be automated through a WebSocket connection to locally running software. If this approach could be re-used for Voice DIM (e.g. generalizing stream-deck to automation or something), then Voice DIM wouldn't have to automate DIM's UI and could instead automate the actions themselves.

The architecture for Stream Deck is basically

Stream Deck hardware device <-USB-> Stream Deck client software (<--> internal DIM plugin) <-WebSocket-> DIM

and Voice DIM could work similarly

Microphone <-Audio APIs-> Voice DIM client/extension <-WebSocket-> DIM
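To make the "automate the actions themselves" idea concrete, here is a rough sketch of how a command arriving over that WebSocket could be validated before DIM acts on it. The message shape (`action`, `itemName`) and the names `VoiceCommand` and `parseCommand` are assumptions made up for this example; they are not the actual Stream Deck message format.

```typescript
// Hypothetical message shape; NOT the real Stream Deck protocol.
interface VoiceCommand {
  action: "transfer" | "equip";
  itemName: string;
}

// Validate a raw WebSocket text frame and return a typed command,
// or null if the payload is not something we understand.
function parseCommand(raw: string): VoiceCommand | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null) {
    return null;
  }
  const { action, itemName } = data as Record<string, unknown>;
  if (
    (action === "transfer" || action === "equip") &&
    typeof itemName === "string" &&
    itemName.trim() !== ""
  ) {
    return { action, itemName: itemName.trim() };
  }
  return null;
}
```

On the DIM side this would sit in the socket's message handler, something like `const cmd = parseCommand(event.data)`, with unknown or malformed messages simply ignored.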

mlsof21 commented 2 years ago

Interesting. If I'm understanding the Stream Deck plugin correctly, it spins up its own WebSocket server, and DIM connects to that server conditionally (only when the plugin is enabled, obviously). When the plugin sends a message, DIM receives and handles it accordingly.

I'm not quite sure how to spin up a WebSocket server in a Chrome extension (unless I'm misunderstanding how to accomplish this), but I can research this a bit to see if it's feasible.

mlsof21 commented 1 year ago

After some research, the only real way to use the global shortcut would be to have either a local websocket server that handles sending commands to DIM, or have a cloud-hosted websocket server that does the same thing. The differences being how much is required of the user:

Another potential implementation would be an Always Listening/Hands-Free mode. I've put quite a bit of work into a branch of the extension right now, and it works really well. It requires a user-set activation phrase (maybe make this a concrete setting as some user phrases could be ambiguous or hard to parse) for the extension to start parsing a command. In DIM, this should definitely be a toggle-able option.
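As a rough illustration of the activation-phrase idea (not the code from my branch), the sketch below pulls the command out of a transcript only when the phrase is present. `extractCommand` is a hypothetical name, and the normalization is an assumption: it lowercases and strips everything except ASCII letters, digits, and spaces, which would need revisiting for non-English phrases.

```typescript
// Hypothetical helper: return the text spoken after the activation phrase,
// or null if the phrase was not heard or nothing followed it.
function extractCommand(transcript: string, activationPhrase: string): string | null {
  // Assumed normalization: lowercase, drop punctuation, collapse whitespace.
  const normalize = (s: string): string =>
    s.toLowerCase().replace(/[^a-z0-9\s]/g, "").replace(/\s+/g, " ").trim();

  const phrase = normalize(activationPhrase);
  if (phrase === "") {
    return null; // an empty phrase would match everything
  }

  // Pad with spaces so the phrase only matches on whole-word boundaries.
  const padded = ` ${normalize(transcript)} `;
  const needle = ` ${phrase} `;
  const index = padded.indexOf(needle);
  if (index === -1) {
    return null;
  }

  const command = padded.slice(index + needle.length).trim();
  return command === "" ? null : command;
}
```

For example, with the phrase "okay ghost", the transcript "Okay ghost, equip Gjallarhorn" yields "equip gjallarhorn", while a transcript without the phrase is ignored. Matching on whole words is one way to cut down on the ambiguous-phrase problem, though short or common phrases would still trigger by accident.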

A general issue with the Web Speech API is that only Chrome and Edge support it fully. Firefox disconnected their proxy server years ago, and most other browsers error out after being given mic permissions. One solution for that would be Speechly. They have a polyfill library for Web Speech that works well in all the browsers I've tried (Firefox, Opera GX, Vivaldi). Azure Cognitive Services also has a polyfill/ponyfill library, but I haven't tested it. Of course, both of those options are also pricey.

I'm currently working on a fork with always listening on. I can provide more detail as I get it ironed out.

bhollis commented 1 year ago

Let us know any progress you make! I'll close this for now as there's nothing specific the DIM team needs to do.