SemlerPDX / OpenAI-VoiceAttack-Plugin

The OpenAI VoiceAttack Plugin provides a powerful interface between VoiceAttack and the OpenAI API, allowing us to seamlessly incorporate state-of-the-art artificial intelligence capabilities into our VoiceAttack profiles and commands.
GNU General Public License v3.0

How to setup Dictation with Whisper #5

Closed: santiyounger closed this issue 1 year ago

santiyounger commented 1 year ago

Hey, loving your work, thank you so much for sharing this project!

I installed the plugin and installed the AVCS profile too.

Everything is working fine; however, as a beginner, even after reading the manual I can't figure out how to create a new command to dictate using OpenAI's Whisper.

Ideally I was hoping there was a pre-made command to achieve this, but I couldn't find one among the ones you created (though I could be wrong).

I see how it makes sense that there's a way to record the audio, send it to Whisper, and then delete the audio, but I'm lost on how to actually set this up step by step as a beginner.

Appreciate any help, excited to start using it.

Thank you!

SemlerPDX commented 1 year ago

One of the more challenging things I programmed into this plugin was a way for the ChatGPT plugin context to offer a GetInput phase. It takes advantage of an advanced dictation-until-silence method and stitches together any audio files produced, simplifying the subsequent Whisper API call to a single audio file transcription/translation. Because my code base is open source, you could create an inline C# function inside a VoiceAttack command and try to adapt or approximate my methods of accomplishing this goal; attribution and a reference to the source would be appreciated in the comments. These methods are in the service class "Dictation.cs".
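To illustrate the "stitching" idea in rough form: if each pause in speech produces a small WAV clip, the clips can be concatenated into one file so only a single Whisper call is needed. This is a hypothetical sketch, not the plugin's actual code (Dictation.cs is the authoritative implementation); it assumes all clips share the same PCM format with a canonical 44-byte WAV header.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Sketch: merge several same-format WAV clips into one file for a
// single Whisper transcription call. Assumes canonical 44-byte headers.
static class WavStitcher
{
    public static void Concatenate(IList<string> inputPaths, string outputPath)
    {
        const int HeaderSize = 44;
        byte[] header = null;
        using (var combined = new MemoryStream())
        {
            foreach (var path in inputPaths)
            {
                var bytes = File.ReadAllBytes(path);
                if (header == null)
                    header = (byte[])bytes.Clone(); // first clip's header is the template
                // Append only the audio payload, skipping each clip's header.
                combined.Write(bytes, HeaderSize, bytes.Length - HeaderSize);
            }
            if (header == null)
                throw new ArgumentException("No input files given.");

            var data = combined.ToArray();
            // Patch RIFF chunk size (offset 4) and data chunk size (offset 40).
            BitConverter.GetBytes(36 + data.Length).CopyTo(header, 4);
            BitConverter.GetBytes(data.Length).CopyTo(header, 40);

            using (var fs = File.Create(outputPath))
            {
                fs.Write(header, 0, HeaderSize);
                fs.Write(data, 0, data.Length);
            }
        }
    }
}
```

The real implementation also has to handle the dictation-until-silence timing and non-canonical headers, which this sketch leaves out.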

SemlerPDX commented 1 year ago

The way this class is called is from within the ChatGPT context code starting at line 541:

// Get User Input from variable, dictation token, or begin listening for dictation:
userInput = Dictation.GetUserInput(useWhisper, sayPreListen);

// If processing dictation audio with OpenAI Whisper transcription/translation:
if (useWhisper && !String.IsNullOrEmpty(userInput) && userInput != DEFAULT_REPROCESS_FLAG)
    userInput = Whisper.GetUserInput(operation);

Robertsmania commented 1 year ago

I think the OP is looking for a way to set up commands within VA rather than getting that deep into the plugin code?

In my adaptation, I have a plugin context "Transcribe_Audio" that handles getting the dictation recording, sending it to Whisper and setting a ~~Transcription_Response VA text variable with the response.

Say, 'What would you like to say?'  (and wait until it completes)
Execute command, 'A2 TwtichChat -Transcribe Audio' (and wait until it completes)
Begin Text Compare : [~~Transcription_Response] Does Not Equal ''
    Execute command, 'A1 - SpeechCoordinator - Send_iRacingChatMessage' (and wait until it completes)
Else
    Say, 'I didn't get that.'  (and wait until it completes)
    Write [Red] 'No dictation transcription?' to log
End Condition

Does your GetInput phase have something similar they can use?

santiyounger commented 1 year ago

Thank you @SemlerPDX and @Robertsmania

Appreciate the help! And yes, @Robertsmania is right. Unfortunately I'm a complete beginner when it comes to programming and code, so I'm quite lost on how to simply use Whisper dictation for my own purposes. I don't want to alter your method or create my own code; I just want to use the dictation method with Whisper.

@Robertsmania your response was helpful 🙏 but I still don't fully know how to set that up in VA correctly

Just for clarity, @SemlerPDX: all I want is a way to dictate in VoiceAttack using Whisper so that I can have a transcript I can insert into a writing app.

I couldn't find any pre-made command that did this in the plugin or in the AVCS profile. Perhaps there is one and I just couldn't find it.

Would it be possible to have a command I can download/import and use as a normal user?

Ideally, installing a command in VA for simple use would be great, but if that's hard on your side, I'd appreciate instructions to do it myself, assuming I'm a 5-year-old non-programmer hahaha

Thank you @SemlerPDX and @Robertsmania

SemlerPDX commented 1 year ago

No worries, if this sort of thing is a bit too technical, I can help. I see what you mean now, and what you want this for. I can probably whip up some sort of command system that will work for you, I'll just modularize my Dictation class in this plugin into an inline function for you as a more advanced 'dictation until silence' which you can use in a writing app.

You won't need instructions beyond knowing how to call the command, and that when it finishes, a series of key presses (or a 'paste' action) will occur. I'm not sure which yet, but I'll write it up and see how it turns out.

santiyounger commented 1 year ago

Thank you, you are the best, a command I can import into VoiceAttack would be truly appreciated 🙏

If you ever need exposure with your projects on my YouTube channel or Twitter let me know

I recorded a short video for you with some examples of how I've thought of this command working, in case it's helpful on your side

quick video on VoiceAttack AI Plugin and Whisper | Dropbox Video

Thank you!

Robertsmania commented 1 year ago

Exposure is always good! Here's an example of how I'm using the Whisper dictation to isolate viewers in my Twitch chat: https://www.youtube.com/watch?v=4Q3ydrvN8tI And another (slightly clumsy) example from when I was still just getting it to work, but it shows a practical application of banning a spam bot that came into my channel offering to sell followers: https://www.youtube.com/watch?v=hfiRjBldgAI

It's always been hard, since screen names are usually spelled with weird characters and sequences that don't match typical phonetic patterns. And in the past, native dictation accuracy for anything was so poor it was pretty much a lost cause.

Now with the much higher accuracy of the Whisper model and the ability to pass in a prompt that includes all the unusual spellings of screen names and such, I can actually get very reliable results.

In the use case above it's actually a two-step process. I send the audio from my dictation to Whisper with a prompt that includes all the screen names and the last several messages from the Twitch chat. That returns text with what it thinks I said, and if it matched someone's screen name, it usually has it spelled the way they have it.

Then I build a prompt to send off to ChatGPT including the dictation response as the 'search term' and give it similar information about the current users in chat and recent messages for context. It looks at that data and responds with who it thinks is the most likely user.

  {
    "role": "user",
    "content": "Help me identify a Twitch chat viewer from this search term: \"some form of greetings\". The search term may be similar to a username or reference something that was said in chat. Please help identify the most appropriate username. If it's unclear please say so and do not make up an answer. Keep your reply very brief, no need to explain how you got your result or repeat the search term. All the current usernames: Robertsmaniac. Recent Twitch Chat messages formatted as (username: content): (Robertsmaniac: greetings)"
  }

Then I take whatever ChatGPT responds and search the list of actual viewers in chat to see if there is a match.

The truth is I actually check just the whisper response first and the accuracy is so good I nearly always find a match if the search term was reasonable. But I send it off to ChatGPT anyway just for the extra flavor and comic relief. And indeed, sometimes the AI is able to find the right user even when the direct text comparison did not.
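That final "search the list of actual viewers" step could look something like the following. This is only an illustrative sketch of the idea, not Robertsmania's actual code; the class, method names, and normalization rules are assumptions.

```csharp
using System;
using System.Linq;

// Sketch: match the text Whisper (or ChatGPT) returned against the list
// of real viewer names, ignoring case and the "weird characters" that
// make screen names hard to dictate.
static class ViewerMatcher
{
    // Strip everything except letters and digits, lowercased.
    static string Normalize(string s) =>
        new string(s.ToLowerInvariant().Where(char.IsLetterOrDigit).ToArray());

    // Returns the best-matching viewer name, or null if nothing matches.
    public static string FindViewer(string searchTerm, string[] viewers)
    {
        var needle = Normalize(searchTerm);
        // Prefer an exact normalized match, then fall back to containment.
        return viewers.FirstOrDefault(v => Normalize(v) == needle)
            ?? viewers.FirstOrDefault(v =>
                   Normalize(v).Contains(needle) || needle.Contains(Normalize(v)));
    }
}
```

With this, a dictated "Robert's Maniac!" would still resolve to the viewer "Robertsmaniac", and an unmatched term falls through to the ChatGPT step described above.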

Once you have reliable and accurate dictation, lots of opportunities open up!

SemlerPDX commented 1 year ago

Okay, I think I understand your goal better now. If you just need a Dictation Mode typing assistant that uses more accurate transcription than Windows Speech, via the OpenAI plugin and its Whisper context, it can all be done with standard command actions in VoiceAttack, calling the OpenAI plugin as normal using the Whisper.Transcribe context.

I've tossed together an example profile you can use and/or modify as needed to better suit your individual needs. It works by having you say a starting voice command, such as "Start Dictation". While you speak, your dictation creates tiny audio files which are sent to Whisper for transcription; it doesn't matter if you pause in your speech or use one long run-on sentence followed by more pauses and more speech. It will keep gathering (in order) the things you say and sending them off to Whisper for transcription, all in the background while Dictation is enabled. When you are done, you'd use another voice command to end it, such as "End Dictation". Final processing will occur (if not already done), and the combined transcriptions will then be pasted at the cursor location.

Edit the two actual voice commands as needed, but avoid altering the function commands unless you are familiar with what they are doing and understand how to modify them. If you have any questions, or find a bug that I didn't during my testing this afternoon, let me know here and I'll help out.
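The Start/End Dictation flow described above can be sketched in code form, roughly, like this. The actual profile does this with VoiceAttack command actions, not a class like this; `DictationSession` and the `transcribe` callback are hypothetical stand-ins for the plugin's dictation recording and Whisper.Transcribe call.

```csharp
using System;
using System.Text;

// Sketch of the "Start Dictation" / "End Dictation" lifecycle:
// clips are transcribed in order while enabled, then the combined
// text is returned for pasting at the cursor.
class DictationSession
{
    readonly StringBuilder _transcript = new StringBuilder();
    public bool Enabled { get; private set; }

    // "Start Dictation" voice command.
    public void Start() => Enabled = true;

    // Called for each small audio clip produced by a pause in speech;
    // the clip is transcribed (e.g. by Whisper) and appended in order.
    public void ProcessChunk(string audioPath, Func<string, string> transcribe)
    {
        if (!Enabled) return;
        _transcript.Append(transcribe(audioPath)).Append(' ');
    }

    // "End Dictation" voice command: stop listening and hand back the
    // combined transcription for the paste step.
    public string End()
    {
        Enabled = false;
        return _transcript.ToString().Trim();
    }
}
```

The background/ordering guarantees ("all in the background while Dictation is enabled") are what the profile's function commands handle; this sketch only shows the shape of the data flow.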

Download Here:

OpenAI Whisper Dictation Example-(v1.0)-Profile.vap

SemlerPDX commented 1 year ago

I will most likely make this example profile above into a proper AVCS public profile, but just wanted to toss together the concept and get it to you early so maybe you could provide feedback. If it's good to go, it will become AVCS TYPE v1.0 soon.