Const-me / Whisper

High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model
Mozilla Public License 2.0

Consider adopting this project's UX to allow dictating anywhere #147

Closed (santiago-afonso closed this issue 11 months ago)

santiago-afonso commented 11 months ago

This is an awesome project and I use the software all the time to improve my productivity, but writing to a .txt severely limits its usability: I have to keep Notepad++ open and wait for it to refresh to see the dictation results. It'd be a major boost if it allowed writing to any text field, like this project does: https://github.com/savbell/whisper-writer (just the gif at the top of the readme sells the idea). Sadly, I lack the skills to help implement it, but nevertheless I wanted to give you the idea.

And thanks again!

emcodem commented 11 months ago

Ah, a text file is the easiest "API" ever... well OK, it's not advanced, so just "PI"... OK, it's not really related to programming in the first place, so it's just "I". Text files are the simplest and most robust interface that exists ;-) Our good Const might now laugh at me and think "nah, COM is much better and more flexible too". I agree, but as we basically only want text, a text file is not a bad choice at all, and it is also cross-platform compatible hehe

Example script: paste this code into script.ps1. Start the transcription and open a PowerShell prompt. In PowerShell, specify the path to script.ps1 and the path to your txt transcript, e.g.

c:\temp\script.ps1 c:\temp\transcribed.txt

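# Path to the transcript file, passed as the first argument
$file_to_watch = $Args[0]

# Load Windows Forms so that SendKeys is available
Add-Type -AssemblyName System.Windows.Forms

# Follow the file and paste each newly appended line into the focused window
Get-Content $file_to_watch -Wait -Tail 1 |
  ForEach-Object {
    $text_with_newline = $_ + "`r`n"
    Set-Clipboard -Value $text_with_newline
    [System.Windows.Forms.SendKeys]::SendWait("^v")  # Ctrl+V
  }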

The code above works and might even be adequate for your specific use case. Once the script is running, it waits for new lines in the specified text file, copies each one to the clipboard, and sends Ctrl+V to Windows, which is basically what you want.
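
One obvious side effect of the clipboard approach is that whatever you had copied before gets overwritten by every dictated line. A rough variation that skips blank lines and puts the previous clipboard text back afterwards could look like the sketch below. This is untested guesswork on top of the same idea, not a finished tool: the parameter name $FileToWatch and the 100 ms pause are my own assumptions, and Get-Clipboard -Raw only preserves plain text, so images or files on the clipboard would still be lost.

```powershell
# Hypothetical variant of the script above: same tail-and-paste loop,
# but it skips blank lines and restores the previous clipboard text.
param(
    [Parameter(Mandatory = $true)]
    [string]$FileToWatch   # path to the transcript .txt (name is an assumption)
)

# Load Windows Forms so that SendKeys is available
Add-Type -AssemblyName System.Windows.Forms

Get-Content -Path $FileToWatch -Wait -Tail 1 |
  ForEach-Object {
    # Inside ForEach-Object, 'return' moves on to the next line
    if ([string]::IsNullOrWhiteSpace($_)) { return }

    # Remember the current clipboard text (plain text only)
    $previous = Get-Clipboard -Raw

    Set-Clipboard -Value ($_ + "`r`n")
    [System.Windows.Forms.SendKeys]::SendWait("^v")  # Ctrl+V

    # Assumed pause so the target application can read the clipboard
    Start-Sleep -Milliseconds 100

    # Put the old text back if there was any
    if ($previous) { Set-Clipboard -Value $previous }
  }
```

It is called the same way as before, e.g. c:\temp\script.ps1 c:\temp\transcribed.txt. The pause is a guess; some applications may need longer before the clipboard can safely be restored.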

The problem is that this type of application requires a lot of engineering. Even the very old but still alive Dragon NaturallySpeaking does a pretty bad job of inserting text at the cursor, so there is a lot of room for improvement. From my perspective, this means that such a tool should be developed totally independently of any Whisper or AI stuff. Maybe it is more something that the SubtitleEdit guys can imagine as a good feature for their stuff.

santiago-afonso commented 11 months ago

I didn't realize it was that complex. But thanks a lot for the script and your time!!!