braden-w / whispering

https://whispering.bradenwong.com/
MIT License
188 stars 21 forks source link

inconsistent punctuation and capitalization #51

Open swimJim opened 11 months ago

swimJim commented 11 months ago

Hello, I have been using your Software For disability reasons because I Have difficulty typing This is way more accurate than naturally speaking The issue I wanted to report is that sometimes The whispering output has Punctuation and Capitalization, that's really strange. I Mean to say that it skips out on punctuation altogether sometimes I know this is not a fault of your software but of Something to do with the whisper technology itself What I was wondering is Is there a way you could add a feature where we have the option to use our open AI AI API key to Automatically run the transcribed text output through a Chat GPT instruction something that tells it to format the punctuation in a certain way or Potentially for fun just convert anything you say into whisper into pirate speak for instance It could be useful and entertaining Please note that I this is the first time I've ever posted anything on github So I don't know if the issues page is the appropriate place for this comment But thank you for reading it.A slight edit, it would be great if this post process with custom chat GPT instruction were an option that you could check or uncheck to save money with your API key.

braden-w commented 10 months ago

Hi @swimJim,

First of all, my sincerest apologies for the delayed response. I've had some personal family matters to attend to and the start of the school year has kept me quite busy. I appreciate your patience and want to assure you that your feedback is incredibly valuable to me.

I'm really glad to hear that you find the software helpful for your needs! Accessibility is one of the priorities of this project, so it's gratifying to hear that it's serving its purpose in that regard.

Regarding the inconsistencies in punctuation and capitalization in the Whispering output—I agree that this is more of an issue with the underlying whisper technology than with the software itself. I am currently working on a feature in another project that will enable parsing of transcriptions, and your suggestion could blend in nicely with that. The idea about an optional toggle to "pirate-speak-ify" the transcription is fun! Not only could it serve as an entertaining feature, but it could also open the doors to other creative and fun applications for the software. However, it will take a while for my second project to release, especially with my other commitments. If I had post-processing, I agree there definitely should be a toggle for this.

swimJim commented 10 months ago

I'm glad you're so open to feedback. I understand that Your schedule is busy I Can only see your profile picture in a small way, but it almost looks like you're at a graduation Again, I didn't zoom in but I'm seeing a graduation cap Let me zoom in. Oh I see that I was wrong. But I certainly do hope you have a graduation cap in your future seeing as you started school. Thank you for your work on the project. I have one or maybe one or two more suggestions that might be of note. And instead of bringing them up here, I'll start a different issue item to keep everything more organized.

cgbur commented 9 months ago

There is also the ability to specify a prompt to the whisper model. I think that would be good to be able to customize as well. Adding a filter at the end using a custom ChatGPT prompt and model selection sounds like a neat feature as well!

WebPam commented 8 months ago

Hello, thank you very much for your wonderful work, it has made my life different, really, thank you ! <3 As said above, it would be wonderful to be able to add custom prompt. Do you to integrate this feature ? I would love to help you. I am not a great developer, but now, with AI, I can do quite a lot of things. Please, get in touch if you have the opportunity, to see what can be done. All the best, thank you very much. :)

doxgt commented 2 months ago

This is where the clipboard workflow is super-conducive on Windows. I do the the following, incorporating Push-To-Talk:

So far above workflow is quite satisfactory.

However, I wouldn't mind an option, which I think is possible with offline Whisper instances, to force punctuation one way or another, so I can more consistently enunciate my own punctuations all the time - this is generally how folks dictate on Dragon.

The problem with Whisper's auto-punctuation is that when it works it works well. But when you "pause and compose", it tends to falter. Yes in theory one can "pipe" the output through a GPT to fix things. But why bother if you don't have to?

doxgt commented 1 month ago

@swimJim mainly: take a look here my friend: https://github.com/doxgt/PlayGround/blob/main/GPT_cURL.ahk

Using cURL and phiola, I have been able to write a Whispering equivalent - on Windows only of course, with AHK (v1 or v2, take your pick).

Because the "barriers to entry" on AHK, cURL, and phiola are so low, you could do it, too. If you run into any issues, go to the friendly AHK forum, where I am sure you will more than likely get your questions answered.

Whispering is amazing because it is cross-platform. But I don't need cross-platform myself. AHK+cURL+phiola gives such flexibility and ease of usability on Windows, it is worth sharing with the enthusiasts here.