C-Nedelcu / talk-to-chatgpt

Talk to ChatGPT AI using your voice and listen to its answers through a voice
GNU Affero General Public License v3.0
1.97k stars · 333 forks

Add OpenAI Whisper API for improved speech recognition #94

Open · Dramaguy opened this issue 1 year ago

Dramaguy commented 1 year ago

Hello there @C-Nedelcu, thank you for creating the extension! I recently started using the "talk-to-chatgpt" Chrome extension and have found it very useful. However, the built-in speech recognition is sometimes not very accurate, which makes the extension hard to use effectively. I'd like to suggest adding support for the OpenAI Whisper API for speech recognition. I think it would be a valuable addition and would make the extension even more useful. Thank you for considering my suggestion.

C-Nedelcu commented 1 year ago

Thanks for your kind words and for your suggestions.

Funnily enough, I always thought the speech recognition in Chrome was quite excellent. I've never really had an issue with it to begin with. Are you having trouble with it?

What worries me is that Chrome's speech recognition API is currently quite good and fast; it works almost instantly. Adding support for an external API might slow things down a great deal. But still, I'll look into it.

Dramaguy commented 1 year ago

Yeah, the speech recognition frequently gave me trouble.

I thought the reason could be my accent, but I am not sure. It didn't happen when I used Whisper.

Thanks for replying!

steinhaug commented 1 year ago

There are several different models that would be nice to try out, so if you could add some integration points for them, that would be awesome. I see that I could set up my own Whisper install, and there are other options as well. I'll look into this and post here if I get them up and running.

Basically, now that you've added the 11labs endpoint, it should be possible to make it "configurable" for pretty much any other service :D

I do agree on the speed; however, for Norwegian the recognition is poor. The text-to-speech is also poor for Norwegian, so it would be extremely nice to try something else!
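A minimal sketch of what a "configurable" speech-to-text endpoint could look like, assuming the extension kept a small settings object for it. Every name here (`SttEndpoint`, `PRESETS`, `resolveEndpoint`, `transcriptionUrl`, the `localhost:8000` server) is hypothetical and not part of talk-to-chatgpt. The self-hosted preset assumes a Whisper server that mirrors OpenAI's `/v1/audio/transcriptions` route, which not every server does.

```typescript
// Hypothetical settings for a pluggable speech-to-text backend.
// None of these names exist in talk-to-chatgpt; they only illustrate the idea.
interface SttEndpoint {
  name: string;
  baseUrl: string; // root of an OpenAI-style API
  model: string;   // model name the server expects
  apiKey?: string; // OpenAI needs one; a local server may not
}

const PRESETS: Record<string, SttEndpoint> = {
  openai: { name: "OpenAI Whisper", baseUrl: "https://api.openai.com/v1", model: "whisper-1" },
  // Assumption: a self-hosted server copying OpenAI's route layout.
  // The port and model name are placeholders; check what your server expects.
  local: { name: "Self-hosted Whisper", baseUrl: "http://localhost:8000/v1", model: "whisper-1" },
};

// Build the transcription URL, tolerating trailing slashes in user input.
function transcriptionUrl(ep: SttEndpoint): string {
  return `${ep.baseUrl.replace(/\/+$/, "")}/audio/transcriptions`;
}

// Merge user overrides (e.g. from an options page) onto a named preset.
function resolveEndpoint(preset: string, overrides: Partial<SttEndpoint> = {}): SttEndpoint {
  const base = PRESETS[preset];
  if (!base) throw new Error(`Unknown speech-to-text preset: ${preset}`);
  return { ...base, ...overrides };
}
```

Keeping the endpoint as data rather than code would let a user switch between OpenAI and their own Whisper install from a settings page, without a new release of the extension.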

7k50 commented 11 months ago

The speech recognition in Chrome is quite alright, but Whisper is on another level: it handles punctuation, advanced terminology, and ambiguity beautifully (even better than humans, according to some studies), and it is quite fast, although Chrome's may be faster. I believe enabling the Whisper API (via an OpenAI key) could also allow the extension to work in other Chromium browsers such as Brave, though I'm not sure about that.
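A rough sketch of the Whisper call being requested in this thread, written in TypeScript. It targets OpenAI's `POST /v1/audio/transcriptions` endpoint with the `whisper-1` model and a user-supplied API key. How the extension would record the audio (for example with `MediaRecorder`) and where it would store the key are left out; the function names `buildWhisperRequest` and `transcribe` are hypothetical.

```typescript
// Sketch: send one recorded clip to OpenAI's speech-to-text endpoint.
// Assumes the audio was captured elsewhere (e.g. with MediaRecorder) as a Blob.
const WHISPER_URL = "https://api.openai.com/v1/audio/transcriptions";

function buildWhisperRequest(audio: Blob, apiKey: string, language?: string): Request {
  const form = new FormData();
  // Give the upload a filename with a supported extension; webm is on OpenAI's list.
  form.append("file", audio, "speech.webm");
  form.append("model", "whisper-1");
  // Optional ISO-639-1 hint, e.g. "no" for Norwegian.
  if (language) form.append("language", language);
  // No explicit Content-Type: fetch sets multipart/form-data with its own boundary.
  return new Request(WHISPER_URL, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
}

async function transcribe(audio: Blob, apiKey: string, language?: string): Promise<string> {
  const res = await fetch(buildWhisperRequest(audio, apiKey, language));
  if (!res.ok) throw new Error(`Whisper API error ${res.status}: ${await res.text()}`);
  // The default response format is JSON with a single `text` field.
  const data = (await res.json()) as { text: string };
  return data.text;
}
```

Unlike Chrome's built-in recognition, this returns text only after a whole clip has been uploaded and processed, so the latency concern raised earlier in the thread is real. OpenAI's documentation says that supplying the input language improves accuracy and latency, so the `language` hint may also help with the Norwegian problems mentioned above.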