enricoros / big-AGI

Generative AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, and much more. Deploy on-prem or in the cloud.
https://big-agi.com
MIT License
5.64k stars 1.3k forks

RFC: Whisper integration #43

Open peperunas opened 1 year ago

peperunas commented 1 year ago

I propose integrating OpenAI's Whisper Automatic Speech Recognition (ASR) system [GitHub]. Whisper is designed to convert spoken language into written text.

Is it something that might be of interest?

enricoros commented 1 year ago

It's interesting, but this will require quite a bit of backend work. We already have microphone support, and it works quite well. I prefer it to the system speech recognition in Windows.

I'm gonna park this for now and close it, as it will probably make the app much bulkier and different. Will reopen if things change :)

ER-EPR commented 9 months ago

Whisper is supported in LocalAI, and I hope an upload-and-transcribe function can be implemented. The API is well documented at https://platform.openai.com/docs/guides/speech-to-text/prompting and the same doc also explains how to split long audio into chunks smaller than 25 MB. For now I can only use the command line or a Python notebook to do the work.
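
To illustrate the chunking step, here is a minimal sketch that only plans where to cut a long recording so that every piece stays under the 25 MB upload limit. It assumes a roughly constant bitrate. The names `plan_chunks`, `bytes_per_second`, and `safety` are hypothetical and not part of any library. Actually cutting the audio (for example with ffmpeg or pydub) and uploading each piece to the transcription endpoint are left out.

```python
import math

# 25 MB upload limit mentioned in the speech-to-text guide.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def plan_chunks(duration_s: float,
                bytes_per_second: float,
                max_bytes: int = MAX_UPLOAD_BYTES,
                safety: float = 0.9) -> list[tuple[float, float]]:
    """Return (start, end) offsets in seconds for equal-length chunks.

    Assumes a roughly constant bitrate (hypothetical simplification).
    `safety` leaves headroom for container overhead and bitrate drift.
    """
    if duration_s <= 0 or bytes_per_second <= 0:
        raise ValueError("duration_s and bytes_per_second must be positive")
    if not 0 < safety <= 1:
        raise ValueError("safety must be in (0, 1]")

    # Longest chunk, in seconds, that still fits under the size budget.
    max_chunk_s = (max_bytes * safety) / bytes_per_second
    # Number of equal-length chunks needed to cover the whole recording.
    n = max(1, math.ceil(duration_s / max_chunk_s))
    step = duration_s / n

    chunks = []
    for i in range(n):
        start = i * step
        # Pin the last chunk to the exact end to avoid float rounding gaps.
        end = duration_s if i == n - 1 else (i + 1) * step
        chunks.append((start, end))
    return chunks


if __name__ == "__main__":
    # Example: a 2-hour recording encoded at 128 kbit/s (16,000 bytes/s).
    for start, end in plan_chunks(duration_s=2 * 60 * 60, bytes_per_second=16_000):
        print(f"{start:8.1f}s -> {end:8.1f}s")
```

Each planned range would then be exported as its own file and sent as a separate transcription request. Cutting at silence rather than at fixed offsets would probably give better results, since a cut in the middle of a sentence can lose context.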

rdewolff commented 7 months ago

Could we reopen this? I think OpenAI's Whisper is easy to integrate and would not change the UX: same record button. The key is the amazing accuracy. I switched to using Whisper on my Mac and won't look back.

enricoros commented 6 months ago

> Could we reopen this? I think OpenAI's Whisper is easy to integrate and would not change the UX: same record button. The key is the amazing accuracy. I switched to using Whisper on my Mac and won't look back.

ok