danneu / telegram-chatgpt-bot

a Telegram ChatGPT bot that supports text prompts and two-way voice memos
35 stars, 8 forks

Improve Whisper transcription #5

Open danneu opened 1 year ago

danneu commented 1 year ago

Whisper is surprisingly good at zero-context transcriptions most of the time.

Yet sometimes it fails badly, especially on voice memos recorded on my girlfriend's phone. Maybe her microphone isn't as good, but it takes perfect enunciation for Whisper to guess the language correctly.

The Whisper API lets us pass a language param like "en" or "es" (https://platform.openai.com/docs/api-reference/audio/create#audio/create-language), but we can only use it when we know for a fact which language the user's voice memo is in.
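To make the "only when we know" rule concrete, here is a rough Python sketch that builds the request parameters and leaves `language` out when nothing is configured, so Whisper falls back to auto-detection. The helper name `build_transcription_params` is made up for illustration and is not code from this repo; `whisper-1` is the model name the API docs use.

```python
from typing import Optional


def build_transcription_params(language: Optional[str] = None) -> dict:
    """Build keyword arguments for a Whisper transcription request.

    Hypothetical helper: `language` is only included when the caller
    actually knows it; otherwise Whisper auto-detects the language.
    """
    params = {"model": "whisper-1"}
    if language:
        # The API expects a short language code such as "en" or "es".
        params["language"] = language.lower()
    return params


# With the official `openai` package the call would look roughly like:
#   client.audio.transcriptions.create(
#       file=audio_file, **build_transcription_params("es"))
```

Passing `None` (or nothing) keeps today's zero-context behavior, so this would be a safe default for users who never configure anything.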

Since most people use one language, it makes sense to let them configure which language that is. Users can already pick a /voice, so maybe that voice's language can be reused for transcription.
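One way the /voice idea might work, sketched in Python. This assumes voice IDs start with a locale tag such as `es-MX-...`, which is an assumption for illustration and may not match how this bot names its voices; `language_from_voice` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumption: voice IDs begin with a locale tag like "es-MX" or "en-US".
_LOCALE_PREFIX = re.compile(r"^([A-Za-z]{2})-[A-Za-z]{2}(?:-|$)")


def language_from_voice(voice_id: Optional[str]) -> Optional[str]:
    """Return the two-letter language code implied by a voice ID, if any."""
    if not voice_id:
        return None
    match = _LOCALE_PREFIX.match(voice_id)
    # No locale prefix means we can't infer a language, so return None.
    return match.group(1).lower() if match else None
```

If the voice has no recognizable locale prefix, the function returns `None` and transcription would stay on auto-detect rather than guessing.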

I'd still like to see if there is a way to improve transcription for users who haven't configured anything, though. Maybe the prompt param can be used to prime multiple languages, like "This speech is probably either in English or {a guess at user language}". I have no idea whether even a prompt like "This speech is English" sways transcription, though. Just something to try.
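A small Python sketch of how that priming prompt could be built. Whether it changes Whisper's output at all is untested; the `LANGUAGE_NAMES` table and the `build_language_prompt` helper are hypothetical, and the guess itself would have to come from somewhere else (for example the Telegram client locale).

```python
from typing import Optional

# Illustrative subset only; a real table would cover more languages.
LANGUAGE_NAMES = {"en": "English", "es": "Spanish", "fr": "French", "de": "German"}


def build_language_prompt(guess: Optional[str] = None) -> str:
    """Build a priming prompt for the Whisper `prompt` param.

    Returns an empty string when there is no usable guess, meaning
    no prompt should be sent at all.
    """
    name = LANGUAGE_NAMES.get((guess or "").lower())
    if name is None:
        return ""
    if name == "English":
        return "This speech is probably in English."
    return f"This speech is probably either in English or {name}."
```

The result would go in the `prompt` field of the transcription request, with the `language` param left unset so Whisper is still free to pick between the two.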