Whisper is surprisingly good at zero-context transcriptions most of the time.
Yet sometimes it fails badly, especially on my girlfriend's phone. Maybe her microphone isn't as good, but Whisper seems to require near-perfect enunciation to guess the language correctly.
The Whisper API lets us pass a `language` param like "en" or "es" (https://platform.openai.com/docs/api-reference/audio/create#audio/create-language), but of course we can't use it unless we know for a fact that the user's voice memo is in that language.

Since most people speak one language, it makes sense to let them configure which language that's going to be. The user can already pick a `/voice`, so maybe that language can be reused.
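For reference, here's roughly what that would look like with the official openai Python client. This is just a sketch: the file name and the "es" code are placeholders, and in practice the language would come from the user's setting.

```python
# Minimal sketch of forcing the transcription language with the
# openai Python client (v1.x). Assumes OPENAI_API_KEY is set in the
# environment; "voice-memo.ogg" and "es" are placeholder values.
from openai import OpenAI

client = OpenAI()

with open("voice-memo.ogg", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        language="es",  # ISO-639-1 code; skips Whisper's language auto-detection
    )

print(transcript.text)
```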
I'd like to see if there's a way to improve anonymous transcription, though. Maybe the `prompt` param can be used to prime multiple languages, something like "This speech is probably either in English or {a guess at the user's language}". Though I have no idea whether even a prompt like "This speech is in English" sways transcription. Just something to try.
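A sketch of that experiment, again with the openai Python client. Whether the prompt actually sways language detection is exactly the open question here, and `user_language_guess` is a hypothetical value we'd derive from the `/voice` setting:

```python
# Sketch of the prompt-priming experiment. The prompt param is normally
# meant for vocabulary/style context, so using it to hint the language
# is speculative; user_language_guess is a hypothetical placeholder.
from openai import OpenAI

client = OpenAI()

user_language_guess = "Spanish"  # hypothetical: derived from the user's /voice pick

with open("voice-memo.ogg", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        prompt=f"This speech is probably either in English or {user_language_guess}.",
    )

print(transcript.text)
```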