alex-crr opened this issue 3 weeks ago
As far as I know, Whisper only supports either fixing a single language or leaving detection open to every language. Your use case needs it restricted to two languages, which isn't possible as far as I know, so for now you'd have to either pick French or English, or use the multilingual model, which may occasionally misunderstand you. Maybe another user has an idea?
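One possible workaround is to restrict the detection step yourself. The sketch below assumes you can get per-language probabilities from the model (recent faster-whisper versions expose them as `all_language_probs` on the transcription info, but treat that as an assumption to check against your version). The helper `pick_allowed_language` is hypothetical: it ignores every language outside the allowed set and keeps the most probable remaining one.

```python
# Sketch: restrict Whisper-style language detection to an allowed subset.
# Assumption: language_probs comes from the model as (code, probability)
# pairs; pick_allowed_language is a hypothetical helper, not a library API.
from typing import Iterable, Sequence, Tuple


def pick_allowed_language(
    language_probs: Iterable[Tuple[str, float]],
    allowed: Sequence[str] = ("en", "fr"),
) -> str:
    """Return the most probable language among `allowed`.

    Languages outside `allowed` (e.g. "pt") are ignored entirely.
    """
    candidates = [(lang, p) for lang, p in language_probs if lang in allowed]
    if not candidates:
        # Nothing in the allowed set was scored: fall back to the first one.
        return allowed[0]
    best_lang, _ = max(candidates, key=lambda pair: pair[1])
    return best_lang


# The model leans towards Portuguese, but only en/fr are allowed.
probs = [("pt", 0.48), ("fr", 0.41), ("en", 0.09), ("es", 0.02)]
print(pick_allowed_language(probs))  # fr
```

The chosen code could then be passed as the fixed language for the actual transcription (faster-whisper's `transcribe` accepts a `language` argument), at the cost of an extra detection pass per utterance.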
What KoljaB says is correct. However, even if you set it to a specific language, it can still pick up other languages. I'm not sure about longer sentences, but in my experience, with the language set to Greek and speaking Greek to it, it still correctly transcribes the English words I throw in here and there.
Note, however, that you should use the regular Whisper models (medium, large-v2, large-v3, etc.) or the faster-whisper models (Systran/faster-whisper-large-v3, etc.). If you use the distil versions (e.g. Systran/faster-distil-whisper-large-v3), it will automatically translate to English.
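A small guard can catch this before a model is loaded. This is a minimal sketch: `check_model` and `is_suitable_for_multilingual` are hypothetical helpers, and the rule (reject any name containing "distil") is a heuristic based on the observation above, not an official check.

```python
# Sketch: refuse distil checkpoints when multilingual output is required.
# Heuristic only: any model name containing "distil" is rejected, because
# those variants were observed to translate everything to English.


def is_suitable_for_multilingual(model_name: str) -> bool:
    """Return False for distil variants, True otherwise."""
    return "distil" not in model_name.lower()


def check_model(model_name: str) -> str:
    """Return the name unchanged, or raise if it looks like a distil model."""
    if not is_suitable_for_multilingual(model_name):
        raise ValueError(
            f"{model_name!r} looks like a distil model; "
            "use a regular or faster-whisper multilingual model instead."
        )
    return model_name


print(check_model("Systran/faster-whisper-large-v3"))  # Systran/faster-whisper-large-v3
```

Calling `check_model("Systran/faster-distil-whisper-large-v3")` would raise `ValueError` instead of silently producing English output.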
An idea I had: what about running two instances, one for English and another for French?
That's interesting. However, the model tries to interpret whatever you say in the specified language, so you'd get one correct French transcription and a very wrong English one, and you'd probably need some kind of layer to select which one makes more sense.
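Such a selection layer could be as simple as comparing model confidence. The sketch below assumes each instance returns segments with an average log-probability (faster-whisper segments carry an `avg_logprob` attribute); the `Segment` class and the scoring rule are hypothetical stand-ins, a heuristic rather than a tested method.

```python
# Sketch of a "selection layer" for the two-instance idea: transcribe the
# same audio with language="en" and with language="fr", then keep the
# result whose segments have the higher mean avg_logprob.
# Segment is a hypothetical stand-in for the model's segment objects.
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Segment:
    text: str
    avg_logprob: float  # mean token log-probability for this segment


def score(segments: List[Segment]) -> float:
    """Mean of per-segment avg_logprob; higher means more confident."""
    if not segments:
        return float("-inf")
    return sum(s.avg_logprob for s in segments) / len(segments)


def select_transcription(results: Dict[str, List[Segment]]) -> Tuple[str, str]:
    """Return (language, text) for the most confident transcription."""
    best_lang = max(results, key=lambda lang: score(results[lang]))
    text = " ".join(s.text.strip() for s in results[best_lang])
    return best_lang, text


results = {
    "fr": [Segment(" Bonjour, comment ça va ?", -0.21)],
    "en": [Segment(" Bone jour, come on sa va?", -1.35)],
}
print(select_transcription(results))  # ('fr', 'Bonjour, comment ça va ?')
```

A plain mean treats short and long segments equally; weighting by segment duration would be a natural refinement, but either way this doubles the transcription work per utterance.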
You could have a toggle to mute one and unmute the other. But you'd need both loaded in memory, and depending on model size and hardware constraints that could be tough.
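The toggle idea could look roughly like this. `Recognizer` and `LanguageToggle` are hypothetical classes standing in for whatever wraps each loaded model (for example, a recorder instance configured with a fixed language); the point is that both stay in memory while only the unmuted one produces output.

```python
# Sketch of the mute/unmute toggle. Recognizer is a hypothetical wrapper
# around one loaded model with a fixed language; its feed() is a placeholder.
from typing import List, Optional


class Recognizer:
    """Hypothetical wrapper around one loaded model with a fixed language."""

    def __init__(self, language: str):
        self.language = language
        self.muted = True

    def feed(self, chunk: bytes) -> Optional[str]:
        if self.muted:
            return None  # muted recognizers ignore incoming audio
        # A real implementation would run the model on the audio here;
        # this placeholder just reports what it received.
        return f"[{self.language}] {len(chunk)} bytes"


class LanguageToggle:
    def __init__(self, languages=("en", "fr")):
        # Every recognizer is created up front, so all models occupy memory.
        self.recognizers = {lang: Recognizer(lang) for lang in languages}
        self.active = languages[0]
        self.switch_to(self.active)

    def switch_to(self, language: str) -> None:
        if language not in self.recognizers:
            raise ValueError(f"No recognizer loaded for {language!r}")
        for lang, rec in self.recognizers.items():
            rec.muted = lang != language  # unmute only the chosen one
        self.active = language

    def feed(self, chunk: bytes) -> List[str]:
        # Send audio to every recognizer; only the unmuted one answers.
        outputs = [rec.feed(chunk) for rec in self.recognizers.values()]
        return [out for out in outputs if out is not None]


toggle = LanguageToggle()
print(toggle.feed(b"\x00" * 320))  # ['[en] 320 bytes']
toggle.switch_to("fr")
print(toggle.feed(b"\x00" * 320))  # ['[fr] 320 bytes']
```

Switching is instant because nothing is reloaded, which is exactly the memory trade-off mentioned above.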
Another possible idea would be to load a single multilingual model in a server-like setup, then query that server with two client scripts, each with a different language config. Although I'm not sure whether the recorder config can be changed on the fly without reloading the model.
EDIT: From the server/client README, it looks like the 'language' argument is part of the server config, so you can't query the same multilingual model (on the server) with two clients using different languages. You'd need to set up two servers instead.
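With two servers, the client side mostly comes down to choosing an endpoint per language. In this sketch the URLs and ports are hypothetical placeholders (each server presumably needs its own ports, so check the README for the actual defaults), and `endpoint_for` is a hypothetical helper.

```python
# Sketch of client-side routing for the two-server setup: one server is
# started with language "en", another with "fr". The URLs below are
# hypothetical placeholders, not the servers' documented defaults.
from typing import Dict

SERVERS: Dict[str, str] = {
    "en": "ws://localhost:9001",  # hypothetical: server configured for English
    "fr": "ws://localhost:9002",  # hypothetical: server configured for French
}


def endpoint_for(language: str) -> str:
    """Return the server URL that handles `language`."""
    try:
        return SERVERS[language]
    except KeyError:
        raise ValueError(f"No server configured for language {language!r}") from None


print(endpoint_for("fr"))  # ws://localhost:9002
```

This keeps the language decision on the client, but both servers still hold a model in memory, so the hardware concern from the toggle approach applies here too.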
So I'm building an assistant that I'd like to be able to speak to in both French and English (mainly because the models have trouble understanding my accent). However, it sometimes interprets me as speaking Portuguese, which is unfortunate.