Open xngnln opened 9 months ago
Currently, language detection relies on the language tokens emitted by the Whisper decoder. Only the multilingual model has these tokens; the English model, trained only on English, has no language detection capability. So implementing this wouldn't be possible without an entirely separate language detection mechanism.
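To make the point about language tokens concrete, here is a minimal sketch of how a detected language can be read off the decoder output: take the scores for the language tokens only, normalize them with a softmax, and pick the most probable one. The function names and the example logits are made up for illustration and are not this project's code.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {token: math.exp(score - peak) for token, score in logits.items()}
    total = sum(exps.values())
    return {token: value / total for token, value in exps.items()}

def detect_language(language_token_logits):
    """Return (token, probability) for the most likely language token.

    `language_token_logits` is assumed to hold only the decoder scores for
    language tokens such as "<|en|>"; an English-only model has no such
    tokens, so there would be nothing to pass in.
    """
    probs = softmax(language_token_logits)
    best = max(probs, key=probs.get)
    return best, probs[best]

# Hypothetical scores, for illustration only.
example_logits = {"<|en|>": 2.0, "<|de|>": 0.5, "<|fr|>": -1.0}
language, confidence = detect_language(example_logits)
```

With an English-only model the input dictionary would be empty, which is why detection there needs a separate mechanism.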
Right now, auto-detect for multi-language detects non-English first, and then switches to English if the language spoken turns out to be English. But some people speak English most of the time and other languages only a little of the time. Would it be possible to add an option for the auto-detect order, so that the user can decide whether to try English or multi-language first?
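A rough sketch of what such an option could look like, assuming the app already has per-language probabilities from the multilingual model. The `prefer` option, the `switch_threshold` value, and the function name are all hypothetical; none of them are existing settings.

```python
def pick_model(language_probs, prefer="multilingual", switch_threshold=0.8):
    """Choose which model to use next, given a user-selected preference.

    `language_probs` maps language codes (e.g. "en", "zh") to probabilities.
    The preferred model is kept unless the evidence for switching is at
    least `switch_threshold`. All names here are illustrative assumptions.
    """
    english_p = language_probs.get("en", 0.0)
    if prefer == "english":
        # Stay on the English model unless another language clearly dominates.
        other_p = max(
            (p for lang, p in language_probs.items() if lang != "en"),
            default=0.0,
        )
        return "multilingual" if other_p >= switch_threshold else "english"
    if prefer == "multilingual":
        # Stay on the multilingual model unless English clearly dominates.
        return "english" if english_p >= switch_threshold else "multilingual"
    raise ValueError(f"unknown preference: {prefer!r}")

# A mostly-English speaker with an ambiguous segment stays on their preferred model.
mixed = {"en": 0.6, "zh": 0.4}
choice_english_first = pick_model(mixed, prefer="english")
choice_multi_first = pick_model(mixed, prefer="multilingual")
```

The idea is that the preferred side acts as the default and the threshold controls how strong the evidence must be before switching away from it.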