ggerganov / whisper.cpp

Port of OpenAI's Whisper model in C/C++
MIT License
34.88k stars, 3.56k forks

Multilingual recordings: the language spoken first leads to a translation, even though another language is dominant in the file #1800

Open tripledee opened 8 months ago

tripledee commented 8 months ago

Hi all, and thanks Georgi for this incredible work. As a journalist and developer, I am writing a macOS app that transcribes recordings using whisper.cpp. I noticed that when the audio starts in French and then switches to German, whisper.cpp transcribes the German part and then translates it into French, even when params.language is set to "de" and params.translate and params.detect_language are both set to false. Translation errors are therefore added on top of transcription errors, and the result is sometimes terrible. I have ideas for working around this: inserting a 30-second recording in the desired language at the beginning of the audio file works, but it is inefficient and adds load on the MacBook Pro's battery. My question: why does whisper.cpp behave this way when a language has been set before transcribing? Is it a bug, or am I misunderstanding something?
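The primer workaround described above can be applied to the decoded samples rather than to the audio file, which avoids re-encoding. The following is a minimal sketch, assuming both the primer and the recording are already decoded to 16 kHz mono float PCM (the input that whisper.cpp's whisper_full() takes). The helper names prepend_primer, primer_offset_ms and to_recording_ms are hypothetical and not part of the whisper.cpp API. The sketch only handles the buffer concatenation and the timestamp bookkeeping; it does not make the primer any cheaper to process.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sample rate of the mono float PCM that whisper.cpp consumes. */
#define SAMPLE_RATE 16000

/* Hypothetical helper: allocate a buffer holding the primer followed by the
 * recording. Returns NULL on allocation failure; the caller must free() it. */
static float *prepend_primer(const float *primer, size_t n_primer,
                             const float *audio, size_t n_audio,
                             size_t *n_out) {
    assert(primer != NULL && audio != NULL && n_out != NULL);
    float *out = malloc((n_primer + n_audio) * sizeof *out);
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, primer, n_primer * sizeof *out);
    memcpy(out + n_primer, audio, n_audio * sizeof *out);
    *n_out = n_primer + n_audio;
    return out;
}

/* Duration of the primer in milliseconds. */
static long long primer_offset_ms(size_t n_primer) {
    return (long long)n_primer * 1000 / SAMPLE_RATE;
}

/* Map a timestamp measured on the combined buffer back onto the original
 * recording. Returns -1 when the timestamp falls inside the primer; a segment
 * whose start maps to -1 belongs to the primer and can be dropped. */
static long long to_recording_ms(long long t_ms, size_t n_primer) {
    long long offset = primer_offset_ms(n_primer);
    return t_ms < offset ? -1 : t_ms - offset;
}
```

The combined buffer would then be passed to whisper_full() with params.language set to "de", and each segment's start and end times mapped back through to_recording_ms(). whisper.cpp reports segment times in 10 ms units, so they need to be multiplied by 10 first. Whether the primer reliably keeps the output in German is the empirical observation from this thread, not something the sketch guarantees.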

mrfragger commented 8 months ago

Whisper isn't set up to handle audio that switches back and forth between two languages. You will get repeats and terrible results like the ones you describe. If you want an English transcription, you can set the language to, say, French (assuming the speakers mix French with English) and then add -tr to translate to English. It only translates to English, not to any other language. For the most part this deals with the repeats, but not always.

When I set the language to English and it detects French, it sometimes outputs French. But the transition between the two isn't smooth, and whether it works correctly is a toss-up. Language switching is almost as bad a culprit for hallucinations (repeats) as silence is.

Ideally you would have separate audio segments (001 de, 002 fr, 003 de, 004 fr, 005 de, 006 fr) that could be transcribed individually, with the resulting subtitles merged afterwards. However, that's rarely the case.
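When such per-language segments are available, the merge step amounts to shifting each segment's subtitle times by the segment's start offset in the full recording. A minimal sketch, assuming each segment has already been transcribed on its own (for example with params.language set to that segment's language) and that its cues carry non-negative millisecond times relative to the segment start. The cue type and the merge_chunk, format_srt_time and print_srt helpers are hypothetical, written for this illustration only; they are not part of whisper.cpp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical subtitle cue: times in non-negative milliseconds; the text
 * pointer is borrowed, not copied. */
typedef struct {
    long long t0_ms;
    long long t1_ms;
    const char *text;
} cue;

/* Shift one chunk's cues (times relative to the chunk start) onto the global
 * timeline and append them to out, which has room for cap cues and already
 * holds *n_out. Returns how many cues did not fit (0 on success). Feeding
 * chunks in chronological order keeps the merged list sorted. */
static size_t merge_chunk(const cue *chunk_cues, size_t n_chunk,
                          long long chunk_start_ms,
                          cue *out, size_t cap, size_t *n_out) {
    assert(n_out != NULL && *n_out <= cap);
    size_t i = 0;
    for (; i < n_chunk && *n_out < cap; ++i) {
        cue c = chunk_cues[i];
        c.t0_ms += chunk_start_ms;
        c.t1_ms += chunk_start_ms;
        out[(*n_out)++] = c;
    }
    return n_chunk - i;
}

/* Format a non-negative millisecond time as an SRT timestamp (HH:MM:SS,mmm).
 * Returns snprintf's result: 12 characters for times under 100 hours. */
static int format_srt_time(long long ms, char buf[16]) {
    long long h = ms / 3600000;
    long long m = (ms / 60000) % 60;
    long long s = (ms / 1000) % 60;
    return snprintf(buf, 16, "%02lld:%02lld:%02lld,%03lld", h, m, s, ms % 1000);
}

/* Print the merged cues as a numbered SRT document on stdout. */
static void print_srt(const cue *cues, size_t n) {
    char start[16], end[16];
    for (size_t i = 0; i < n; ++i) {
        format_srt_time(cues[i].t0_ms, start);
        format_srt_time(cues[i].t1_ms, end);
        printf("%zu\n%s --> %s\n%s\n\n", i + 1, start, end, cues[i].text);
    }
}
```

For example, a French segment starting at 0 ms and a German segment starting at 60000 ms would be merged with two merge_chunk() calls, in that order, before calling print_srt(). Finding where the language actually changes, which is the hard part noted above, is not covered by this sketch.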

tripledee commented 8 months ago

Thanks for your answer. I tested another implementation of Whisper packaged as a macOS app (Whisper Transcription), and the mixed French/German audio is transcribed in German. So its developer may have implemented a strategy to deal with this behaviour. What I don't understand is why language detection still occurs even though the detect_language param is false.