SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2
MIT License

Funny and revealing hallucinations #949

Open AgatheBauer opened 1 month ago

AgatheBauer commented 1 month ago

While building a GUI, I stumbled across some funny and revealing hallucinations. The GUI has a microphone mode. When I say nothing, or say something completely different, I often get transcription snippets like this one:

"Untertitel im Auftrag des ZDF, 2020" (English: "Subtitles commissioned by ZDF, 2020")

ZDF is a German television channel. Looks like the model was trained with data from it... :D

Is there a way to fix this?

If anyone is interested, I attached the code. Completely made with GPT4o: Main.zip

MahmoudAshraf97 commented 1 month ago

This is a problem in Whisper itself. It happens in all languages due to issues in the training data. The solution is to use the VAD filter.
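
A minimal sketch of enabling the VAD filter in faster-whisper, so that silent stretches are cut out before the model sees them. The model size, device, compute type, language, and the 500 ms silence setting are assumptions chosen for illustration, and `transcribe_with_vad` is a hypothetical helper name, not something from the attached code.

```python
# Keyword arguments that switch on faster-whisper's built-in voice
# activity detection. The 500 ms value is an illustrative assumption.
VAD_KWARGS = {
    "vad_filter": True,
    "vad_parameters": {"min_silence_duration_ms": 500},
}


def transcribe_with_vad(audio_path, model_size="small", language="de"):
    """Transcribe a file with the VAD filter enabled (hypothetical helper)."""
    # Imported lazily so this module can be loaded without the library.
    from faster_whisper import WhisperModel

    # Assumed settings: CPU inference with int8 quantization.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path, language=language, **VAD_KWARGS)
    # `segments` is a generator; iterating it runs the actual transcription.
    return [segment.text.strip() for segment in segments]
```

For a microphone mode, the same keyword arguments could be passed for each recorded chunk, for example `transcribe_with_vad("chunk.wav")`, where `chunk.wav` is a placeholder file name.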

x86Gr commented 1 month ago

@AgatheBauer You should probably read the discussions on the whisper repository; it's a known glitch. Do you have a list of them?

aligokalppeker commented 1 month ago

You can reduce hallucinations by tuning parameters such as log_prob_threshold, no_speech_threshold, and compression_ratio_threshold.
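
A rough sketch of how these thresholds could be passed to `WhisperModel.transcribe`. Note that faster-whisper spells the first one `log_prob_threshold`. The values below are illustrative assumptions that are somewhat stricter than the library defaults (-1.0, 0.6, and 2.4), not recommended settings, and `transcribe_strict` is a hypothetical helper name. Turning off `condition_on_previous_text` is an extra assumption beyond the parameters named above.

```python
# Illustrative, stricter-than-default decoding thresholds (assumed values).
STRICT_KWARGS = {
    # Decodes whose average log probability falls below this count as failed.
    "log_prob_threshold": -0.8,
    # A segment is skipped as silence when its no-speech probability is above
    # this value and its average log probability is below log_prob_threshold.
    "no_speech_threshold": 0.5,
    # Decodes whose text compresses better than this ratio (highly
    # repetitive output) count as failed.
    "compression_ratio_threshold": 2.2,
    # Assumption: do not feed earlier output back in as a prompt, so a
    # hallucinated phrase is less likely to repeat in later windows.
    "condition_on_previous_text": False,
}


def transcribe_strict(audio_path, model_size="small"):
    """Transcribe with the stricter thresholds above (hypothetical helper)."""
    # Imported lazily so this module can be loaded without the library.
    from faster_whisper import WhisperModel

    # Assumed settings: CPU inference with int8 quantization.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path, **STRICT_KWARGS)
    # Return the text together with the per-segment scores for inspection.
    return [
        (segment.text.strip(), segment.avg_logprob, segment.no_speech_prob)
        for segment in segments
    ]
```

Printing the returned scores for a few silent recordings is one way to see which threshold the hallucinated lines actually trip before tightening any of them further.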

AgatheBauer commented 1 month ago

Thanks, all, for clarifying this!