In real speech, even when a speaker repeats the same phrase several times, the recognition probability (confidence) should differ from one repetition to the next.

In hallucinated output, however, both the text and the probability sometimes repeat exactly. Example results: https://docs.google.com/spreadsheets/d/10xfkOZpGJ-9vIvoYziEkD1lZETWMbBLDT-NABdQ8H_g/edit#gid=0&range=32:34

By removing segments that share the same text and probability, we can reduce hallucinations by around 50%.
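
A minimal sketch of this filter, for illustration only. It assumes each segment is a dict with a `text` field and a `prob` field (both names are hypothetical here, and the actual field holding the confidence may differ), and it assumes the first occurrence of a repeated pair is kept while later exact repeats are dropped.

```python
def drop_repeated_segments(segments):
    """Drop segments whose (text, prob) pair exactly matches an earlier one.

    Assumption: `text` and `prob` are illustrative field names, and the
    first occurrence of each pair is kept.
    """
    seen = set()
    kept = []
    for seg in segments:
        # Exact float equality is intentional: genuine repetitions of a
        # phrase are expected to get slightly different probabilities.
        key = (seg["text"].strip(), seg["prob"])
        if key in seen:
            continue
        seen.add(key)
        kept.append(seg)
    return kept


segments = [
    {"text": "謝謝大家", "prob": 0.8123},
    {"text": "謝謝大家", "prob": 0.8123},  # same text and probability: dropped
    {"text": "謝謝大家", "prob": 0.7954},  # same text, different probability: kept
]
filtered = drop_repeated_segments(segments)
print(len(filtered))  # 2
```

Matching on the pair rather than on the text alone means a phrase that was really spoken twice survives, as long as its probabilities differ.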

We also correct the Whisper prompt. Unlike ChatGPT, Whisper is not instruction-tuned, so putting commands in its prompt is meaningless. We therefore include only the bare minimum of text needed to steer the transcript toward Taiwanese Mandarin (e.g. 網際網路 instead of 互联网, 影片 instead of 视频) and full-width punctuation.
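
As a rough illustration of such a prompt, the sketch below uses a short piece of ordinary text instead of a command. The prompt string and the `transcribe_options` helper are hypothetical and are not the exact prompt used in this change. `initial_prompt` and `language` are the option names accepted by `transcribe` in the openai-whisper package.

```python
# Hypothetical prompt: plain Taiwanese Mandarin text with full-width
# punctuation, written as sample transcript text rather than an instruction.
INITIAL_PROMPT = "大家好，歡迎收看這部影片。今天要聊的是網際網路。"


def transcribe_options(prompt=INITIAL_PROMPT):
    """Build keyword arguments for a Whisper transcribe call (illustrative helper)."""
    return {"language": "zh", "initial_prompt": prompt}


options = transcribe_options()

# Usage with openai-whisper (not run here; "audio.mp3" is a placeholder):
#   import whisper
#   result = whisper.load_model("large").transcribe("audio.mp3", **options)
```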