Issues with Capitalization and Punctuation in Transcribed Audios

RomanLeo2003 commented 4 months ago

After transcribing several audio files using the medium model, I noticed that some of the transcriptions lack capitalization and punctuation. For example:

Transcribed text with punctuation and capitalization: "Produces, for example, a Renault headlight. They say, yes, yes, we produce it."

Transcribed text without punctuation and capitalization: "produces for example a renault headlight they say yes yes we produce it"
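To spot affected segments automatically, a rough heuristic is to flag any text that contains no uppercase letters and no common punctuation marks. This is only a sketch: the function name `looks_unpunctuated` and the set of punctuation characters are assumptions for illustration, not part of faster-whisper.

```python
# Hypothetical helper (not part of faster-whisper): flags text that has
# neither uppercase letters nor common punctuation marks.
PUNCTUATION_MARKS = ".,!?;:"


def looks_unpunctuated(text: str) -> bool:
    stripped = text.strip()
    if not stripped:
        # Empty output is a different problem, so do not flag it here.
        return False
    has_upper = any(ch.isupper() for ch in stripped)
    has_punct = any(ch in PUNCTUATION_MARKS for ch in stripped)
    return not has_upper and not has_punct


good = "Produces, for example, a Renault headlight. They say, yes, yes, we produce it."
bad = "produces for example a renault headlight they say yes yes we produce it"
print(looks_unpunctuated(good), looks_unpunctuated(bad))
```

Running it over each segment's text would show where in the audio the problem starts and where it goes away again.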

I suspect that this issue might be due to some accumulated cache in the model (or something similar). This problem seems to occur with certain types of content, but I am not sure. BTW, sometimes the problem fixes itself after a few minutes of audio. Therefore, my questions are:

Why does this happen? How can I fix it?

I use this configuration of parameters:

vad_parameters = {
    'threshold': 0.5,
    'min_speech_duration_ms': 400,
    'max_speech_duration_s': float("inf"),
    'min_silence_duration_ms': 400,
    'window_size_samples': 1024,
    'speech_pad_ms': 750
}

hallucination_silence_threshold = 0.8
model_size = "medium"
compute_type = "float16"
beam_size = 8

matveynator commented 2 months ago

and silence ... same problems...

have you managed to understand the cause of the problem? thank you

RomanLeo2003 commented 3 weeks ago

Hi! Yes, I've searched for similar issues in many other repositories and found out that it's just a bug in Whisper. Whisper "forgets" to do punctuation and capitalization, and we can "remind" it by using a prompt with punctuation and capitalization.

Using a prompt can introduce instability in the final result (hallucinations and other issues), so I decided against this approach. However, you may try and experiment with it!
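If you want to experiment, here is a minimal sketch of the prompt approach using the `initial_prompt` argument of `WhisperModel.transcribe()`. The prompt text and the helper names are made up for illustration; any short, properly punctuated and capitalized sentence in the language of the audio should serve the same purpose. Whether it helps or causes hallucinations has to be checked on your own recordings.

```python
# Hypothetical example prompt: what matters is its punctuation and
# capitalization style, not its content.
EXAMPLE_PROMPT = "Hello. This is a transcript, with proper punctuation and capitalization."


def build_prompt_kwargs(prompt: str = EXAMPLE_PROMPT) -> dict:
    """Keyword arguments for WhisperModel.transcribe() with a style prompt."""
    return {
        "beam_size": 8,
        "initial_prompt": prompt,
    }


def transcribe_with_prompt(audio_path: str, prompt: str = EXAMPLE_PROMPT) -> str:
    """Transcribe one file with the prompt; audio_path is a placeholder."""
    # Imported here so the helper above works without the library installed.
    from faster_whisper import WhisperModel

    model = WhisperModel("medium", compute_type="float16")
    segments, _info = model.transcribe(audio_path, **build_prompt_kwargs(prompt))
    return " ".join(seg.text.strip() for seg in segments)
```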

SYSTRAN / faster-whisper

Issues with Capitalization and Punctuation in Transcribed Audios #815