MahmoudAshraf97 / whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
BSD 2-Clause "Simplified" License

Can I force transcription into English? It produces transcriptions in "cy", but the audio is actually "en" #81

Closed R-Shyam-sundar closed 5 months ago

MahmoudAshraf97 commented 1 year ago

An easy solution would be to use an English-only model. If you want to use the large model, you'll have to hardcode the language.

R-Shyam-sundar commented 1 year ago

Hi, can you guide me on how to hardcode it? I'm new to Whisper and don't know how to customize it. Thanks.

MahmoudAshraf97 commented 1 year ago

In the line that instantiates the Whisper model, you can pass the language as an argument. Check the FasterWhisper repo for more details.
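
For anyone landing here later, a minimal sketch of what forcing the language could look like with faster-whisper. As far as I know, the `language` argument is accepted by `WhisperModel.transcribe()` (the call made on the model object) rather than by the constructor. The helper names `load_model` and `transcribe_forced`, the `"large-v2"` model name, and the `"int8"` compute type are illustrative assumptions, not the code used in whisper-diarization.

```python
from typing import Any, List, Tuple


def load_model(model_name: str = "large-v2", device: str = "cpu") -> Any:
    """Load a faster-whisper model (hypothetical helper; imported lazily)."""
    from faster_whisper import WhisperModel  # pip install faster-whisper

    # compute_type="int8" is an assumption that suits CPU; adjust for your hardware.
    return WhisperModel(model_name, device=device, compute_type="int8")


def transcribe_forced(
    model: Any, audio_path: str, language: str = "en"
) -> Tuple[List[str], str]:
    """Transcribe with a fixed language instead of relying on auto-detection."""
    # Passing language="en" skips detection; language=None would auto-detect.
    segments, info = model.transcribe(audio_path, language=language)
    # segments is a lazy generator, so iterate it to run the transcription.
    texts = [segment.text.strip() for segment in segments]
    return texts, info.language
```

Usage would be along the lines of `texts, lang = transcribe_forced(load_model(), "audio.wav", "en")`, where `"audio.wav"` is a placeholder path. In whisper-diarization itself, the equivalent change would be adding `language="en"` to the existing transcribe call.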

AlbinGyllander commented 1 year ago

Specifying the language, as mentioned earlier, will probably help, but if you need to use the auto-detect feature there is something else to check. Make sure that the first 30 seconds of the audio actually contain valid speech. I had the exact same problem and realized that the beginning of my audio was either silent or of very bad quality. Whisper seems to assume that the language is CY when it cannot detect speech accurately. This is likely not an issue specific to whisper-diarization but with Whisper in general.
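
A rough way to check the point above is to look for the first stretch of audio with noticeable energy. The sketch below is a simple RMS-energy heuristic, not a real voice activity detector and not something Whisper does internally; the function name `first_speech_offset`, the 0.5 s frame size, and the 0.01 threshold are assumptions chosen for illustration. It expects mono samples already decoded to floats in the range [-1, 1].

```python
import math
from typing import Optional, Sequence


def first_speech_offset(
    samples: Sequence[float],
    sample_rate: int,
    frame_seconds: float = 0.5,
    rms_threshold: float = 0.01,
) -> Optional[float]:
    """Return the start time (seconds) of the first frame louder than the threshold."""
    frame_len = max(1, int(frame_seconds * sample_rate))
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        # Root-mean-square amplitude of this frame.
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms > rms_threshold:
            return start / sample_rate
    return None  # no frame exceeded the threshold
```

If the returned offset is large (or `None`), trimming the leading silence before transcription or forcing the language is probably the safer route. Loud noise will also pass this check, so it only catches the silent case. faster-whisper's `transcribe()` also exposes a `vad_filter` option that may help here, though I have not verified how it interacts with language detection.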