SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2
MIT License

Improve Language detection #265

Open ab-pandey opened 1 year ago

ab-pandey commented 1 year ago

Since Whisper detects the language from the first 30 seconds of the audio, language detection is sometimes wrong. For example, this video is in English, but both whisper and faster_whisper detect the language as Hindi ("hi"). There is a solution for this issue in whisper (https://github.com/openai/whisper/pull/676). Since I am new to CTranslate2, I am having difficulty porting this solution to faster_whisper. Can someone help me with this? Thanks. P.S. I have tested the solution in whisper and it works.
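
To make the idea concrete, here is a minimal sketch of one way to rely less on the first 30 seconds: run detection on several 30-second windows and average the per-language probabilities. This is an illustration under stated assumptions, not the linked fix. `detect_fn` is a hypothetical callable that returns a `{language: probability}` dict for one window; it stands in for whatever detector is used and is not part of whisper or faster_whisper. `split_windows` and `detect_language_multi` are likewise made-up names.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

SAMPLE_RATE = 16000    # Whisper models expect 16 kHz mono audio
WINDOW_SECONDS = 30    # length of one detection window


def split_windows(samples: Sequence[float], max_windows: int) -> List[Sequence[float]]:
    """Cut the audio into consecutive 30-second windows (the last may be shorter)."""
    size = SAMPLE_RATE * WINDOW_SECONDS
    windows = [samples[i:i + size] for i in range(0, len(samples), size)]
    return windows[:max_windows]


def detect_language_multi(
    samples: Sequence[float],
    detect_fn: Callable[[Sequence[float]], Dict[str, float]],  # hypothetical detector
    max_windows: int = 4,
) -> Tuple[str, float]:
    """Average per-language probabilities over several windows and pick the best."""
    windows = split_windows(samples, max_windows)
    if not windows:
        raise ValueError("audio is empty")
    totals: Dict[str, float] = defaultdict(float)
    for window in windows:
        for lang, prob in detect_fn(window).items():
            totals[lang] += prob
    best = max(totals, key=totals.get)
    return best, totals[best] / len(windows)
```

With `max_windows=1` this reduces to first-window detection, which is the behaviour described above. Averaging is only one possible aggregation; a majority vote or a confidence threshold would be equally reasonable choices.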

guillaumekln commented 1 year ago

Since I am new to Ctranslate2, I am having difficulty in cloning https://github.com/openai/whisper/pull/676 to faster_whisper. Can someone help me on this? Thanks

What have you tried so far? Maybe you can share your current changes and we can help you implement this.

ab-pandey commented 1 year ago

Actually, I am not able to figure out how to proceed further. The fix uses several variables, such as "content_frames", "N_FRAMES", and "segment", and functions such as "pad_or_trim", and I cannot figure out how to get the correct values for them in the faster_whisper implementation.
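
For orientation, here is a rough NumPy-only sketch of what those names refer to in the openai/whisper code. The constant values match whisper's audio module (16 kHz audio, a hop of 160 samples, 30-second chunks, hence 3000 mel frames per chunk). The `pad_or_trim` below is a simplified stand-in for the original, and `iter_segments` is a hypothetical helper written for this example. It assumes `mel` is a log-Mel spectrogram of shape `(n_mels, frames)` with no extra padding; how the linked fix uses these names, and how they map onto faster_whisper internals, is not shown here.

```python
import numpy as np

SAMPLE_RATE = 16000                        # samples per second
HOP_LENGTH = 160                           # samples between successive mel frames
CHUNK_LENGTH = 30                          # seconds per chunk
N_SAMPLES = CHUNK_LENGTH * SAMPLE_RATE     # 480000 samples in one chunk
N_FRAMES = N_SAMPLES // HOP_LENGTH         # 3000 mel frames in one chunk


def pad_or_trim(array: np.ndarray, length: int, axis: int = -1) -> np.ndarray:
    """Zero-pad or cut `array` along `axis` so it has exactly `length` entries."""
    if array.shape[axis] > length:
        array = array.take(indices=range(length), axis=axis)
    if array.shape[axis] < length:
        pad_widths = [(0, 0)] * array.ndim
        pad_widths[axis] = (0, length - array.shape[axis])
        array = np.pad(array, pad_widths)
    return array


def iter_segments(mel: np.ndarray):
    """Yield (seek, segment) pairs; every segment has exactly N_FRAMES frames."""
    content_frames = mel.shape[-1]          # frames that hold real audio (assumes no padding)
    for seek in range(0, content_frames, N_FRAMES):
        segment = mel[:, seek:seek + N_FRAMES]
        yield seek, pad_or_trim(segment, N_FRAMES)
```

Each yielded segment is one 30-second window in the shape the encoder expects, so a per-window language detector could be applied to it.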

elloza commented 8 months ago

Has this been implemented?

dilerbatu commented 8 months ago

Any update?

trungkienbkhn commented 8 months ago

FYI, I created a new PR to implement this feature: https://github.com/SYSTRAN/faster-whisper/pull/732

ldolegowski92 commented 6 months ago

FYI, I created a new PR to implement this feature: #732

How will the model behave if the conditions are not met?