SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2
MIT License
12.59k stars 1.05k forks

Start language detection from a certain time in Batch transcription #1071

Closed misters2008 closed 2 days ago

misters2008 commented 1 month ago

Hello. I am using batch transcription.

Some of my audio files don't have any speech in the first 30, 60, or even 300 seconds. I want the language detection to happen in the 300-330 second range of the audio.

How can I achieve this?

P.S. I tried setting language_detection_segments to 11, with the idea that it would take 330 seconds, recognize that the first 300 contain no speech, and detect the language in the remaining 30 seconds, where speech is present. However, this didn't change anything, and the terminal kept printing in English even though the audio contains no English words. That is how I ended up with the idea at the start of this post.

When running, the terminal's first printed line starts at the 300th second: 300sec>301sec Word

Jiltseb commented 3 weeks ago
  1. pip install git+https://github.com/SYSTRAN/faster-whisper.git
  2. Initialize the WhisperModel as usual
  3. Use the detect_language_multi_segment() function to directly get the major language of the audio. This automatically removes unvoiced regions before detecting the language: model.detect_language_multi_segment(audio)
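
For the original request (detecting the language only in the 300-330 second range), one possible approach is to slice the decoded audio to that window and pass only the clip to the detection call named in the steps above. The following is a minimal sketch, not a confirmed recipe. The helper names `window_bounds` and `detect_language_in_window` are hypothetical, the 16 kHz sample rate is an assumption about how the audio was decoded, and the availability and return format of `detect_language_multi_segment()` depend on the installed faster-whisper version.

```python
SAMPLE_RATE = 16000  # assumption: audio was decoded/resampled to 16 kHz mono


def window_bounds(start_s, end_s, n_samples, sample_rate=SAMPLE_RATE):
    """Convert a time window in seconds to (start, stop) sample indices.

    Both indices are clamped to the length of the audio.
    """
    if start_s < 0 or end_s <= start_s:
        raise ValueError("expected 0 <= start_s < end_s")
    start = min(int(start_s * sample_rate), n_samples)
    stop = min(int(end_s * sample_rate), n_samples)
    return start, stop


def detect_language_in_window(model, audio, start_s=300.0, end_s=330.0,
                              sample_rate=SAMPLE_RATE):
    """Run language detection on audio[start_s:end_s] only (hypothetical helper).

    `model` is expected to be a WhisperModel exposing
    detect_language_multi_segment(), as mentioned in the steps above.
    `audio` is a 1-D sequence of samples (for example a NumPy array).
    """
    start, stop = window_bounds(start_s, end_s, len(audio), sample_rate)
    clip = audio[start:stop]
    if len(clip) == 0:
        raise ValueError("audio is shorter than the requested window start")
    # The result is returned unchanged; its structure depends on the
    # installed faster-whisper version.
    return model.detect_language_multi_segment(clip)
```

With a loaded `WhisperModel` and audio decoded by `faster_whisper.decode_audio("audio.wav")`, the call would be `detect_language_in_window(model, audio)`. If the detected language looks right, it can then be passed explicitly through the `language` argument of `transcribe()`, which skips automatic detection for the batch run.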