SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2
MIT License

VAD Filter causing ValueError with silent audio segments during streaming #1127

Closed FrankyKyaw closed 6 days ago

FrankyKyaw commented 1 week ago

I am trying to implement real-time streaming transcription and I'm hitting an issue when using vad_filter=True on streamed audio chunks. When a chunk is entirely silent (no speech detected), the VAD filter removes all of the audio, and the language detection step then raises a ValueError.

INFO:faster_whisper:Processing audio with duration 00:02.048
INFO:faster_whisper:VAD filter removed 00:02.048 of audio

    language = max(
ValueError: max() arg is an empty sequence
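
The last traceback line is ordinary Python behavior: max() raises ValueError when its argument is empty, which is presumably what happens once the VAD filter leaves nothing to score. The following is a minimal sketch of that failure mode and one way to guard against it. It is not faster-whisper's actual code; pick_language and the (language, probability) list are hypothetical names used only for illustration.

```python
from typing import List, Optional, Tuple


def pick_language(scores: List[Tuple[str, float]]) -> Optional[str]:
    """Return the most probable language, or None if there is nothing to score."""
    # max(scores, key=...) alone raises ValueError on an empty list;
    # default=None makes the empty case explicit instead.
    best = max(scores, key=lambda item: item[1], default=None)
    return best[0] if best is not None else None
```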

I have been using a try-except block to catch the ValueError and handle the silent parts.

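try:
    # With vad_filter=True, a fully silent chunk can raise ValueError during language detection.
    segments, _ = model.transcribe(combined_audio, vad_filter=True)
    new_text = "".join(segment.text for segment in segments).strip()
except ValueError:
    # Assumed handling (the except body was left out of the post): treat the chunk as empty.
    new_text = ""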

Is there a more effective way to handle silent segments when using vad_filter=True to prevent the ValueError from occurring?
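
One way to keep this workaround in a single place is to wrap the call in a small helper, so the streaming loop never sees the exception. This is a minimal sketch, assuming the ValueError comes only from the fully silent case described above. transcribe_chunk is a hypothetical helper; the only library call it relies on is model.transcribe(audio, vad_filter=True), as used in the snippet above.

```python
from typing import Any


def transcribe_chunk(model: Any, audio: Any) -> str:
    """Transcribe one chunk; return "" if the chunk turns out to be silent."""
    try:
        segments, _info = model.transcribe(audio, vad_filter=True)
        # Consume the segments inside the try block so any error surfaces here.
        return "".join(segment.text for segment in segments).strip()
    except ValueError:
        # Assumption: this is the empty-sequence error from a fully silent chunk.
        return ""
```

Catching ValueError this broadly can also hide unrelated errors, so it is best treated as a stopgap for versions that still raise on silent chunks.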

MahmoudAshraf97 commented 6 days ago

Can you try again using the master branch?

FrankyKyaw commented 6 days ago

Yeah, it worked. Thanks!