I am trying to implement real-time streaming transcription and am running into an issue with the vad_filter=True option. When an audio chunk is entirely silent (no speech detected), the VAD filter removes all of the audio, and the language detection step then raises a ValueError:
INFO:faster_whisper:Processing audio with duration 00:02.048
INFO:faster_whisper:VAD filter removed 00:02.048 of audio
language = max(
ValueError: max() arg is an empty sequence
For now I am using a try-except block to catch the ValueError and handle the silent chunks:
    try:
        segments, _ = model.transcribe(combined_audio, vad_filter=True)
        new_text = "".join([segment.text for segment in segments]).strip()
    except ValueError:
        # VAD removed the whole chunk, so treat it as containing no text
        new_text = ""
Is there a more effective way to handle silent segments when using vad_filter=True to prevent the ValueError from occurring?
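One alternative I have been considering is to skip obviously silent chunks before they ever reach model.transcribe, and keep the ValueError handler only as a fallback. The sketch below is a rough illustration, not a tested fix: rms, transcribe_chunk, and the 0.01 threshold are hypothetical names and values of my own, and it assumes the chunk is a sequence of float samples in the range [-1.0, 1.0]. A simple energy check is also much cruder than the VAD filter itself, so non-speech noise above the threshold can still reach the model.

```python
import math
from typing import Callable, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a chunk of float samples."""
    n = len(samples)
    if n == 0:
        return 0.0
    return math.sqrt(sum(float(s) * float(s) for s in samples) / n)


def transcribe_chunk(
    transcribe_fn: Callable,
    samples: Sequence[float],
    silence_rms: float = 0.01,  # assumed threshold, tune for your microphone
) -> str:
    """Return the text for one chunk, or "" when the chunk looks silent."""
    if rms(samples) < silence_rms:
        # Quiet chunk: skip the model call entirely.
        return ""
    try:
        segments, _info = transcribe_fn(samples)
    except ValueError:
        # Fallback: the VAD filter may still remove all audio.
        return ""
    return "".join(segment.text for segment in segments).strip()
```

With the code from above, this would be called as transcribe_chunk(lambda audio: model.transcribe(audio, vad_filter=True), combined_audio).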