So if the VAD does not find any speech in the audio, it returns an empty list and prints
the message "No active speech found in audio". So far so good.
But then fast-whisper runs into an error which I cannot catch outside of the package, because on line 523 of transcribe.py you use torch.stack(), which expects a non-empty TensorList.
The expected behaviour would be that fast-whisper just returns an empty string if it does not find any speech in an audio file.
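For reference, here is a minimal standalone sketch of the failure mode (my own repro, not the library code): torch.stack() raises a RuntimeError when handed an empty list, which is exactly what happens when the VAD returns no segments.

```python
import torch

# Minimal repro: torch.stack() requires at least one tensor.
# Passing an empty list raises
# "RuntimeError: stack expects a non-empty TensorList".
empty_segments: list = []
torch.stack(empty_segments)  # raises RuntimeError
```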
The problematic code section is in the VAD model's merge_chunks function:

```python
if len(segments_list) == 0:
    print("No active speech found in audio")
    return []
```
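A possible fix (just a sketch of the idea, not the actual transcribe.py code; the helper name and its argument are my assumptions) would be to guard the torch.stack() call and short-circuit to an empty result when the VAD returned no segments:

```python
from typing import Optional

import torch

def stack_vad_segments(segment_tensors: list) -> Optional[torch.Tensor]:
    # Hypothetical guard around the torch.stack() call in transcribe.py:
    # bail out early on an empty VAD result instead of letting
    # torch.stack() raise "stack expects a non-empty TensorList".
    if len(segment_tensors) == 0:
        return None
    return torch.stack(segment_tensors)
```

The caller would then translate the None result into an empty transcription, matching the expected behaviour described above.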