SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2
MIT License

The VAD parameters and default values in the source code are inconsistent with the description in README.md #859

Open wikios opened 1 month ago

wikios commented 1 month ago

Hello, I found a description of VAD filter usage in README.md that may be inconsistent with the source code. According to the source code, I think "removes silence longer than 2 seconds" should probably refer to the argument min_speech_duration_ms rather than min_silence_duration_ms.

In README.md :

VAD filter

... The default behavior is conservative and only removes silence longer than 2 seconds. See the available VAD parameters and default values in the source code. They can be customized with the dictionary argument vad_parameters:

segments, _ = model.transcribe(
   "audio.mp3",
   vad_filter=True,
   vad_parameters=dict(min_silence_duration_ms=500),
)

I referred to the "Attributes" in vad.py :

https://github.com/SYSTRAN/faster-whisper/blob/master/faster_whisper/vad.py , lines 21 and 26

min_speech_duration_ms: Final speech chunks shorter min_speech_duration_ms are thrown out.
min_silence_duration_ms: In the end of each speech chunk wait for min_silence_duration_ms before separating it

and utils_vad.py in snakers4/silero-vad :

https://github.com/snakers4/silero-vad/blob/master/utils_vad.py , lines 205 and 213

min_speech_duration_ms: int (default - 250 milliseconds) Final speech chunks shorter min_speech_duration_ms are thrown out
min_silence_duration_ms: int (default - 100 milliseconds) In the end of each speech chunk wait for min_silence_duration_ms before separating it

Please let me know if this understanding is correct, looking forward to a reply, thanks~

Purfview commented 1 month ago

Please let me know if this understanding is correct

Your understanding is incorrect. There is nothing wrong with the VAD parameters or the description.
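
For readers who want to see why the two parameters are not interchangeable, here is a toy sketch based only on the docstrings quoted above. It is not the Silero VAD algorithm and not faster-whisper code: the function name toy_vad_filter and the interval data are made up for illustration, and the 2000 ms value is simply the "2 seconds" from the README. The idea is that min_silence_duration_ms decides how long a pause must be before speech is split (so the pause can be dropped), while min_speech_duration_ms only discards speech chunks that are too short.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start_ms, end_ms) of a detected speech region


def toy_vad_filter(
    speech: List[Interval],
    min_silence_duration_ms: int,
    min_speech_duration_ms: int,
) -> List[Interval]:
    """Hypothetical simplification of the two thresholds (illustration only)."""
    merged: List[Interval] = []
    for start, end in sorted(speech):
        if merged and start - merged[-1][1] < min_silence_duration_ms:
            # The pause is too short to separate chunks: keep it inside the chunk.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            # The pause is long enough: start a new chunk, the pause is removed.
            merged.append((start, end))
    # Speech chunks shorter than min_speech_duration_ms are thrown out.
    return [(s, e) for s, e in merged if e - s >= min_speech_duration_ms]


# Made-up detections: pauses of 500 ms, 3000 ms and 2900 ms between them.
speech = [(0, 1000), (1500, 3000), (6000, 6100), (9000, 12000)]

# Conservative setting: only pauses of at least 2 seconds are cut out.
conservative = toy_vad_filter(speech, min_silence_duration_ms=2000, min_speech_duration_ms=250)

# The README example value: pauses of at least 500 ms are cut out as well.
aggressive = toy_vad_filter(speech, min_silence_duration_ms=500, min_speech_duration_ms=250)

print(conservative)
print(aggressive)
```

In this sketch, lowering min_silence_duration_ms from 2000 to 500 is what causes the 500 ms pause to be removed. The 100 ms blip at 6000 ms is dropped in both runs because of min_speech_duration_ms, which never affects how much silence is kept.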