argmaxinc / WhisperKit

On-device Speech Recognition for Apple Silicon
https://takeargmax.com/blog/whisperkit
MIT License
3.17k stars 267 forks

VAD: Finishes too early (almost empty transcript) with VAD enabled, completes successfully without. #150

Closed: iandundas closed this issue 4 months ago

iandundas commented 4 months ago

Using the example project on v0.7.1 with default settings:

Without VAD: [screenshot: CleanShot 2024-05-29 at 11 24 17@2x]
With VAD: [screenshot: CleanShot 2024-05-29 at 11 24 34@2x]

Platform: M1 MacBook Pro, 32 GB, macOS Sonoma 14.5

Full video: http://172.104.253.215/CleanShot-202024-05-28-20at-2013.26.09.mp4
File used: http://172.104.253.215/atp-7-min-clip.m4a

iandundas commented 4 months ago

Seems to be fixed now in v0.7.2! Thanks!