argmaxinc / WhisperKit

On-device Speech Recognition for Apple Silicon
http://argmaxinc.com/blog/whisperkit
MIT License

VAD issue with English-only models #154

Closed iandundas closed 5 months ago

iandundas commented 5 months ago

v0.7.1

[Two screenshots (CleanShot 2024-05-29): "Without VAD" and "With VAD"]

File used: http://172.104.253.215/ian2-mono.wav

Platform: M1 MacBook Pro, 32 GB, macOS Sonoma 14.5

As discussed, this might be related to English-only models. However, there's also https://github.com/argmaxinc/WhisperKit/issues/150, so I'm not sure.

ZachNagengast commented 5 months ago

Thanks for the report, @iandundas. Are you building from source or running the TestFlight app? We are updating the TestFlight app shortly with this fix: https://github.com/argmaxinc/WhisperKit/compare/v0.7.0...v0.7.1

iandundas commented 5 months ago

This is v0.7.1, built from source.

atiorh commented 5 months ago

@iandundas I believe @ZachNagengast fixed all of #154, #152, and #151 with #155 today. Could you please verify the fix and close the issues that are fixed for you? 🙏

atiorh commented 5 months ago

Note that the latest WhisperAX on TestFlight (0.3.1) includes this fix.

iandundas commented 5 months ago

Seems to be fixed now in v0.7.2! Thanks!