argmaxinc / WhisperKit

On-device Speech Recognition for Apple Silicon
http://argmaxinc.com/blog/whisperkit
MIT License
3.87k stars, 329 forks

Getting `END PLAY` if audio starts with a bit of silence #203

Closed: vojto closed this issue 1 month ago

vojto commented 1 month ago

Tested with the demo app:

[Screenshot: CleanShot 2024-09-20 at 08 34 18]

Source audio:

[Attachment: 0919_134422_audio_samples_iteration_1.wav.zip]

vojto commented 1 month ago

This seems to be caused by the silence at the beginning of the audio.

I downloaded the original MP3 from YouTube and clipped it to 10 s. If I send the 10 s clip to WhisperKit, it works fine.

However, if I prepend 1 s of silence, it fails as described above.
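To make this repro easy to share and rerun, here is a minimal Python sketch (standard library only) that prepends a given amount of digital silence to a 16-bit PCM WAV file, plus a helper that measures how much leading silence a file has. The function names, the file names in the usage note, and the amplitude threshold of 500 are illustrative assumptions, not anything from WhisperKit or from this thread.

```python
import array
import sys
import wave


def prepend_silence(in_path: str, out_path: str, seconds: float = 1.0) -> None:
    """Copy a 16-bit PCM WAV file, prepending `seconds` of digital silence (hypothetical helper)."""
    with wave.open(in_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = src.readframes(params.nframes)
    # One silent frame holds one zero sample per channel; keep the file's own sample rate.
    n_silent_samples = int(round(seconds * params.framerate)) * params.nchannels
    silence = array.array("h", [0] * n_silent_samples).tobytes()
    with wave.open(out_path, "wb") as dst:
        dst.setnchannels(params.nchannels)
        dst.setsampwidth(params.sampwidth)
        dst.setframerate(params.framerate)
        dst.writeframes(silence + frames)


def leading_silence_seconds(path: str, threshold: int = 500) -> float:
    """Seconds before the first sample whose magnitude exceeds `threshold` (assumed cutoff)."""
    with wave.open(path, "rb") as src:
        if src.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate, channels = src.getframerate(), src.getnchannels()
        samples = array.array("h", src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV samples are little-endian on disk
    for i, sample in enumerate(samples):
        if abs(sample) > threshold:
            return (i // channels) / rate
    # The whole file is below the threshold.
    return len(samples) / channels / rate
```

For example, `prepend_silence("clip_10s.wav", "clip_10s_plus_1s.wav", 1.0)` would produce the padded variant, and `leading_silence_seconds` can confirm that only the leading silence differs between the working and failing inputs before both are fed to the demo app.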

atiorh commented 1 month ago

Hey @vojto, could you please report your hardware, operating system, and WhisperAX version? The audio file you shared seems to work for me on the latest WhisperAX:

[Screenshot: 2024-09-20 at 9 45 22 AM]

atiorh commented 1 month ago

Closing due to inactivity. Please reopen if you have new instructions for us to reproduce the issue. Thanks!