argmaxinc / WhisperKit

On-device Speech Recognition for Apple Silicon
http://argmaxinc.com/blog/whisperkit
MIT License

Use GPU for audio encoder on macOS 13 #83

Closed ZachNagengast closed 8 months ago

ZachNagengast commented 8 months ago

Some of the audio encoder models produce unexpected outputs when run on the Neural Engine on macOS 13, so this PR sets the default compute unit for the audio encoder to GPU there. The default can still be overridden by passing your own ModelComputeOptions object to the computeOptions parameter of the WhisperKit() init method.
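For anyone who wants to override the new default, a minimal sketch follows. It assumes that `ModelComputeOptions` takes an `audioEncoderCompute:` argument of type `MLComputeUnits` and that the initializer accepts a `computeOptions:` label, as described above; the exact labels and defaults may differ between WhisperKit versions, so check them against the release you are using.

```swift
import CoreML
import WhisperKit

// Assumption: ModelComputeOptions exposes an `audioEncoderCompute` argument
// of type MLComputeUnits; the other compute units keep their defaults.
// Use .cpuAndNeuralEngine here instead to opt back into the Neural Engine.
let computeOptions = ModelComputeOptions(audioEncoderCompute: .cpuAndGPU)

// Assumption: the initializer accepts `computeOptions:` and is async/throwing.
// Creating the pipeline may download and load models.
let pipe = try await WhisperKit(computeOptions: computeOptions)
```

Passing `.cpuAndGPU` explicitly pins the audio encoder to the GPU regardless of OS version, which can be handy when comparing outputs between compute units on the same machine.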

Example with jfk.wav, with fix:

And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.

Without fix:

<|startoftranscript|><|en|><|transcribe|><|translate|><|translate|><|si|>.<|si|><|si|>,<|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|si|><|endoftext|>