argmaxinc / WhisperKit

On-device Speech Recognition for Apple Silicon
https://takeargmax.com/blog/whisperkit
MIT License

Implement test data-driven `unsupportedModelDeviceCombination` at init #118

Status: Open. atiorh opened this issue 4 months ago.

atiorh commented 4 months ago

Even with a minimum OS version of macOS 13 and iOS 16, there is still a large matrix of possible model-device configurations for deployment:

Devices have varying capabilities across:

Model versions that have varying requirements:

Not all model versions will run successfully on all Apple devices. Some combinations may become supported through future Argmax work; others may be disabled due to OS regressions. Hence, we would like to automatically test all combinations before each release, using the testing infrastructure being laid out in #99 by @Abhinay1997, and have the WhisperKit pipeline throw `unsupportedModelDeviceCombination` at init time for any combination that fails to pass all the tests.
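A minimal sketch of what a test-data-driven check at init time could look like, assuming release testing produces one pass/fail record per model-device pair. Every name here (`PipelineInitError`, `CombinationTestResult`, `SupportMatrix`, `Pipeline`) is hypothetical and not part of WhisperKit's API; only the `unsupportedModelDeviceCombination` case name comes from this issue. Any pair without a passing record is treated as unsupported, so untested combinations are rejected by default.

```swift
// Hypothetical sketch: none of these types exist in WhisperKit.

/// Error the pipeline initializer could throw (case name taken from this issue).
enum PipelineInitError: Error, Equatable {
    case unsupportedModelDeviceCombination(model: String, device: String)
}

/// One release-time test record for a model-device pair (hypothetical schema).
struct CombinationTestResult {
    let model: String
    let device: String
    let passedAllTests: Bool
}

/// Lookup table built from test results; only fully passing pairs are supported.
struct SupportMatrix {
    private var passing: Set<String> = []

    init(results: [CombinationTestResult]) {
        for result in results where result.passedAllTests {
            passing.insert(Self.key(result.model, result.device))
        }
    }

    private static func key(_ model: String, _ device: String) -> String {
        "\(model)|\(device)"
    }

    /// Throws if the pair was never tested or failed any test.
    func validate(model: String, device: String) throws {
        guard passing.contains(Self.key(model, device)) else {
            throw PipelineInitError.unsupportedModelDeviceCombination(model: model, device: device)
        }
    }
}

/// Stand-in for a pipeline whose initializer rejects unsupported pairs (hypothetical).
struct Pipeline {
    let model: String

    init(model: String, device: String, matrix: SupportMatrix) throws {
        // Fail fast at init time, before any model weights would be loaded.
        try matrix.validate(model: model, device: device)
        self.model = model
    }
}
```

Defaulting to "unsupported" for missing records means a new device only becomes usable once it has actually been through the test suite.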

Taking it a step further, the WhisperKit pipeline object should accept an optional `peakMemoryAllowed` argument at init time, compare it against the peak memory recorded during testing for that particular model-device combination, and throw `unsupportedModelDeviceCombination` if the recorded peak exceeds the allowance.
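Extending the idea, here is a sketch of the proposed `peakMemoryAllowed` check, assuming each test record also stores the peak memory observed during the test run, in megabytes. The types, the `"model|device"` key format, the `peakMemoryAllowedMB` parameter name, and the unit are all assumptions for illustration, not WhisperKit API. The budget is only enforced when the caller supplies one.

```swift
// Hypothetical sketch: `peakMemoryAllowedMB` models the proposed `peakMemoryAllowed`
// argument; units (MB) and all type names are assumptions.

enum PipelineInitError: Error, Equatable {
    case unsupportedModelDeviceCombination(model: String, device: String)
}

/// Release-time record for one model-device pair (hypothetical schema).
struct CombinationRecord {
    let passedAllTests: Bool
    let peakMemoryMB: Int  // peak memory observed while running the test suite
}

struct PipelineConfigCheck {
    let records: [String: CombinationRecord]  // keyed by "model|device"

    func validate(model: String, device: String, peakMemoryAllowedMB: Int? = nil) throws {
        let unsupported = PipelineInitError.unsupportedModelDeviceCombination(model: model, device: device)
        // Untested or failing combinations are rejected outright.
        guard let record = records["\(model)|\(device)"], record.passedAllTests else {
            throw unsupported
        }
        // Enforce the memory budget only when the caller provided one.
        if let budget = peakMemoryAllowedMB, record.peakMemoryMB > budget {
            throw unsupported
        }
    }
}
```

Reusing the same error case keeps the init-time contract simple; a real implementation might prefer a distinct case so callers can tell "untested" apart from "over budget".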

For the time being, we will disable all Intel Macs. Please reach out if you need Intel Mac support for your product.
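One possible way to express the Intel Mac exclusion, assuming a compile-time check on the target architecture is acceptable. `#if os(macOS) && arch(x86_64)` is standard Swift conditional compilation; the helper functions and error type are hypothetical. Because the check applies to the compiled slice, an x86_64 build running under Rosetta on Apple Silicon would also be rejected, which is the conservative choice here.

```swift
// Hypothetical sketch: helper names and the error type are not WhisperKit API.

enum PipelineInitError: Error {
    case unsupportedModelDeviceCombination(model: String, device: String)
}

/// True when this binary slice was compiled for an Intel Mac.
func isIntelMac() -> Bool {
    #if os(macOS) && arch(x86_64)
    return true
    #else
    return false
    #endif
}

/// Rejects every model on Intel Macs, regardless of test results.
func checkHostArchitecture(model: String) throws {
    if isIntelMac() {
        throw PipelineInitError.unsupportedModelDeviceCombination(model: model, device: "Intel Mac")
    }
}
```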

iandundas commented 3 months ago

That sounds great! Additionally, it would be useful to have a function that takes an enum parameter indicating the priority (e.g. speed, quality, low memory) and returns a recommended model, or a short list of recommended models, that best fits the current system.
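A rough sketch of the suggested priority-based recommendation, assuming the same release-time test data also records per-device speed, accuracy, and peak memory for each model. `ModelPriority`, `ModelBenchmark`, its metric fields, and `recommendedModels(from:priority:limit:)` are hypothetical names chosen for illustration; they are not an existing WhisperKit function. Models that failed testing on the current device are filtered out before ranking.

```swift
// Hypothetical sketch of the requested API; none of these names exist in WhisperKit.

enum ModelPriority {
    case speed, quality, lowMemory
}

/// Per-model metrics measured on the current device during release testing (assumed fields).
struct ModelBenchmark {
    let model: String
    let passedAllTests: Bool
    let tokensPerSecond: Double  // higher is faster
    let wordErrorRate: Double    // lower is more accurate
    let peakMemoryMB: Int        // lower uses less memory
}

/// Returns up to `limit` supported models, best fit first for the given priority.
func recommendedModels(from benchmarks: [ModelBenchmark],
                       priority: ModelPriority,
                       limit: Int = 3) -> [String] {
    // Never recommend a model that failed testing on this device.
    let supported = benchmarks.filter { $0.passedAllTests }
    let ranked: [ModelBenchmark]
    switch priority {
    case .speed:
        ranked = supported.sorted { $0.tokensPerSecond > $1.tokensPerSecond }
    case .quality:
        ranked = supported.sorted { $0.wordErrorRate < $1.wordErrorRate }
    case .lowMemory:
        ranked = supported.sorted { $0.peakMemoryMB < $1.peakMemoryMB }
    }
    return ranked.prefix(limit).map { $0.model }
}
```

Sorting on a single metric per priority keeps the behavior predictable; a weighted score across metrics would be a natural refinement.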

atiorh commented 3 months ago

Sounds great, thanks for the input @iandundas! We are going to start working on this soon :)