Closed by laralove143 4 days ago
Hi @laralove143, please give https://www.takeargmax.com/blog/whisperkit a read for the value proposition of WhisperKit. That being said, performance is definitely a big part and we are working on a "Performance Benchmark Tab" in the example app. Will follow up here shortly.
That blog is very useful. Maybe it could be surfaced more clearly in the README, for example by summarizing its contents.
Alternatively, some material from the blog could be included in the README directly; the demo video in particular is very useful.
Thanks for the feedback! We will think about a better way to organize information about WhisperKit that is more accessible. We will definitely flesh out the README and docs more before stable release.
Tracking this here: #28
My understanding from running llama.cpp on iOS/macOS (via Swift, including streaming) is that Metal is faster than CoreML or Metal+CoreML. There may be some other benefits to using CoreML, maybe battery? I don't know myself.
> Metal is faster than CoreML or Metal+CoreML
This is certainly possible in specific cases but cannot be a generally true statement. For context, WhisperKit is currently tuned for mobile and lower-end Macs, where the Neural Engine is much more powerful relative to the GPU (which Metal harnesses), and Core ML is the primary framework for deploying to the Neural Engine. That being said, we are actively working on a Metal backend to complement the Core ML backend.
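For readers unfamiliar with how Core ML targets the Neural Engine: the compute backend is selected per model via `MLModelConfiguration.computeUnits`. A minimal sketch (the `WhisperEncoder` class name is a hypothetical stand-in for any compiled Core ML model):

```swift
import CoreML

// Sketch: restrict a Core ML model to CPU + Neural Engine.
// `.cpuAndNeuralEngine` (iOS 16 / macOS 13+) keeps the GPU free and is
// typically the most power-efficient choice; `.all` additionally lets
// Core ML schedule work on the GPU.
let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine

// `WhisperEncoder` is a placeholder for an Xcode-generated model class.
let model = try WhisperEncoder(configuration: config)
```

This is why "Core ML vs. Metal" comparisons depend heavily on which compute units the Core ML path was actually allowed to use.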
Thanks for the context. Would be great to see a benchmark later. I recall the Metal-only whisper.cpp being faster even on lower-spec devices such as iPhone, but can't find the numbers at the moment.
Here are some numbers I hadn't seen before, showing far better performance on an iPhone using Metal instead of CoreML. So it looks like it holds true for mobile...
https://www.bjnortier.com/2023/11/17/Hello-Transcribe-3.2.html
It mentions some other downsides to CoreML, such as the slow caching step and unpredictable cache ejection by the OS.
https://huggingface.co/spaces/argmaxinc/whisperkit-benchmarks
Here is our comprehensive benchmark suite, which will be updated with every release starting with WhisperKit 0.9!
Contents:
Looking forward to the community feedback!
Note: Higher performance with WhisperKit is possible. However, the dashboard data represents using the recommended (default) configuration that best balances battery life, thermal sustainability, memory consumption, and latency for a smooth user experience. For example, on M2 Ultra, WhisperKit runs the latest OpenAI Large V3 Turbo model (v20240930/turbo in WhisperKit) as fast as 72x real-time with a GPU+ANE config. However, the default config (ANE only) is published as 42x real-time on the dashboard.
The advantage of this project is that it uses Core ML for a performance gain, so publishing benchmarks would solidify how large that advantage actually is.