Closed by laralove143 4 days ago
Hi @laralove143, please give https://www.takeargmax.com/blog/whisperkit a read for the value proposition of WhisperKit. That being said, performance is definitely a big part and we are working on a "Performance Benchmark Tab" in the example app. Will follow up here shortly.
That blog is very useful. Maybe it could be surfaced more clearly in the README, for example by summarizing its contents.
Alternatively, some material from the blog could be included in the README directly; the demo video in particular is very useful.
Thanks for the feedback! We will think about a better way to organize information about WhisperKit that is more accessible. We will definitely flesh out the README and docs more before stable release.
Tracking this here: #28
My understanding from running llama.cpp on iOS/macOS (via Swift, including streaming) is that Metal is faster than CoreML or Metal+CoreML. There may be some other benefits to using CoreML, maybe battery? I don't know myself.
> Metal is faster than CoreML or Metal+CoreML
This is certainly possible in specific cases but cannot be a generally true statement. For context, WhisperKit is currently tuned for mobile and lower-end Macs, where the Neural Engine is much more powerful relative to the GPU (which Metal harnesses), and Core ML is the primary framework for deploying to the Neural Engine. That being said, we are actively working on a Metal backend to complement the Core ML backend.
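For readers unfamiliar with how Core ML targets the Neural Engine: the compute backend is selected per model via `MLModelConfiguration.computeUnits`. A minimal sketch (the `WhisperEncoder` class name is a hypothetical stand-in for any compiled Core ML model):

```swift
import CoreML

// Sketch: restrict a Core ML model to CPU + Neural Engine.
// `.cpuAndNeuralEngine` (iOS 16 / macOS 13+) keeps the GPU free and is
// typically the most power-efficient choice; `.all` additionally lets
// Core ML schedule work on the GPU.
let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine

// `WhisperEncoder` is a placeholder for an Xcode-generated model class.
let model = try WhisperEncoder(configuration: config)
```

This is why "Core ML vs. Metal" comparisons depend heavily on which compute units the Core ML path was actually allowed to use.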
Thanks for the context. Would be great to see a benchmark later. I recall the Metal-only whisper.cpp being faster even on lower-spec devices such as iPhone, but can't find the numbers at the moment.
Here are some numbers I hadn't seen before, showing far better performance on an iPhone using Metal instead of CoreML. So it looks like it holds true for mobile...
https://www.bjnortier.com/2023/11/17/Hello-Transcribe-3.2.html
It mentions some other downsides to CoreML, such as the slow caching step and unpredictable cache ejection by the OS.
https://huggingface.co/spaces/argmaxinc/whisperkit-benchmarks
Here is our comprehensive benchmark suite, which will be updated with every release starting with WhisperKit 0.9!
Contents:
Looking forward to the community feedback!
Note: Higher performance with WhisperKit is possible. However, the dashboard data represents using the recommended (default) configuration that best balances battery life, thermal sustainability, memory consumption, and latency for a smooth user experience. For example, on M2 Ultra, WhisperKit runs the latest OpenAI Large V3 Turbo model (v20240930/turbo in WhisperKit) as fast as 72x real-time with a GPU+ANE config. However, the default config (ANE only) is published as 42x real-time on the dashboard.
The advantage of this project is that it uses Core ML for a performance gain, so publishing benchmarks would solidify how large that advantage actually is.