argmaxinc / WhisperKit

On-device Speech Recognition for Apple Silicon
http://argmaxinc.com/blog/whisperkit
MIT License

Benchmark for WhisperAX & CLI #28

Closed: ZachNagengast closed this issue 2 weeks ago

ZachNagengast commented 9 months ago

It would be great to start collecting reproducible performance benchmarks for supported hardware (e.g. A14+ and M1+). This should be a self-contained function that uses openai/whisper-base by default, with the option to benchmark any other model versions the submitter selects. Benchmarks should run on a standard set of audio files, and reports should be in a digestible, shareable format.

Pseudo-code may look like this (a Swift sketch follows the list):

  1. Detect current hardware and load the models that the user has chosen to benchmark (single, multiple, or all available models)
  2. Download standard audio files from Hugging Face (jfk.wav for short-form; ted_60.wav and a sample clip from earnings22 for long-form transcriptions)
  3. Generate the transcriptions over several iterations and tabulate runtime statistics.
    • Runs in streaming and file-based "offline" modes - this will require streaming emulation
    • Completes the short-form bench and presents results before moving to the long-form bench, which can take several minutes to complete
    • Will want to track: time to first token, RTF (real-time factor: transcription time divided by audio duration), inference timings (encoder and decoder), and total pipeline timings (model load -> transcription result)
  4. Export these into a markdown table with relevant device info and the current commit hash, which can be posted to GitHub for public tracking
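
A minimal sketch of steps 1-3 in Swift, assuming WhisperKit exposes an async `WhisperKit(model:)` initializer and a `transcribe(audioPath:)` method (the names, signatures, audio durations, and model list here are illustrative assumptions, not the final benchmark API):

```swift
import Foundation
import WhisperKit

// Hypothetical result record; RTF = transcription time / audio duration.
struct BenchmarkResult {
    let model: String
    let file: String
    let avgSeconds: Double
    let rtf: Double
}

func runBenchmark(models: [String] = ["base"],             // assumed model name
                  audioFiles: [String: Double] = [          // path : duration (s), assumed
                      "jfk.wav": 11.0,
                      "ted_60.wav": 60.0
                  ],
                  iterations: Int = 3) async throws -> [BenchmarkResult] {
    var results: [BenchmarkResult] = []
    for model in models {
        // Step 1: load the model selected for this run.
        let pipe = try await WhisperKit(model: model)
        for (file, duration) in audioFiles {
            // Step 3: transcribe over several iterations and average the wall time.
            var total = 0.0
            for _ in 0..<iterations {
                let start = Date()
                _ = try await pipe.transcribe(audioPath: file)
                total += Date().timeIntervalSince(start)
            }
            let avg = total / Double(iterations)
            results.append(BenchmarkResult(model: model, file: file,
                                           avgSeconds: avg,
                                           rtf: avg / duration))
        }
    }
    return results
}
```

Time to first token would come from a decoder callback rather than wall-clock deltas; the sketch only averages end-to-end latency and derives RTF from it.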

References

Open ASR leaderboard benchmarks: https://github.com/huggingface/open_asr_leaderboard
Nice script for collecting environment info: https://github.com/pytorch/pytorch/blob/main/torch/utils/collect_env.py
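
For the environment-collection piece, a hedged Swift sketch that gathers the device identifier, OS version, and RAM with standard Darwin/Foundation APIs (`uname` and `ProcessInfo`); the markdown layout is just one possible report header:

```swift
import Foundation

// Collect basic environment info for the report header, loosely mirroring
// what collect_env.py gathers, trimmed to Apple platforms.
func environmentInfo() -> String {
    var sys = utsname()
    uname(&sys)
    // sys.machine is a fixed-size CChar tuple; decode up to the NUL terminator.
    let machine = withUnsafeBytes(of: sys.machine) { rawBuf in
        String(decoding: rawBuf.prefix(while: { $0 != 0 }), as: UTF8.self)
    }
    let os = ProcessInfo.processInfo.operatingSystemVersionString
    let memGB = Double(ProcessInfo.processInfo.physicalMemory) / 1_073_741_824
    return """
    | Device | OS | RAM |
    |---|---|---|
    | \(machine) | \(os) | \(String(format: "%.0f", memGB)) GB |
    """
}
```

Note that `uname` returns a device identifier like "iPhone14,2" on iOS but an architecture string like "arm64" on macOS, so a real report would likely add a sysctl lookup (e.g. hw.model) for Macs.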

Related Issue

https://github.com/argmaxinc/WhisperKit/issues/5

atiorh commented 2 weeks ago

https://huggingface.co/spaces/argmaxinc/whisperkit-benchmarks

https://x.com/argmaxinc/status/1851723587423756680