It would be great to start collecting reproducible performance benchmarks for supported hardware (e.g. A14+ and M1+). This should be a self-contained function that uses `openai/whisper-base` by default and optionally other versions that the benchmark submitter selects. Benchmarks should run on a standard set of audio files, and reports should be in a digestible and shareable format.
Pseudo-code may look like this:
Detect current hardware and load the models that the user has chosen to benchmark (single, multiple, or all available models)
Download standard audio files from Hugging Face (`jfk.wav` for short-form, `ted_60.wav` and a sample clip from `earnings22` for long-form transcriptions)
Generate the transcriptions over several iterations and tabulate runtime statistics.
Runs in streaming and file-based "offline" mode - this will require streaming emulation
Completes short-form bench and presents results before moving to long-form bench which can potentially take several minutes to complete
Will want to track: time to first token, real-time factor (RTF), inference timings (for encoder and decoder), total pipeline timings (model load -> transcription result)
Export these into a markdown table with relevant device info, and current commit hash, which can be posted to GitHub for public tracking
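To make the steps above more concrete, here is a rough sketch of the measurement and reporting side, written in Python only to keep it short; it is not WhisperKit code. Everything in it is an assumption for illustration: the `transcribe` callable, `emulate_stream`, `RunTiming`, and the table columns are hypothetical names, RTF is taken as processing time divided by audio duration, and the streaming emulation only slices audio into chunks without real-time pacing. Encoder and decoder timings are left out because they depend on hooks inside the pipeline.

```python
import platform
import statistics
import subprocess
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Optional, Sequence


@dataclass
class RunTiming:
    """Timings for one transcription run, in seconds."""
    time_to_first_token: float
    total: float


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Assumed definition: processing time / audio duration (below 1.0 is faster than real time)."""
    return processing_seconds / audio_seconds


def emulate_stream(samples: Sequence[float], sample_rate: int,
                   chunk_seconds: float = 1.0) -> Iterator[Sequence[float]]:
    """Slice a fully loaded file into fixed-size chunks, as a stand-in for live audio."""
    chunk = max(1, int(sample_rate * chunk_seconds))
    for start in range(0, len(samples), chunk):
        yield samples[start:start + chunk]


def benchmark(transcribe: Callable[[], Iterable[str]], iterations: int = 3) -> List[RunTiming]:
    """Time a hypothetical `transcribe` callable that yields tokens as they are decoded."""
    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        first = None
        for _token in transcribe():
            if first is None:
                first = time.perf_counter() - start
        total = time.perf_counter() - start
        # If no token was produced, fall back to the total run time.
        timings.append(RunTiming(first if first is not None else total, total))
    return timings


def current_commit() -> str:
    """Short git commit hash, or 'unknown' when git or a checkout is unavailable."""
    try:
        out = subprocess.run(["git", "rev-parse", "--short", "HEAD"],
                             capture_output=True, text=True, check=True)
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def markdown_report(model: str, audio_name: str, audio_seconds: float,
                    timings: List[RunTiming], device: Optional[str] = None,
                    commit: Optional[str] = None) -> str:
    """Render one result row as a markdown table that can be pasted into an issue."""
    totals = [t.total for t in timings]
    mean_total = statistics.mean(totals)
    mean_ttft = statistics.mean(t.time_to_first_token for t in timings)
    stdev_total = statistics.stdev(totals) if len(totals) > 1 else 0.0
    device = device or f"{platform.system()} {platform.machine()}"
    commit = commit or current_commit()
    header = ("| Model | Audio | Runs | Mean TTFT (s) | Mean total (s) "
              "| Stdev total (s) | RTF | Device | Commit |")
    divider = "|" + " --- |" * 9
    row = (f"| {model} | {audio_name} | {len(timings)} | {mean_ttft:.3f} "
           f"| {mean_total:.3f} | {stdev_total:.3f} "
           f"| {real_time_factor(mean_total, audio_seconds):.3f} | {device} | {commit} |")
    return "\n".join([header, divider, row])
```

A real implementation would presumably live in the package itself and read device details from the OS; the point here is only the shape of the data that ends up in the table.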
References
Open ASR leaderboard benchmarks: https://github.com/huggingface/open_asr_leaderboard
Nice script for collecting environment info: https://github.com/pytorch/pytorch/blob/main/torch/utils/collect_env.py
Related Issue
https://github.com/argmaxinc/WhisperKit/issues/5