Vaibhavs10 / insanely-fast-whisper

Apache License 2.0

[Benchmarking] Thorough benchmarking for Transformers! #96

Open Vaibhavs10 opened 7 months ago

Vaibhavs10 commented 7 months ago

I am starting this issue to run more thorough benchmarks than the notebooks currently used in the repo.

What should we measure:

  1. Time for generation
  2. Max GPU VRAM
  3. Accuracy
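
For metrics 1 and 2, a minimal timing harness could look like the sketch below. `benchmark` is a hypothetical helper, not part of the repo; on CUDA you would additionally call `torch.cuda.synchronize()` before reading the clock and `torch.cuda.max_memory_allocated()` to capture peak VRAM:

```python
import time

def benchmark(fn, *args, warmup=1, runs=3, **kwargs):
    """Time fn and return (last result, best wall-clock latency in seconds).

    Sketch for metric 1 (time for generation). A warmup pass excludes one-off
    costs (CUDA graph capture, cache allocation) from the measured runs.
    """
    for _ in range(warmup):
        fn(*args, **kwargs)
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        times.append(time.perf_counter() - start)
    return result, min(times)
```

In a real run, `fn` would be a call like `pipe("audio.mp3")`, and the best-of-N latency smooths out scheduler noise.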

Hardware (this would give the best of both worlds IMO):

  1. Consumer (T4)
  2. A100s

Tricks that we should measure:

  1. scaled_dot_product_attention via the BetterTransformer API in Optimum.
  2. Flash Attention 2
  3. Chunked batching via the pipeline API in Transformers
  4. Speculative Decoding
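
As a rough sketch of how the four tricks map onto the Transformers API (argument names are assumptions based on recent Transformers versions; `load_whisper` is a hypothetical helper, and its imports are deferred so nothing is downloaded at definition time):

```python
def load_whisper(model_id="openai/whisper-large-v3",
                 use_flash_attn=False, assistant_id=None):
    """Hypothetical loader wiring up the four tricks above (not executed here)."""
    import torch
    from transformers import AutoModelForSpeechSeq2Seq, pipeline

    # Tricks 1/2: PyTorch SDPA (what BetterTransformer enables) vs. Flash Attention 2.
    attn = "flash_attention_2" if use_flash_attn else "sdpa"

    # Trick 4: speculative decoding -- a distil-whisper checkpoint passed to
    # generate() as the assistant model drafts tokens the main model verifies.
    generate_kwargs = {}
    if assistant_id is not None:
        generate_kwargs["assistant_model"] = AutoModelForSpeechSeq2Seq.from_pretrained(
            assistant_id, torch_dtype=torch.float16).to("cuda")

    # Trick 3: chunked batching via the pipeline API (chunk_length_s + batch_size).
    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        torch_dtype=torch.float16,
        device="cuda",
        model_kwargs={"attn_implementation": attn},
        chunk_length_s=30,
        batch_size=24,
        generate_kwargs=generate_kwargs,
    )
```

The tricks are not all independent: SDPA and Flash Attention 2 are alternative attention backends, while chunked batching and speculative decoding can be layered on top of either.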

Models that we should test:

  1. openai/whisper-large-v3
  2. distil-whisper/distil-large-v2
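
For metric 3, comparing these two checkpoints needs a word error rate score. A minimal pure-Python WER sketch (libraries such as `jiwer` or `evaluate` do this more robustly, with text normalization):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[j] = edit distance between the first i reference words and hyp[:j]
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev = d[0]          # distance for (i-1, j-1), i.e. the diagonal
        d[0] = i
        for j, h in enumerate(hyp, 1):
            cur = d[j]
            d[j] = min(d[j] + 1,         # delete the reference word
                       d[j - 1] + 1,     # insert a hypothesis word
                       prev + (r != h))  # substitute (or match if equal)
            prev = cur
    return d[-1] / max(len(ref), 1)
```

For example, `wer("a b c", "a x c")` is 1/3: one substitution over three reference words. Fair comparison also requires normalizing casing and punctuation before scoring, as Whisper's evaluation scripts do.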
BBC-Esq commented 6 months ago

Has this been finalized yet, just out of curiosity?