Vaibhavs10 / insanely-fast-whisper

Apache License 2.0

[Benchmarking] Thorough benchmarking for Transformers! #96

Open Vaibhavs10 opened 7 months ago

Vaibhavs10 commented 7 months ago

I am starting this issue to run more thorough benchmarks than the notebooks currently used in the repo.

What should we measure:

  1. Time for generation
  2. Max GPU VRAM
  3. Accuracy
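
For metrics 1 and 2, a minimal timing harness could look like the sketch below. `benchmark` is a hypothetical helper, not part of the repo; on CUDA you would additionally call `torch.cuda.synchronize()` before reading the clock and `torch.cuda.max_memory_allocated()` to capture peak VRAM:

```python
import time

def benchmark(fn, *args, warmup=1, runs=3, **kwargs):
    """Time fn and return (last result, best wall-clock latency in seconds).

    Sketch for metric 1 (time for generation). A warmup pass excludes one-off
    costs (CUDA graph capture, cache allocation) from the measured runs.
    """
    for _ in range(warmup):
        fn(*args, **kwargs)
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        times.append(time.perf_counter() - start)
    return result, min(times)
```

In a real run, `fn` would be a call like `pipe("audio.mp3")`, and the best-of-N latency smooths out scheduler noise.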

Hardware (this would give the best of both worlds IMO):

  1. Consumer (T4)
  2. A100s

Tricks that we should measure:

  1. scaled_dot_product_attention via the BetterTransformer API in Optimum.
  2. Flash Attention 2
  3. Chunked batching via the pipeline API in Transformers
  4. Speculative Decoding
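
As a rough sketch of how the four tricks map onto the Transformers API (argument names are assumptions based on recent Transformers versions; `load_whisper` is a hypothetical helper, and its imports are deferred so nothing is downloaded at definition time):

```python
def load_whisper(model_id="openai/whisper-large-v3",
                 use_flash_attn=False, assistant_id=None):
    """Hypothetical loader wiring up the four tricks above (not executed here)."""
    import torch
    from transformers import AutoModelForSpeechSeq2Seq, pipeline

    # Tricks 1/2: PyTorch SDPA (what BetterTransformer enables) vs. Flash Attention 2.
    attn = "flash_attention_2" if use_flash_attn else "sdpa"

    # Trick 4: speculative decoding -- a distil-whisper checkpoint passed to
    # generate() as the assistant model drafts tokens the main model verifies.
    generate_kwargs = {}
    if assistant_id is not None:
        generate_kwargs["assistant_model"] = AutoModelForSpeechSeq2Seq.from_pretrained(
            assistant_id, torch_dtype=torch.float16).to("cuda")

    # Trick 3: chunked batching via the pipeline API (chunk_length_s + batch_size).
    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        torch_dtype=torch.float16,
        device="cuda",
        model_kwargs={"attn_implementation": attn},
        chunk_length_s=30,
        batch_size=24,
        generate_kwargs=generate_kwargs,
    )
```

The tricks are not all independent: SDPA and Flash Attention 2 are alternative attention backends, while chunked batching and speculative decoding can be layered on top of either.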

Models that we should test:

  1. openai/whisper-large-v3
  2. distil-whisper/distil-large-v2
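
For metric 3, comparing these two checkpoints needs a word error rate score. A minimal pure-Python WER sketch (libraries such as `jiwer` or `evaluate` do this more robustly, with text normalization):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[j] = edit distance between the first i reference words and hyp[:j]
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev = d[0]          # distance for (i-1, j-1), i.e. the diagonal
        d[0] = i
        for j, h in enumerate(hyp, 1):
            cur = d[j]
            d[j] = min(d[j] + 1,         # delete the reference word
                       d[j - 1] + 1,     # insert a hypothesis word
                       prev + (r != h))  # substitute (or match if equal)
            prev = cur
    return d[-1] / max(len(ref), 1)
```

For example, `wer("a b c", "a x c")` is 1/3: one substitution over three reference words. Fair comparison also requires normalizing casing and punctuation before scoring, as Whisper's evaluation scripts do.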
BBC-Esq commented 6 months ago

Has this been finalized yet, just out of curiosity?