argmaxinc / WhisperKit

On-device Speech Recognition for Apple Silicon
https://takeargmax.com/blog/whisperkit
MIT License
3.17k stars, 268 forks

Speculative decoding support with Eager streaming mode #102

Open atiorh opened 5 months ago

atiorh commented 5 months ago

In Eager streaming mode, the same token is predicted at least twice. That redundancy is a great opportunity to design a speculative decoding technique that leverages a fast draft model* to amortize the repeated predictions while accelerating the overall pipeline.
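
To make the idea concrete, here is a minimal sketch of greedy speculative decoding in Python. It is not WhisperKit's API or a proposed implementation: `speculative_decode`, `greedy_decode`, and the two toy models are hypothetical names, and each "model" is assumed to be a plain function from a token prefix to its greedy next token. A cheap draft model proposes `k` tokens, the target model checks them, and the longest agreeing prefix is kept along with one token from the target.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: a "model" maps a token prefix to its greedy next token.
NextToken = Callable[[Sequence[int]], int]


def greedy_decode(model: NextToken, prefix: Sequence[int], n: int) -> List[int]:
    """Baseline: one model call per generated token."""
    tokens = list(prefix)
    for _ in range(n):
        tokens.append(model(tokens))
    return tokens[len(prefix):]


def speculative_decode(
    draft: NextToken,
    target: NextToken,
    prefix: Sequence[int],
    max_new_tokens: int,
    k: int = 4,
) -> Tuple[List[int], int]:
    """Return (new tokens, number of target verification rounds)."""
    tokens = list(prefix)
    rounds = 0
    while len(tokens) - len(prefix) < max_new_tokens:
        # 1. The draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. The target model verifies the proposal. A real decoder would
        #    score all k + 1 positions in a single batched forward pass;
        #    this sketch loops for clarity and counts it as one round.
        rounds += 1
        accepted: List[int] = []
        for i in range(k + 1):
            expected = target(tokens + proposal[:i])
            accepted.append(expected)
            # Stop at the first disagreement (the target's token replaces
            # the draft's), or after the bonus token past the k-th match.
            if i == k or proposal[i] != expected:
                break
        tokens.extend(accepted)

    # The last round may overshoot the requested length.
    del tokens[len(prefix) + max_new_tokens:]
    return tokens[len(prefix):], rounds


# Toy models: the target counts upward modulo 10; the draft agrees with it
# except after token 4, where it guesses wrong.
def toy_target(tokens: Sequence[int]) -> int:
    return (tokens[-1] + 1) % 10


def toy_draft(tokens: Sequence[int]) -> int:
    return 0 if tokens[-1] == 4 else (tokens[-1] + 1) % 10


if __name__ == "__main__":
    out, rounds = speculative_decode(toy_draft, toy_target, [3], max_new_tokens=8)
    print(out, rounds)
```

Because every kept token is one the target model itself would have chosen, the output matches plain greedy decoding with the target; the potential saving comes from needing fewer target rounds than generated tokens whenever the draft agrees often enough.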