argmaxinc / WhisperKit

On-device Speech Recognition for Apple Silicon
https://takeargmax.com/blog/whisperkit
MIT License
3.17k stars, 268 forks

"Eager" streaming mode via word timestamps #95

Closed by ZachNagengast 6 months ago

ZachNagengast commented 6 months ago

Implements #59

This is an experimental implementation of streaming with token timestamps. The logic is similar to the flows outlined in these papers. Further work is needed to integrate it into the library itself; for now, this PR makes the example code available through the CLI and WhisperAX projects.
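The description above does not spell out the streaming logic, so the following is only an illustrative sketch, not WhisperKit's implementation or API. It assumes one common approach to streaming transcription: the audio seen so far is re-decoded on each pass, and a word is treated as confirmed once two consecutive hypotheses agree on it as part of their shared prefix. The names `Word`, `common_prefix`, and `EagerStream` are hypothetical and exist only for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """Hypothetical word-timestamp record (not a WhisperKit type)."""
    text: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio


def common_prefix(prev: list[Word], curr: list[Word]) -> list[Word]:
    """Return the leading words of `curr` whose text matches `prev` word for word."""
    agreed = []
    for a, b in zip(prev, curr):
        if a.text != b.text:
            break
        agreed.append(b)
    return agreed


class EagerStream:
    """Confirms words once two consecutive hypotheses agree on them (an assumed policy)."""

    def __init__(self) -> None:
        self.confirmed: list[Word] = []  # words that will no longer change
        self.previous: list[Word] = []   # full hypothesis from the last pass

    def update(self, hypothesis: list[Word]) -> list[Word]:
        """Feed the latest full hypothesis; return the newly confirmed words."""
        agreed = common_prefix(self.previous, hypothesis)
        # Only words beyond the already confirmed count are new.
        newly = agreed[len(self.confirmed):]
        self.confirmed.extend(newly)
        self.previous = hypothesis
        return newly

    @property
    def seek_time(self) -> float:
        """End of the last confirmed word; a later pass could start decoding here."""
        return self.confirmed[-1].end if self.confirmed else 0.0


stream = EagerStream()
first = stream.update([Word("hello", 0.0, 0.4), Word("word", 0.4, 0.8)])
second = stream.update(
    [Word("hello", 0.0, 0.4), Word("world", 0.4, 0.9), Word("how", 0.9, 1.1)]
)
third = stream.update(
    [
        Word("hello", 0.0, 0.4),
        Word("world", 0.4, 0.9),
        Word("how", 0.9, 1.1),
        Word("are", 1.1, 1.3),
    ]
)
```

In this sketch the first pass confirms nothing because there is no earlier hypothesis to compare against. The second pass confirms only "hello", since the two hypotheses diverge at the second word. Word timestamps matter here because the end time of the last confirmed word gives a natural point from which a later pass could resume decoding.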

While building this feature, I also implemented some pending tasks:

Example:

https://github.com/argmaxinc/WhisperKit/assets/1981179/0327da03-9b74-4714-9239-fad567daec54