argmaxinc / WhisperKit

On-device Speech Recognition for Apple Silicon
http://argmaxinc.com/blog/whisperkit
MIT License
3.92k stars, 330 forks

Enable word timestamps for distil-large-v3 #101

Closed: atiorh closed this 7 months ago

atiorh commented 7 months ago

@jongwook determined the alignment_heads for the OpenAI Whisper models by manual inspection. These heads are required for DTW-based (Accurate) word timestamps. We need to perform the same manual inspection for distil-large-v3 so that word timestamps can be enabled for it. Word timestamps are required to benefit from the "Eager Mode" streaming feature: https://x.com/argmaxinc/status/1774809790595932658?s=20
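
For context on why the alignment heads matter, here is a minimal sketch of the DTW step, assuming the cross-attention weights of the selected heads have already been averaged into one token-by-frame matrix. The functions `dtw_path` and `token_start_frames` and the toy `attn` matrix are hypothetical and only illustrate the idea; this is not the WhisperKit or openai-whisper implementation, which also normalizes and filters the weights first.

```python
# Illustrative sketch only: plain-Python DTW over a token-by-frame cost matrix.
# All names and the toy data below are hypothetical, not WhisperKit code.

def dtw_path(cost):
    """Return the monotonic (token, frame) path with minimal total cost."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    # acc[i][j] = minimal cost of aligning the first i tokens with the first j frames
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
            acc[i][j] = cost[i - 1][j - 1] + best_prev
    # Backtrack from the last cell, always stepping to the cheapest predecessor
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (acc[i - 1][j - 1], i - 1, j - 1),  # advance token and frame
            (acc[i - 1][j], i - 1, j),          # advance token only
            (acc[i][j - 1], i, j - 1),          # advance frame only
        ]
        _, i, j = min(candidates)
    path.reverse()
    return path


def token_start_frames(path, n_tokens):
    """First audio frame assigned to each token along the DTW path."""
    starts = [None] * n_tokens
    for tok, frame in path:
        if starts[tok] is None:
            starts[tok] = frame
    return starts


# Toy attention (3 tokens x 6 frames), standing in for the averaged weights
# of the alignment heads. Made-up numbers, not model output.
attn = [
    [0.9, 0.8, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.1, 0.8, 0.9, 0.1, 0.0],
    [0.0, 0.1, 0.1, 0.1, 0.9, 0.9],
]
# High attention should mean low alignment cost
cost = [[1.0 - w for w in row] for row in attn]

path = dtw_path(cost)
starts = token_start_frames(path, len(attn))
print(starts)  # [0, 2, 4]
```

If the chosen heads do not track the audio position, the matrix has no clear diagonal band and the recovered start frames become unreliable, which is why the heads have to be identified per model.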

ZachNagengast commented 7 months ago

Completed with https://huggingface.co/distil-whisper/distil-large-v3/discussions/3 and https://github.com/argmaxinc/whisperkittools/commit/0999a613c56c462b063b6b25d96260e1fc6ee2de