argmaxinc / WhisperKit

On-device Speech Recognition for Apple Silicon
https://takeargmax.com/blog/whisperkit
MIT License
3.17k stars, 268 forks

Timestamp Rules Logits Filter #24

Closed (ZachNagengast closed 6 months ago)

ZachNagengast commented 7 months ago

Timestamp rules are a logits filter that restricts which tokens can be sampled at each decoding step, which helps the decoder produce reliable timestamps more consistently.

Important note: We have already brought some of this logic over into the SegmentSeeker, which runs at the end of a full decode loop to generate the segments. This feature will need to untangle any logic that is duplicated between the two.

References:

OpenAI implementation: https://github.com/openai/whisper/blob/ba3f3cd54b0e5b8ce1ab3de13e32122d0d5f98ab/whisper/decoding.py#L441-L505
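
For context, the rules in the referenced filter can be sketched roughly as follows. This is a simplified, single-sequence paraphrase in plain Python, not WhisperKit code and not a line-by-line copy of the linked source. The names `apply_timestamp_rules` and `allowed`, and the toy token ids (`EOT = 4`, `TS_BEGIN = 5`, vocabulary of 10), are assumptions made for illustration. Suppression of the no-timestamps token is omitted.

```python
import math

NEG_INF = float("-inf")


def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf entries."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def apply_timestamp_rules(logits, sampled, eot, timestamp_begin,
                          max_initial_timestamp_index=None):
    """Return a copy of `logits` with rule-violating tokens set to -inf.

    logits:  one float per vocabulary id (hypothetical toy vocabulary)
    sampled: token ids sampled so far, prompt excluded
    Assumes ids >= timestamp_begin are timestamps and eot < timestamp_begin.
    """
    out = list(logits)
    n = len(out)

    def mask(start, stop):
        for i in range(max(start, 0), min(stop, n)):
            out[i] = NEG_INF

    last_ts = len(sampled) >= 1 and sampled[-1] >= timestamp_begin
    prev_ts = len(sampled) < 2 or sampled[-2] >= timestamp_begin

    # Rule 1: timestamps appear in pairs, except directly before EOT.
    if last_ts:
        if prev_ts:
            mask(timestamp_begin, n)  # next token must be a non-timestamp
        else:
            mask(0, eot)              # next token cannot be plain text

    # Rule 2: timestamps must not decrease, and a segment needs nonzero length.
    seen = [t for t in sampled if t >= timestamp_begin]
    if seen:
        floor = seen[-1] if (last_ts and not prev_ts) else seen[-1] + 1
        mask(timestamp_begin, floor)

    # Rule 3: the first sampled token must be a timestamp, optionally bounded.
    if not sampled:
        mask(0, timestamp_begin)
        if max_initial_timestamp_index is not None:
            mask(timestamp_begin + max_initial_timestamp_index + 1, n)

    # Rule 4: if the total timestamp probability beats the best non-timestamp
    # token, force a timestamp. The softmax normalizer cancels, so raw logits
    # can be compared directly.
    if logsumexp(out[timestamp_begin:]) > max(out[:timestamp_begin]):
        mask(0, timestamp_begin)

    return out


def allowed(logits):
    """Ids that are still sampleable after filtering."""
    return [i for i, x in enumerate(logits) if x != NEG_INF]


EOT, TS_BEGIN = 4, 5     # toy ids: 0-3 text, 4 EOT, 5-9 timestamps
uniform = [0.0] * 10

# Start of decoding: only the first two timestamps are allowed.
print(allowed(apply_timestamp_rules(uniform, [], EOT, TS_BEGIN,
                                    max_initial_timestamp_index=1)))  # [5, 6]
# After an opening timestamp: only non-timestamp tokens are allowed.
print(allowed(apply_timestamp_rules(uniform, [5], EOT, TS_BEGIN)))    # [0, 1, 2, 3, 4]
# After text closed by timestamp 7: only timestamps >= 7 remain here.
print(allowed(apply_timestamp_rules(uniform, [5, 1, 2, 7], EOT, TS_BEGIN)))  # [7, 8, 9]
```

Rules 1 and 2 are the ones most likely to overlap with segment-boundary logic that runs after decoding, since both reason about timestamp pairs and ordering.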

jkrukowski commented 7 months ago

I can take it