argmaxinc / WhisperKit

On-device Speech Recognition for Apple Silicon
http://argmaxinc.com/blog/whisperkit
MIT License

Beam Search #30

Open ZachNagengast opened 9 months ago

ZachNagengast commented 9 months ago

Supporting beam search on CoreML in line with the reference implementation will require some model changes. This is mainly because CoreML models use static input shapes: in the reference implementation the decoder processes all beams together as one batch, so each possible beam_size would need its own model. We have some plans to deal with this, so we will keep this issue open for tracking.
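A minimal Python sketch of the beam search idea may help explain why beam_size is tied to the model shape. Python is used because the linked OpenAI reference is written in Python. `beam_search`, `toy_step`, and the four-token vocabulary are hypothetical illustrations, not WhisperKit or OpenAI APIs. `step` stands in for one decoder forward pass. Every call to it receives exactly beam_size prefixes, so on CoreML the batch dimension would be fixed by beam_size. The sketch is simplified compared with the reference: it ranks by raw summed log-probability, with no length penalty or patience.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical decoder step: one log-prob vector per prefix in the batch.
StepFn = Callable[[List[List[int]]], List[List[float]]]


def beam_search(step: StepFn, sot: int, eot: int, beam_size: int,
                max_len: int) -> Tuple[List[int], float]:
    # All beams start as <sot>; the batch always has beam_size rows.
    beams = [[sot] for _ in range(beam_size)]
    scores = [0.0] * beam_size
    finished: Dict[Tuple[int, ...], float] = {}
    for _ in range(max_len):
        batch = step(beams)  # shape [beam_size][vocab_size]
        if len(batch) != beam_size:
            raise ValueError("step must return one row per beam")
        candidates: Dict[Tuple[int, ...], float] = {}
        for prefix, base, logprobs in zip(beams, scores, batch):
            # Like the reference, expand the top (beam_size + 1) tokens per beam.
            top = sorted(range(len(logprobs)), key=lambda t: logprobs[t],
                         reverse=True)[: beam_size + 1]
            for tok in top:
                candidates[tuple(prefix + [tok])] = base + logprobs[tok]
        beams, scores = [], []
        for seq, s in sorted(candidates.items(), key=lambda kv: kv[1],
                             reverse=True):
            if seq[-1] == eot:
                if len(finished) < beam_size:
                    finished[seq] = s  # hypothesis is complete
            else:
                beams.append(list(seq))
                scores.append(s)
                if len(beams) == beam_size:
                    break
        if len(finished) >= beam_size:
            break
    # Fall back to unfinished beams if too few hypotheses reached <eot>.
    for prefix, s in zip(beams, scores):
        if len(finished) >= beam_size:
            break
        finished.setdefault(tuple(prefix), s)
    best_seq, best_score = max(finished.items(), key=lambda kv: kv[1])
    return list(best_seq), best_score


def toy_step(prefixes: List[List[int]]) -> List[List[float]]:
    # Toy vocabulary: 0 = <sot>, 1 = "a", 2 = "b", 3 = <eot>.
    out = []
    for p in prefixes:
        if p[-1] == 0:
            probs = [0.001, 0.59, 0.39, 0.019]
        elif p[-1] == 1:
            probs = [0.01, 0.33, 0.33, 0.33]
        else:
            probs = [0.01, 0.04, 0.05, 0.90]
        out.append([math.log(x) for x in probs])
    return out


tokens, score = beam_search(toy_step, sot=0, eot=3, beam_size=2, max_len=5)
print(tokens)  # [0, 2, 3]
```

Greedy decoding would commit to "a" (probability 0.59) at the first step. The beam keeps "b" alive and finds `[0, 2, 3]`, with probability 0.39 × 0.9 ≈ 0.351. Because the number of rows passed to `step` never changes, a CoreML decoder compiled for one beam_size could not serve another without a flexible or separate model.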

References

OpenAI implementation: https://github.com/openai/whisper/blob/ba3f3cd54b0e5b8ce1ab3de13e32122d0d5f98ab/whisper/decoding.py#L301-L404