argmaxinc / WhisperKit

On-device Speech Recognition for Apple Silicon
http://argmaxinc.com/blog/whisperkit
MIT License
3.92k stars, 331 forks

Language Detection #29

Closed: ZachNagengast closed this issue 7 months ago

ZachNagengast commented 9 months ago

Language detection here should be fairly simple with logits filters now: it will entail a single decoder pass that samples just the language tokens. However, this cannot be used when we are using a prefill prompt (i.e., forced decoder tokens), so that case will need special handling.
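To illustrate the idea, here is a minimal Python sketch of language detection via a logits filter. It is not WhisperKit's code or API. The function names, the toy vocabulary, and the token IDs are all hypothetical, and the prefill case is handled in the simplest way possible: by skipping detection and returning the forced language.

```python
import math

def filter_language_logits(logits, language_token_ids):
    """Keep logits at language-token positions; set all others to -inf."""
    allowed = set(language_token_ids)
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]

def softmax(logits):
    """Numerically stable softmax; -inf entries get probability 0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def detect_language(logits, language_tokens, forced_language=None):
    """Pick the most likely language from one decoder step's logits.

    logits: scores over the whole vocabulary from a single decoder pass.
    language_tokens: hypothetical mapping of token id -> language code.
    forced_language: set when a prefill prompt already fixes the language;
        detection is skipped in that case (an assumption of this sketch).
    """
    if forced_language is not None:
        return forced_language, {forced_language: 1.0}
    filtered = filter_language_logits(logits, language_tokens.keys())
    probs = softmax(filtered)
    lang_probs = {code: probs[tid] for tid, code in language_tokens.items()}
    best = max(lang_probs, key=lang_probs.get)
    return best, lang_probs

# Toy example: token 4 has the highest raw logit but is not a language token,
# so the filter excludes it and "de" (token 2) wins.
toy_logits = [5.0, 0.1, 2.0, 1.0, 9.0]
toy_language_tokens = {1: "en", 2: "de", 3: "fr"}  # made-up ids
language, probabilities = detect_language(toy_logits, toy_language_tokens)
print(language)  # de
```

Masking to negative infinity before the softmax means the probabilities are normalized over the language tokens only, so the result can be read as a distribution over candidate languages.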

References

OpenAI implementation: https://github.com/openai/whisper/blob/ba3f3cd54b0e5b8ce1ab3de13e32122d0d5f98ab/whisper/decoding.py#L19
WhisperKit inline todo: https://github.com/argmaxinc/WhisperKit/blob/228630c37e4ac1b1c95790d77f64058d317f8859/Sources/WhisperKit/Core/TextDecoder.swift#L300

Abhinay1997 commented 8 months ago

I've started on this and have the LanguageLogitsFilter done. I will open a PR once the detectLanguage function is complete, and add tests while it's in draft.

Abhinay1997 commented 7 months ago

Can we mark this as done so it is reflected on the project page?

ZachNagengast commented 7 months ago

Closing now that #78 is merged, well done!