argmaxinc / WhisperKit

On-device Speech Recognition for Apple Silicon
http://argmaxinc.com/blog/whisperkit
MIT License

Reducing hallucinations with first text token logprob thresholding #37

Closed: atiorh closed this issue 7 months ago

atiorh commented 8 months ago

It would be great if the log probability (logprob) of the first text token could be used to mark a transcription draft as failed, discard it, and start over. Starting over could mean either falling back to sampling at a higher temperature or, in streaming use cases, updating the audio buffer before trying again.
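A minimal sketch of the proposed check, assuming a decoder that reports the logprob of each sampled text token. `DraftResult`, `firstTokenIsTooUnlikely`, `decodeWithFallback`, the `-1.5` threshold, and the temperature schedule are all hypothetical illustrations, not WhisperKit API. The draft is rejected when its first text token falls below the threshold, and decoding is retried at the next, higher temperature.

```swift
// Hypothetical result of decoding one audio window (not a WhisperKit type).
struct DraftResult {
    let text: String
    let tokenLogProbs: [Float]  // logprob of each sampled text token, in order
}

// True when the draft should be discarded: its first text token is less
// likely than `threshold`. An empty draft is also treated as failed.
func firstTokenIsTooUnlikely(_ draft: DraftResult, threshold: Float) -> Bool {
    guard let first = draft.tokenLogProbs.first else { return true }
    return first < threshold
}

// Temperature fallback: start greedy (T = 0) and retry at each higher
// temperature while the first-token check fails. The threshold and the
// schedule below are assumed values for illustration only.
func decodeWithFallback(
    temperatures: [Float] = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
    threshold: Float = -1.5,
    decode: (Float) -> DraftResult
) -> DraftResult? {
    for temperature in temperatures {
        let draft = decode(temperature)
        if !firstTokenIsTooUnlikely(draft, threshold: threshold) {
            return draft
        }
    }
    // Every temperature failed. A streaming caller could instead update
    // its audio buffer here and try again once more audio has arrived.
    return nil
}
```

Because only the first text token is inspected, a real decoder could run this check immediately after sampling that token and abandon the draft without decoding the rest, which is what makes the first token a cheap signal compared with scoring the full draft.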