argmaxinc / WhisperKit

On-device Speech Recognition for Apple Silicon
https://takeargmax.com/blog/whisperkit
MIT License

How to use custom prompts? Couldn't find the usage from the examples. #110

Closed: YiLee01 closed this issue 5 months ago

YiLee01 commented 5 months ago

I made changes to the prefillDecoderInputs method for debugging, but found that it isn't working. Here is a snippet of my code; is there something wrong with it?

let promptTokens = tokenizer.encode(text: "以下是普通话的句子,请以简体中文输出") // "The following are Mandarin sentences; please output in Simplified Chinese"

ZachNagengast commented 5 months ago

Prompting is a little tricky with Whisper and quite different from LLMs, because the prompt is just what the model assumes to be the transcription of the previous window. Here is a more in-depth guide: https://cookbook.openai.com/examples/whisper_prompting_guide

To put it simply, you're usually best off giving a really good example of the text you want it to output as the response, and the model will try to follow that format. See these unit tests for more: https://github.com/argmaxinc/WhisperKit/blob/5572cd63c763c82c973077659c34a20e90d2afed/Tests/WhisperKitTests/UnitTests.swift#L681-L710
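For reference, here is a minimal sketch of that approach: encode an example of the desired output and hand the tokens to `DecodingOptions` instead of editing `prefillDecoderInputs` directly. The parameter names (`promptTokens`, `usePrefillPrompt`), the special-token filter, and the `"audio.wav"` path are assumptions for illustration, based on the linked tests, and may differ between WhisperKit versions.

```swift
import WhisperKit

// Sketch of passing a custom prompt via DecodingOptions.
// Assumes `promptTokens` and `usePrefillPrompt` exist as in the linked unit tests.
func transcribeWithPrompt() async throws {
    let pipe = try await WhisperKit()
    guard let tokenizer = pipe.tokenizer else { return }

    // Give the model an example of the output style you want it to imitate,
    // e.g. Simplified Chinese sentences with punctuation.
    let promptText = "以下是普通话的句子。"
    let promptTokens = tokenizer.encode(text: promptText)
        .filter { $0 < tokenizer.specialTokens.specialTokenBegin } // keep only text tokens

    let options = DecodingOptions(
        usePrefillPrompt: true, // prompt tokens are applied during decoder prefill
        promptTokens: promptTokens
    )

    // "audio.wav" is a placeholder path; the return type of transcribe(...)
    // varies between WhisperKit versions, so it is printed generically here.
    let result = try await pipe.transcribe(audioPath: "audio.wav", decodeOptions: options)
    print(String(describing: result))
}
```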

YiLee01 commented 5 months ago

Thank you for your help; you solved my problem!