argmaxinc / WhisperKit

On-device Speech Recognition for Apple Silicon
http://argmaxinc.com/blog/whisperkit
MIT License
3.92k stars, 330 forks

How do I use a parameter like initial_prompt in Python's Whisper? #127

Open xiangliangX opened 6 months ago

xiangliangX commented 6 months ago

How do I use a parameter like initial_prompt in Python's Whisper?

ZachNagengast commented 6 months ago

The best way to do this is to use the promptTokens parameter in DecodingOptions. Whatever tokens you pass are added to the <|startofprev|> section of the prompt that is fed to the decoder, which can help with spelling and punctuation style. Keep in mind that this is not like an LLM prompt: it should be used purely as an example of the style and spelling of the output you're looking for. Check out these test cases for how you might implement it in your code: https://github.com/argmaxinc/WhisperKit/blob/3bab206f2a308583b5b7692a25b05aac5423ab10/Tests/WhisperKitTests/UnitTests.swift#L730-L759
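To make the <|startofprev|> placement more concrete, here is a minimal, library-free sketch of how a prompt might be laid out ahead of the usual start-of-transcript tokens. It is not WhisperKit's actual implementation: the token IDs, `textOnly`, and `decoderPrompt` are hypothetical names and values chosen for illustration, and the real IDs come from the model's tokenizer.

```swift
// Illustrative token IDs only (assumptions); real values come from the tokenizer.
let specialTokenBegin = 50257   // assumed: first ID reserved for special tokens
let startOfPrevious = 50361     // assumed ID for <|startofprev|>
let startOfTranscript = 50258   // assumed ID for <|startoftranscript|>

/// Drops special-token IDs so that only plain text tokens remain in the prompt.
func textOnly(_ tokens: [Int]) -> [Int] {
    tokens.filter { $0 < specialTokenBegin }
}

/// Places the prompt tokens after <|startofprev|> and before <|startoftranscript|>.
/// With an empty prompt, the <|startofprev|> section is omitted entirely.
func decoderPrompt(promptTokens: [Int], taskTokens: [Int]) -> [Int] {
    let prompt = textOnly(promptTokens)
    guard !prompt.isEmpty else { return [startOfTranscript] + taskTokens }
    return [startOfPrevious] + prompt + [startOfTranscript] + taskTokens
}

// Pretend tokenizer output that accidentally contains two special tokens.
let encoded = [50258, 2425, 7184, 50257]
print(decoderPrompt(promptTokens: encoded, taskTokens: []))
// prints [50361, 2425, 7184, 50258]
```

The filtering step is there because the prompt is meant to look like previously transcribed text, so special tokens inside it would likely confuse the decoder rather than guide it.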

foeken commented 1 month ago

Is there an example of how best to write a prompt to improve the spelling of a set of dictionary words?

Would you simply put those in the prompt one by one, or is there a more effective way?

This was my attempt, but it's hard to see if it is working:

if !spellingDictionary.isEmpty, let tokenizer = whisperKit?.tokenizer {
  let promptText = " " + spellingDictionary.joined(separator: " ").trimmingCharacters(in: .whitespaces)
  options.promptTokens = tokenizer.encode(text: promptText).filter { $0 < tokenizer.specialTokens.specialTokenBegin } 
  options.usePrefillPrompt = true
}

ZachNagengast commented 2 weeks ago

Yes, this would be the typical way to do it. However, sometimes it's still not enough to help with some of the smaller models.
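For anyone building the prompt text from a word list, a small helper along these lines might make the input a bit more predictable. This is only a sketch: `makePromptText` is a hypothetical name, not part of WhisperKit, and whether deduplicating or trimming entries actually improves recognition is an assumption that would need testing per model.

```swift
import Foundation

/// Builds a prompt string from a list of dictionary words (hypothetical helper).
/// Trims each entry, drops empty and duplicate entries, and joins the rest with spaces.
func makePromptText(from words: [String]) -> String {
    var seen = Set<String>()
    let cleaned = words
        .map { $0.trimmingCharacters(in: .whitespacesAndNewlines) }
        .filter { !$0.isEmpty && seen.insert($0).inserted }
    guard !cleaned.isEmpty else { return "" }
    // A leading space is kept, as in the attempt above, so the first word is
    // encoded the same way it would be in the middle of a sentence.
    return " " + cleaned.joined(separator: " ")
}

print(makePromptText(from: ["WhisperKit", " Argmax ", "", "WhisperKit"]))
// prints " WhisperKit Argmax"
```

The resulting string can then be passed to tokenizer.encode(text:) and filtered for special tokens exactly as in the attempt above.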