Thanks for this detailed report @iandundas! This is very interesting. The prompt essentially tells the model that these words were said just before the audio window, so I can see how some of these prompts would affect the output. I suppose the question is whether we are doing something inconsistent with the OpenAI implementation, or whether this is just an artifact of how difficult prompting is with Whisper models. I tested the example from this guide, https://cookbook.openai.com/examples/whisper_prompting_guide, and was able to get matching results with Large-v2:
swift run whisperkit-cli transcribe --model-path "Models/whisperkit-coreml/openai_whisper-large-v2/" --audio-path ~/Downloads/product_names.wav --prompt "QuirkQuid Quill Inc, P3-Quattro, O3-Omni, B3-BondX, E3-Equity, W3-WrapZ, O2-Outlier, U3-UniFund, M3-Mover"
With prompt:
Welcome to QuirkQuid Quill Inc, where finance meets innovation. Explore diverse offerings, from the P3-Quattro, a unique investment portfolio quadrant, to the O3-Omni, a platform for intricate derivative trading strategies. Delve into unconventional bond markets with our B3-BondX and experience non-standard equity trading with E3-Equity. Personalized investment, and a wealth of knowledge. We're here to help you find the right investment. We're here to help you find the right investment. We're here to help you find the right investment. We're here to help you find the right investment. We're here to help you find the right investment. We're here to help you find Analyze your wealth management with W3-WrapZ and anticipate market trends with the O2-Outlier, our forward-thinking financial forecasting tool. Explore venture capital world with U3-UniFund or move your money with the M3-Mover, our sophisticated monetary transfer module. At QuirkQuid Quill Inc, we turn complex finance into creative solutions. Join us in redefining financial services.
Without prompt:
Welcome to Quirk Quid Quill Inc., where finance meets innovation. Explore diverse offerings from the P3 Quatro, a unique investment portfolio quadrant to the O3 Omni, a platform for intricate derivative trading strategies. Delve into unconventional bond markets with our B3 Bond X, and experience non-standard equity trading with e3equity. Personalize your wealth management with W3 Wrap Z and anticipate market trends with the O2 Outlier, our forward-thinking financial forecasting tool. Explore venture capital world with U3 Unifund or move your money with the M3 Mover, our sophisticated monetary transfer module. At Quirk Quid Quill Inc., we turn complex finance into creative solutions. Join us in redefining financial services.
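To make the "said just prior to the audio window" point concrete, here is a rough sketch of how OpenAI's reference decoder lays out its input for a multilingual model when a prompt is supplied. The special token names are Whisper's own; the function name `decoder_prefix`, the fixed `en` language, and the `transcribe` task are illustrative assumptions, and WhisperKit's internal handling is not shown here and may differ.

```shell
#!/bin/sh
# Sketch only: prints the token layout as text, it does not tokenize anything.
# decoder_prefix is a hypothetical helper, not part of any library or CLI.
decoder_prefix() {
  # Start-of-transcript sequence for a multilingual model (language and task
  # are fixed to en/transcribe for illustration).
  sot="<|startoftranscript|><|en|><|transcribe|>"
  if [ "$#" -ge 1 ]; then
    # The prompt sits in the "previous text" slot, ahead of the
    # start-of-transcript marker, so the model reads it as earlier speech.
    printf '%s\n' "<|startofprev|> ${1}${sot}"
  else
    # No prompt: decoding starts directly from the start-of-transcript tokens.
    printf '%s\n' "$sot"
  fi
}

decoder_prefix "QuirkQuid Quill Inc, P3-Quattro"
decoder_prefix
```

Because the prompt occupies the slot normally used for the previous window's transcript, a prompt that does not resemble the audio can pull the decoder away from it, which would be consistent with the differences seen between the two outputs above. How an empty string is handled is not modeled in this sketch.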
Agreed with @ZachNagengast. The prompt capability of Whisper (especially the non-large variants) is not well established. Our ground truth is consistency with OpenAI's reference implementation. However, an empty prompt changing the outcome, as well as #162, are definitely unexpected, and we are looking into both.
Closing this for now, please reopen if you notice any regressions from the reference repo.
Note that we will be fixing the empty prompt behavior (in progress).
Hi guys,
I just want to provide some input on the (drastic) effect that providing a prompt can have on the output quality.
(Note: I'm using commit 8fcfadbe due to #163, which is impacting all transcriptions done on main.)
Example outputs for test file:
1: No prompt - great transcription!
swift run whisperkit-cli transcribe --model-path "Models/whisperkit-coreml/openai_whisper-tiny.en/" --audio-path /Users/ian/AppsDev/GoodSnooze/MacWhisper/MacWhisper/main/Sample\ Audio\ Files/File\ type\ samples/m4a/atp\ 7\ min\ clip.m4a --language "en"
2: Transcription-relevant prompt - much shorter transcription results
swift run whisperkit-cli transcribe --model-path "Models/whisperkit-coreml/openai_whisper-tiny.en/" --audio-path /Users/ian/AppsDev/GoodSnooze/MacWhisper/MacWhisper/main/Sample\ Audio\ Files/File\ type\ samples/m4a/atp\ 7\ min\ clip.m4a --language "en" --prompt "I love Pop Tarts"
3: Empty prompt: even shorter transcription
swift run whisperkit-cli transcribe --model-path "Models/whisperkit-coreml/openai_whisper-tiny.en/" --audio-path /Users/ian/AppsDev/GoodSnooze/MacWhisper/MacWhisper/main/Sample\ Audio\ Files/File\ type\ samples/m4a/atp\ 7\ min\ clip.m4a --language "en" --prompt ""
4: "Hello" given as prompt makes the transcription start looping (Bye! Bye! Thank you! Bye! Bye! Bye! Bye! Bye! Bye)
swift run whisperkit-cli transcribe --model-path "Models/whisperkit-coreml/openai_whisper-tiny.en/" --audio-path /Users/ian/AppsDev/GoodSnooze/MacWhisper/MacWhisper/main/Sample\ Audio\ Files/File\ type\ samples/m4a/atp\ 7\ min\ clip.m4a --language "en" --prompt "Hello"
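The four runs above can be wrapped in a small script so the variants are easier to compare side by side. This is a minimal sketch that uses only the CLI flags already shown in this report; `MODEL_PATH`, `AUDIO_PATH`, `DRY_RUN`, and the helper function names are assumptions introduced for illustration, and the audio path is a placeholder. The word count covers everything the CLI writes to stdout, which may include more than the transcript.

```shell
#!/bin/sh
# Sketch: run one clip with several prompt variants and compare output length.
# MODEL_PATH and AUDIO_PATH are placeholders; point them at your own files.
MODEL_PATH="${MODEL_PATH:-Models/whisperkit-coreml/openai_whisper-tiny.en/}"
AUDIO_PATH="${AUDIO_PATH:-clip.m4a}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run them

# Run a command, or print each argument in brackets when DRY_RUN=1,
# so that an empty prompt shows up as [] instead of vanishing.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '[%s] ' "$@"
    printf '\n'
  else
    "$@"
  fi
}

# transcribe         -> no --prompt flag at all
# transcribe "text"  -> --prompt "text" (the text may be empty)
transcribe() {
  if [ "$#" -eq 0 ]; then
    run swift run whisperkit-cli transcribe --model-path "$MODEL_PATH" \
      --audio-path "$AUDIO_PATH" --language "en"
  else
    run swift run whisperkit-cli transcribe --model-path "$MODEL_PATH" \
      --audio-path "$AUDIO_PATH" --language "en" --prompt "$1"
  fi
}

# Print a label followed by the command (dry run) or the stdout word count.
report() {
  label="$1"; shift
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s: ' "$label"
    transcribe "$@"
  else
    printf '%s: %s words\n' "$label" "$(transcribe "$@" | wc -w | tr -d ' ')"
  fi
}

report "no prompt"
report "relevant prompt" "I love Pop Tarts"
report "empty prompt" ""
report "hello prompt" "Hello"
```

With the default `DRY_RUN=1` the script only prints the four commands. Setting `DRY_RUN=0` runs them and prints one word count per variant, which gives a quick signal for the shortened or looping outputs described above.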