argmaxinc / WhisperKit

On-device Inference of Whisper Speech Recognition Models for Apple Silicon
MIT License

Usage of `--prompt` drastically affects results #167

Closed iandundas closed 1 week ago

iandundas commented 2 weeks ago

Hi guys,

I want to share some observations on the (drastic) effect that providing a prompt can have on output quality.

(Note: I'm using commit 8fcfadbe due to #163, which is impacting all transcriptions done on main.)

Example outputs for test file:

1: No prompt - great transcription!

swift run whisperkit-cli transcribe --model-path "Models/whisperkit-coreml/openai_whisper-tiny.en/" --audio-path /Users/ian/AppsDev/GoodSnooze/MacWhisper/MacWhisper/main/Sample\ Audio\ Files/File\ type\ samples/m4a/atp\ 7\ min\ clip.m4a --language "en"

Building for debugging... [114/114] Applying whisperkit-cli Build complete! (10.04s) Do you have, but also I was just finishing listening to the Hot Pockets episode. Do you have Hot Pockets in the Stowger? Me? I did. I'm ready to be right. I can't say it's in the Stowger with Casey. If it was a role I know he had Hot Pockets earlier today. Yeah. No. It was an occasional special treat when I actually had a job that I would take either what I would call a frozen meal, which you would call a TV dinner. Typically like healthy choice with lean cuisine, one of them. But I would take one of those or occasionally, a hot pocket although in the last few years of employment I mostly had gotten away from that. I am not above hot pockets for the record. I am that is totally in my wheel. Because it's not a childhood thing that you can remember. I think I actually was, I was familiar with hot pockets as a kid but I think it was more during my adult life that way. I realized listen to the podcast and looking at the art that I was mistaken in what I did know what a hot pocket was because what I had in mind. What have you ever had one? Well looking at them I'm like, oh, I remember those from when I was a kid, but we never really had them, but then I realized no, what I'm thinking of are the ones that are like the size of the little bit bigger than a postage stamp. What are those called? Toostino's like pizza rolls? Yeah, they're, they don't make sense of their call rolls, so it looks like little raviolis of stuff that burns your mouth when you bite into it. Yeah, they're always, it's always hot as lava inside of those things and they never cool off. Yeah, so, but they're tiny, but they're very small. So that's what I was picturing for hot pockets, but then I saw the picture on the box and like, wait, no, these are not the size of postage stamps. Right, these are hot and these are pockets, but they're not hot fog. Yeah. 
I'm not totally done with the episode, but they did nothing in that episode made me think I wanted to ever have them so oh god I was so disappointed because I liked them a lot when I was a teenager. I was so disappointed to Learn how crappy they are now like from adult. Well, you're excited about so far. They're right. I'm less than you're excited about the pepperoni one The pepperoni one I think was was the best no spoiler but it but man It's bad like they're all so bad. I don't know what happened and were they always this bad probably I mean or you know where my standards just you know oh My god, I'll call me a little economy. Yeah, they're and like and you'd like as an adult who has not eaten a hot pocket in a long time. Oh You feel bad after eating oh my god They not make lean pockets anymore that used to be a thing. I don't know if they still do that's what I would usually have if anything What would you and all this lean for like well because it made me feel slightly better about myself? Yeah, like lean cuisine and healthy choice are neither of those like no it is not it's neither lean nor healthy It is a choice. I guess I mean it's the same it's the same junkie food, but there's just there's just less of it and lower calorie So what it basically boils down to is you're trading? You're taking in fewer calories and replacing it with all of the sodium on the planet and all of it and smaller portions like this the other secret to getting getting the calories down to Yeah, it's like after you eat your lean cuisine, you're gonna then be so hungry. You're gonna eat an entire bag of Oreos after you know, it's not really helping you in the in the watching me. Don't don't challenge me here because I will I will make you guys do a lean cuisine slash healthy choice slash lean pocket challenge. Oh, I can't we can't try any more food challenges like that. I just I needed another year to recover. Oh my god. 
I have nightmares and this I still have like the one that I bought that was like my backup one because if I couldn't find the other one that's still in the freezer. I'll eat it. Feed it out. It's like a small male to Casey. Whatever condition is it when it gets there? You got to eat it. No problem. Wait till August. Yeah. I still haven't gotten up to the point of the episode where you tried the one that says that you don't have to cook it like the room temperature one that you could like put in your lunch. It's like, yeah, like, Dolly, meter, whatever I can't wait to see how that is. It's got to be violent. You'll say that I will say those slightly surprised me, but I won't tell you in which direction. Yeah. It's like the lunchables. Oh, God. Oh, Lunchables are my favorite. Oh, can we do Lunchables? Let's do Lunchables for member special. Please Daddy please we do handy snacks with those little red sticks and the little flat cheese. I don't know. Oh, yes. Yes. Yes. Those were good too. And in Dunka, we're in the next part of those. The red sticks were the best. Eat this stick. friend of mine brad uh... how to birthday i think it's his best right a and uh... he got from a uh... uh... neff you of his who works at the candy shop he got some uh... fundips all the other some those are my frickin jam gosh i love those so good just pure sugar up and down that was like the best part of playing t-balls a kid is that after we got to go to the concession stand to get a whole bunch of hot dogs and fund it you know see in the realm of uh... candies that are in not even trying to disguise the fact that they're just sugar, you see you got pixie sticks. You got fun, Deb. And I feel like the best one that figured out the best way to basically disguise pure sugar is nerds, right? 'Cause it's pure sugar, but all they did was like, it's made these little crumbly, it's like, well they do with breakfast area, like it's all the same material, but it's like what shape is it? 
Or kind of like pasta anyway. You know, pixie sticks is just like, we're not doing anything, it's just here it is, it's in the tube, good luck. (laughs) Fun dip is like, well there's some powder, but also a stick which is also made of sugar, right? And the nerds is like we found a way to color them brightly and make them into these little granules. I think they would spray some flavor crap on them too, right? - I don't, nerds are good, man. - No, no, no, no. First of all, nerds are good. Secondly, I started recording user using a former sponsor channels. I started recording shoot, I'm gonna have to look up the name of it, but it was, it's this show about like the history of food, particularly in America, and it's usually like my kind of garbage food. hosted by the guy from double dair no uh... what is the name mark summers who but yeah i think it's like a real i've heard he's a jerk i forget where i read that just that i was he day is that what it is yeah we'll go with that uh... there is a little he's been on talk show talking about okay the food that built america on the history channel and uh... it's like a history channel title yep and one of the episodes they talked about you know uh... different foods include or different candies including nerds and if memory serves, I might have the details wrong here, but if memory serves, somebody realized that in making, I can't remember what other candy they were. It was the waste product for the waste product. Yep, exactly right. And they were like, oh, damn, we can turn this into its own candy. This is perfect. I think they spray it with color and some kind of like the usual round of artificial colorings that they put on stuff like cherry, gray, whatever. And that's a brilliant use of just what is essentially just sugar. I mean, that's most candy though. - I know, but like it's, that's the barely discussed shooter in the family barely discussed. Someone also put out Smarties, which is compressed. - Sure. 
(laughs) - I mean, the same thing as the Smarties versus the Fundup stick. - Well, you know, spree same thing. Like there's, I don't know what spree is. - Spree is like a better smartie that has like a candy coating like a, like the asset of a nerd. It basically like a nerd that's big enough to like, to be basically not even, it's like twice the diameter of a smartie. - I see what's free, you know. I'm breaking down. - Sweet Tarts are my jam, and the smarties-- - Spring has more of like a coating, like the sort of shiny, lackery color thing, yeah. - Yeah, exactly. See, for me, I like a smartie, just find, Spring was like by my third tier. For me, it was Sweet Tarts all day, any day, Smarties second level after that. And if I needed to go for a spree, final go for a stupid spree, but they were not my favorite. Sweet Tarts are-- - I know, look up, but sweet Tarts are, let me see. - I think they're similar to Spring, right? - Oh no, sweethe's hard. Sweethe's hard to like, Thomas for kids. - Yeah, it's like an adult smarties. If you know-- - Yeah, Spring is a sweetheart with a candy-coding. - Yeah, yeah. - How many sweethe's hearts over Spring, it really says the candy-coding? - Yeah, Spring's better. - No, because then you can just scrape the sweetheart against the inside of your teeth for like a hour. - Which is your stupid health. - Which is what then does recommend I hear. - Yep, exactly. - Rubsugar, right, directly against your teeth. - So, you know, Easter just happened.

2: Transcription-relevant prompt - much shorter transcription results

swift run whisperkit-cli transcribe --model-path "Models/whisperkit-coreml/openai_whisper-tiny.en/" --audio-path /Users/ian/AppsDev/GoodSnooze/MacWhisper/MacWhisper/main/Sample\ Audio\ Files/File\ type\ samples/m4a/atp\ 7\ min\ clip.m4a --language "en" --prompt "I love Pop Tarts"

Building for debugging... [1/1] Write swift-version-33747A42983211AE.txt Build complete! (0.24s) Do you have but also I was just finishing listening to the hot pockets episode do you have hot pockets nostalgia me I did I did I'm ready to write it. Correct Casey. I can't say it's nostalgia with Casey if it was a role I know he had hot pocket earlier today. Yeah, no it was an occasional special treat when I actually had a job that I would take either what I would call a frozen meal which you would call TV dinner typically like healthy choice with lean cuisine one of them but I would take one of those or occasionally It's neither lean nor healthy. It is a choice. I guess I mean it's the same it's the same junkie food, but there's just there's just less of it and lower calorie. So what it basically boils down to is you're trading your taking in fewer calories and replacing it with all of the sodium on the planet and smaller portions like this the other secret to getting getting the calories down Yeah, it's like after you eat your link was in you're gonna then be so hungry you're gonna eat an entire bag of Oreos after you know it's not really helping you in the in the watching me It's like an adult smarties Yeah, it's like an adult smarties Yeah, spreze a sweetheart with a candy cotak Yes, how do you like sweethearts over spreze? It really says the candy coding. Yes, spreze better No, because then you can just scrape the sweet tart against the inside of your teeth for like a hour Which is what dentists recommend I hear yep exactly rub sugar right directly so so you know Easter have just happened in a

3: Empty prompt: even shorter transcription

swift run whisperkit-cli transcribe --model-path "Models/whisperkit-coreml/openai_whisper-tiny.en/" --audio-path /Users/ian/AppsDev/GoodSnooze/MacWhisper/MacWhisper/main/Sample\ Audio\ Files/File\ type\ samples/m4a/atp\ 7\ min\ clip.m4a --language "en" --prompt ""

Building for debugging... [1/1] Write swift-version-33747A42983211AE.txt Build complete! (0.10s) Do you have, but also I was just finishing listening to the hot pockets episode. Do you have hot pockets nostalgia? Me? I did. I'm ready to read it. Okay. See, I can't say it's nostalgia with Casey. If it was a role I know he had hot pocket earlier today. Yeah. No, it was an occasional special treat when I actually had a job that I would take either what I would call a frozen meal, which you would call a TV dinner. Typically like healthy choice with lean cuisine, one of them. But I would take one of those or occasionally It's neither lean nor healthy. It is a choice, I guess. I mean, it's the same it's the same junkie food, but there's just there's just less of it and lower calorie. So what it basically boils down to is you're trading. You're taking in fewer calories and replacing it with all of the sodium on the planet. It's all of it. And smaller portions like this, the other secret to getting getting the calories down. Yeah, it's like after you eat your link was in you're going to then be so hungry. You're going to eat an entire bag of Oreos after you know, it's not really helping you in the in the watching.

4: "Hello" given as prompt makes the transcription start looping (Bye! Bye! Thank you! Bye! Bye! Bye! Bye! Bye! Bye)

swift run whisperkit-cli transcribe --model-path "Models/whisperkit-coreml/openai_whisper-tiny.en/" --audio-path /Users/ian/AppsDev/GoodSnooze/MacWhisper/MacWhisper/main/Sample\ Audio\ Files/File\ type\ samples/m4a/atp\ 7\ min\ clip.m4a --language "en" --prompt "Hello"

Building for debugging... [1/1] Write swift-version-33747A42983211AE.txt Build complete! (0.10s) Do you have but also I was just finishing listening to the hot pockets episode. Do you have hot pockets nostalgia me I did I'm ready to write it. Correct Casey. I can't say it's nostalgia with Casey if you're all I know you had hot pocket earlier today. Yeah, no it was an occasional special treat when I actually had a job that I would take either what I would call a frozen meal, which you would call TV dinner typically like healthy choice with lean cuisine one of them, but I would take one of those or occasionally Hi, I'm Hi I am Hi, I'm It's neither lean nor healthy. It is a choice. I guess I mean it's the same it's the same junkie food, but there's just there's just less of it and lower calorie. So what it basically boils down to is you're trading your taking in fewer calories and replacing it with all of the sodium on the planet and smaller portions like this the other secret to getting getting the calories down. Yeah, it's like after you eat your link was in you're gonna then be so hungry you're gonna eat an entire bag of Oreos after you know it's not really helping you in the in the watching me Don't challenge me here because I will I will make you guys do a lean cuisine slash healthy choice slash lean pocket challenge. Oh, I can't we can't try any more food challenges like that I just I needed another year to recover Oh my god, I still have nightmares and this I still have like the one that I bought that was like my backup one because I couldn't find the other one that's still in the freezer. I'll eat it Mool mail to Casey right whatever condition is in when it gets there you got to eat no promise wait till August. Yeah, I still haven't gotten up to the point of the You'll say that I will say those slightly surprised me but I won't tell you in which direction. Yeah. It's like the Lunchables. Oh God. Oh Lunchables are my favorite. Oh can we do Lunchables? 
Let's do Lunchables remember special please Daddy please. Can we do Handy Snacks with those little red sticks and the little flat cheese. I don't know. Oh Oh, yes, yes, yes, yes, those were good too. And in Dunkle Road, thanks for the best part of those the red sticks were the best to eat this stick. I'm friend of mine Brad had a birthday. I think this is best Friday and he got from a nephew of his who works at a candy shop. He got some fun dips. Oh, yeah, those are my frickin jam gosh. I love those. Oh, so good. Just pure sugar up and down. That was like the best part of playing tea balls, okay? Is it afterwards? We've got to go to the Hi Hello Hello Hello I'm It was a good one. I am so happy to see you on the show. I am so happy to see you on the show. I am so happy to see you on the show. Thank you. I love you. Thank God! Thanks for coming. I really appreciate it and you all are welcome and welcome. Bye for now, thanks for joining us. We'll see you guys soon! I will be happy to see you guys in another video. Thank you! Bye for now! Bye! Thank you! Thanks for joining us. Bye. Bye! Bye! Thank you! Bye! Bye! Bye! Bye! Bye! Bye. Thanks for watching! Bye-bye! Bye-bye! Bye, bye-bye! Bye-bye! Bye-bye! Bye-bye! Bye-bye! Bye bye! Bye-bye! Bye bye! Bye Bye-bye, bye! Bye! Bye-bye! Bye-bye! Bye-bye! Bye-bye, bye-bye, bye-bye! Bye-bye, bye-bye, bye-bye It's like an adult smarties. Yeah, it's like an adult smarties. Yeah, it's free as a sweetheart with a candy coda. Yeah, how do you like sweethearts over spades? It really says the candy coding. Yes, breathe better. No, because then you can just scrape the sweetheart against the inside of your teeth for like a hour. And which is what that does recommend I hear. Yep, exactly. Rubs sugar right directly. So, you know, Easter just happened in a
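
One common heuristic for flagging the kind of looping seen in example 4 is a compression-ratio check: highly repetitive text compresses far better than ordinary speech. The OpenAI reference implementation uses a zlib-based ratio with a default threshold of 2.4. The sketch below is a minimal illustration under that assumption; the function names are hypothetical, and nothing in this thread establishes whether WhisperKit applies such a check at this commit.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Ratio of raw UTF-8 size to zlib-compressed size; higher means more repetitive."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def looks_like_loop(text: str, threshold: float = 2.4) -> bool:
    """Hypothetical helper: flag text whose compression ratio exceeds the threshold.

    The 2.4 default mirrors the OpenAI reference default; treat it as a
    tunable assumption rather than a guarantee for any particular model.
    """
    return compression_ratio(text) > threshold


if __name__ == "__main__":
    looping = "Bye! Bye-bye! " * 40
    normal = "Do you have, but also I was just finishing listening to the Hot Pockets episode."
    print(looks_like_loop(looping))  # repetitive segment
    print(looks_like_loop(normal))   # ordinary speech
```

A check like this works best per decoded segment rather than on the whole transcript, since a short loop can be diluted by several minutes of normal text around it.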

ZachNagengast commented 2 weeks ago

Thanks for this detailed report, @iandundas! This is very interesting. The prompt essentially tells the model that these words were said just before the audio window, so I can see how some of these prompts would affect the output. I suppose the question is whether we are doing something inconsistent with the OpenAI implementation, or whether this is just an artifact of prompting being pretty difficult with Whisper models. I tested out this guide and was able to get matching results with Large-v2:

swift run whisperkit-cli transcribe --model-path "Models/whisperkit-coreml/openai_whisper-large-v2/" --audio-path ~/Downloads/product_names.wav --prompt "QuirkQuid Quill Inc, P3-Quattro, O3-Omni, B3-BondX, E3-Equity, W3-WrapZ, O2-Outlier, U3-UniFund, M3-Mover"

With prompt:

Welcome to QuirkQuid Quill Inc, where finance meets innovation. Explore diverse offerings, from the P3-Quattro, a unique investment portfolio quadrant, to the O3-Omni, a platform for intricate derivative trading strategies. Delve into unconventional bond markets with our B3-BondX and experience non-standard equity trading with E3-Equity. Personalized investment, and a wealth of knowledge. We're here to help you find the right investment. We're here to help you find the right investment. We're here to help you find the right investment. We're here to help you find the right investment. We're here to help you find the right investment. We're here to help you find Analyze your wealth management with W3-WrapZ and anticipate market trends with the O2-Outlier, our forward-thinking financial forecasting tool. Explore venture capital world with U3-UniFund or move your money with the M3-Mover, our sophisticated monetary transfer module. At QuirkQuid Quill Inc, we turn complex finance into creative solutions. Join us in redefining financial services.

Without prompt:

Welcome to Quirk Quid Quill Inc., where finance meets innovation. Explore diverse offerings from the P3 Quatro, a unique investment portfolio quadrant to the O3 Omni, a platform for intricate derivative trading strategies. Delve into unconventional bond markets with our B3 Bond X, and experience non-standard equity trading with e3equity. Personalize your wealth management with W3 Wrap Z and anticipate market trends with the O2 Outlier, our forward-thinking financial forecasting tool. Explore venture capital world with U3 Unifund or move your money with the M3 Mover, our sophisticated monetary transfer module. At Quirk Quid Quill Inc., we turn complex finance into creative solutions. Join us in redefining financial services.
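
The idea that a prompt is treated as text "said just prior to the audio window" can be sketched as a decoder input layout: in the OpenAI reference implementation, prompt tokens are placed after a `<|startofprev|>` marker and before the start-of-transcript sequence, and the prompt is truncated to roughly half the text context. The sketch below is a simplified illustration, not WhisperKit's code: `build_decoder_input` is a hypothetical name, the tokens are placeholder strings rather than real tokenizer output, and the 448-token context is assumed from the reference models.

```python
from typing import List, Optional

SOT_PREV = "<|startofprev|>"   # marks the start of "previous context" (the prompt)
SOT = "<|startoftranscript|>"  # marks the start of the actual transcription


def build_decoder_input(
    prompt_tokens: Optional[List[str]],
    sot_sequence: List[str],
    n_text_ctx: int = 448,  # assumed text context size
) -> List[str]:
    """Hypothetical helper: lay out prompt tokens ahead of the SOT sequence."""
    max_prompt = n_text_ctx // 2 - 1
    # In this sketch, a missing or empty prompt adds nothing to the input.
    if not prompt_tokens or max_prompt <= 0:
        return list(sot_sequence)
    # Keep only the most recent tokens so the prompt fits in half the context.
    return [SOT_PREV] + prompt_tokens[-max_prompt:] + list(sot_sequence)


if __name__ == "__main__":
    # Placeholder word pieces for the prompt "I love Pop Tarts".
    print(build_decoder_input(["I", " love", " Pop", " Tarts"], [SOT]))
    print(build_decoder_input([], [SOT]))
```

Because the model conditions on everything before the start-of-transcript marker, even a short or unrelated prompt changes what the decoder considers likely next. Note that this sketch deliberately treats an empty prompt as no prompt; whether a given implementation does the same is exactly the kind of detail that can differ.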

atiorh commented 2 weeks ago

Agreed with @ZachNagengast. The prompting capability of Whisper (especially the non-large variants) is not well established. Our ground truth is consistency with the OpenAI implementation. However, the empty prompt changing the output, as well as #162, are definitely unexpected, and we are looking into both.

ZachNagengast commented 1 week ago

Closing this for now. Please reopen if you notice any regressions relative to the reference repo.

ZachNagengast commented 1 week ago

Note that we will be fixing the empty-prompt behavior (in progress).