briansunter / logseq-plugin-gpt3-openai

A plugin for GPT-3 AI assisted note taking in Logseq
https://twitter.com/bsunter
MIT License

Which Whisper model is used? #102

Closed · nhan000 closed this issue 1 year ago

nhan000 commented 1 year ago

Hi, I have a question.

Which Whisper model is used to transcribe the audio? It's really fast, but the results are terrible for regular audio recordings (lecture recordings, for example).

Could you add an option for people to choose between different models? Thanks a lot!

edshamis commented 1 year ago

For Whisper there's just one model exposed. The transcription quality depends on the quality of the audio recording.

nhan000 commented 1 year ago

Per the OpenAI documentation, you can use the large model (see Models - OpenAI API). [screenshot attached]

Edit: I checked my OpenAI usage, and apparently it does use the large model. No idea why the quality of the transcript is so much worse than the results from running it locally. [screenshot attached]

briansunter commented 1 year ago

Hey @nhan000, it should be Whisper v2 large, which they call "whisper-1" in their API. That should be the largest/best model, and it's the only one they make available via the API. I'd be curious about the differences in output between running it locally and running it in Logseq on the same files.
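
For reference, a request to OpenAI's transcription endpoint looks roughly like this. This is a minimal sketch rather than the plugin's exact code, assuming an environment where `fetch` and `FormData` are available (browser or recent Node); the endpoint and the `"whisper-1"` model name come from OpenAI's API docs.

```typescript
// Minimal sketch of a transcription request to OpenAI's audio API.
// "whisper-1" is the only model name the endpoint accepts, so there is
// nothing for a plugin to let users choose between.
async function transcribe(audio: Blob, apiKey: string): Promise<string> {
  const form = new FormData();
  form.append("file", audio, "recording.m4a"); // filename is illustrative
  form.append("model", "whisper-1");

  const res = await fetch("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  if (!res.ok) {
    throw new Error(`Transcription failed: ${res.status} ${await res.text()}`);
  }
  const data = (await res.json()) as { text: string };
  return data.text;
}
```

So any quality gap versus a local run of the large model would have to come from the audio that actually gets uploaded (format, compression, recording quality) rather than from a smaller model being selected.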