chidiwilliams / buzz

Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.
https://chidiwilliams.github.io/buzz
MIT License
12.16k stars, 913 forks

Support for large-v2 and large-v3 missing #801

Closed: faspie closed this issue 3 months ago

faspie commented 3 months ago

Please add support for the models "large-v2" and "large-v3".

raivisdejus commented 3 months ago

Note: Faster Whisper already uses large-v2 internally.

In your opinion, is there any reason to keep "large", or would it make sense to switch to "large-v2" and "large-v3"?

In my experience "large-v3" can have more hallucinations than "large-v2".

faspie commented 3 months ago

I have never used tiny, small, or large, because I run Whisper on a large server with a Tesla P40. On a standard configuration I would probably use large.

I am using large-v2 and large-v3. For German, large-v3 seems to be more resilient in settings with a lot of ambient noise, but it does indeed have more hallucinations: for example, complex sentences or pauses can cause a sentence to be repeated over and over. Sometimes there are also "free" hallucinations that have absolutely nothing to do with the recording.

I would suggest implementing support for large, large-v2, and large-v3.

faspie commented 3 months ago

Perhaps, for inexperienced users, you could mark some settings as "recommended".

faspie commented 3 months ago

You are great! Thank you!