Open dfengpo opened 6 months ago
Additionally, avoid large-v3. If a smaller model works well for your language, try that instead.
@bradmurray-dt can you please elaborate on why to avoid largev3 in context of avoiding hallucinations?
While I have not tested v3 myself, several people have reported hallucinations with it. Here's an article by Deepgram describing the problem.
I have run quite a few tests and noticed significantly more hallucinations with large v3 than with other models. Beyond that, with dirty audio I see more hallucinations with medium than with small, and more with large than with medium. Others (including Deepgram) have come to similar conclusions. We pre-process audio with a combination of a VAD and a classifier to filter out most non-speech audio. This has greatly reduced both hallucinations and randomly missing pieces of transcripts.
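For anyone who wants to try the pre-filtering idea, here is a rough sketch of the gating step only. It is not the VAD or classifier described above: it uses a plain RMS energy threshold as a stand-in, and the function names, the 30 ms frame size, and the 0.01 threshold are all assumptions chosen for illustration. Samples are assumed to be mono floats in [-1, 1].

```python
import math

def frame_rms(samples, frame_len):
    """Yield (start_index, rms) for consecutive non-overlapping frames."""
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        yield start, math.sqrt(sum(x * x for x in frame) / len(frame))

def speech_regions(samples, sample_rate=16000, frame_ms=30, threshold=0.01):
    """Return [(start, end)] sample ranges whose frame RMS reaches the threshold.

    A crude energy gate (hypothetical stand-in for a real VAD): adjacent
    loud frames are merged into one region, quiet frames are dropped.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    regions = []
    for start, rms in frame_rms(samples, frame_len):
        if rms < threshold:
            continue  # treat the frame as non-speech
        end = min(start + frame_len, len(samples))
        if regions and regions[-1][1] == start:
            # Contiguous with the previous loud frame: extend that region.
            regions[-1] = (regions[-1][0], end)
        else:
            regions.append((start, end))
    return regions

if __name__ == "__main__":
    # 30 ms of silence, 60 ms of signal, 30 ms of silence at 16 kHz.
    audio = [0.0] * 480 + [0.5] * 960 + [0.0] * 480
    print(speech_regions(audio))
```

Only the returned regions would then be passed on for transcription. An energy gate cannot tell speech from loud noise or music, which is presumably why a trained VAD plus a classifier works better on dirty audio.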
Disabling timestamps helps a lot in my experience (#1724). You can also cut the silence at the end before starting the transcription, or use some form of VAD if you're streaming audio.
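Cutting the trailing silence can be sketched roughly like this. It is a minimal illustration, assuming mono float samples in [-1, 1]; the function name, the RMS threshold, and the 100 ms of padding kept after the last loud frame are assumptions, not values from this thread.

```python
import math

def trim_trailing_silence(samples, sample_rate=16000, frame_ms=20,
                          threshold=0.01, pad_ms=100):
    """Drop silence at the end of the audio, keeping a short pad.

    Walks backwards in fixed-size frames until a frame's RMS reaches
    the threshold, then cuts shortly after that point.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    pad = int(sample_rate * pad_ms / 1000)
    end = len(samples)
    while end > 0:
        start = max(0, end - frame_len)
        frame = samples[start:end]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms >= threshold:
            break  # found the last frame that is not silent
        end = start
    if end == 0:
        return samples[:0]  # nothing but silence
    return samples[:min(len(samples), end + pad)]

if __name__ == "__main__":
    # 0.2 s of signal followed by 1 s of silence at 16 kHz.
    audio = [0.5] * 3200 + [0.0] * 16000
    trimmed = trim_trailing_silence(audio)
    print(len(audio), "->", len(trimmed))
```

The trimmed samples are what you would hand to the transcription call. The small pad is there so that a quiet word ending is not clipped.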