Vanilla large-v3-turbo gives different ASR result in comparison with ggml-large-v3-turbo.bin

ggerganov / whisper.cpp

Port of OpenAI's Whisper model in C/C++

MIT License


Vanilla large-v3-turbo gives different ASR result in comparison with ggml-large-v3-turbo.bin #2479

Open sashker opened 2 weeks ago

sashker commented 2 weeks ago

I am testing the same audio file with the vanilla pipeline mentioned here (whisper-large-v3-turbo) and with the ggml model in whisper.cpp, and the two give very different results.

The original whisper-large-v3-turbo provides much more accurate results.
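
To put a number on "much more accurate", one option is to score each transcript against a hand-corrected reference using word error rate (WER). The sketch below is a minimal, stdlib-only version; the function name `word_error_rate` and the simple lowercase/whitespace normalization are assumptions for illustration, not something either project ships.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Normalization here is deliberately crude (lowercase + whitespace split);
    punctuation differences will therefore count as errors.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)
```

Running this once on the vanilla transcript and once on the whisper.cpp transcript, against the same reference text, gives two comparable scores (lower is better).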

Is this because of the chunked vs. sequential mode mentioned here?

Is there any way to change it in whisper.cpp?
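
For reference, the chunked vs. sequential question can be checked on the vanilla side by running the Hugging Face `transformers` pipeline both ways on the same file. This is a rough sketch under stated assumptions: the model id `openai/whisper-large-v3-turbo`, the 30-second chunk length, and the helper names `pipeline_kwargs` and `transcribe` are illustrative choices, and `transformers` (plus its audio dependencies) is assumed to be installed.

```python
MODEL_ID = "openai/whisper-large-v3-turbo"  # assumed Hub id for the vanilla model


def pipeline_kwargs(mode: str) -> dict:
    """Build keyword arguments for transformers.pipeline for the given mode.

    "sequential" leaves chunking off; "chunked" sets chunk_length_s so the
    pipeline splits long audio into fixed-size windows.
    """
    kwargs = {"task": "automatic-speech-recognition", "model": MODEL_ID}
    if mode == "chunked":
        kwargs["chunk_length_s"] = 30
    elif mode != "sequential":
        raise ValueError(f"unknown mode: {mode!r}")
    return kwargs


def transcribe(audio_path: str, mode: str) -> str:
    """Transcribe one file in the requested long-form mode."""
    # Imported lazily so the helper above can be used without transformers installed.
    from transformers import pipeline

    asr = pipeline(**pipeline_kwargs(mode))
    # Timestamps are requested because long-form audio needs them in sequential mode.
    return asr(audio_path, return_timestamps=True)["text"]
```

If the two modes already disagree with each other on the same file, the long-form strategy is a plausible contributor to the gap; if they agree, the difference more likely comes from elsewhere (for example decoding parameters or model conversion).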