whisper.cpp 1.20 produces different inference than OpenAI whisper and with higher WER

jordimas commented 1 year ago

Hello!

First, thanks for writing such a great tool.

Whisper.cpp: version 1.20
Open AI: version openai-whisper-20230124
Model used: medium

I would expect Whisper.cpp to produce the same output as OpenAI Whisper when given the same model and input.

WER measured against the human-transcribed reference text file: OpenAI Whisper: 28.08, Whisper.cpp: 35.86

If there is anything that I can do to help, let me know.

Thanks

ggerganov commented 1 year ago

Thanks for the data point! How do I calculate WER scores?

jordimas commented 1 year ago

Basically:

I run both tools from the command line (whisper.cpp and the OpenAI Python client).
I use the HuggingFace WER metric module to calculate the difference between each transcription and the expected file: https://github.com/jordimas/whisper-cpp-error/blob/main/benchmark.py#L37
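
For readers who want to see what the metric measures, here is a minimal sketch of a word error rate calculation in plain Python. It is not the code from the linked benchmark.py and does not call the HuggingFace module. The function name `word_error_rate` is hypothetical, and it assumes simple whitespace tokenization with no text normalization, so its numbers may differ from those reported above. WER here is the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative stand-in only: whitespace tokenization, no casing or
    punctuation normalization (both are assumptions of this sketch).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    score = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    print(f"WER: {100 * score:.2f}")  # prints "WER: 16.67"
```

Scores such as 28.08 and 35.86 read as percentages, so the sketch multiplies by 100 when printing. That reading is an assumption.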

However, you can also see that the produced files are different.

Thanks
