It would be nice to start measuring the word error rate (WER) of whisper.cpp across some representative dataset:
- short audio
- long audio
- English
- non-English
- etc.
This will help us catch regressions in the future. I'm not familiar with what is typically used for speech-to-text (ASR) WER benchmarks, so I'm looking for help from the community.
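For reference, WER is usually defined as the word-level edit distance (substitutions + deletions + insertions) between the reference transcript and the model output, divided by the number of words in the reference. Below is a minimal sketch of that computation, not an existing whisper.cpp tool: the function name `word_error_rate` is hypothetical, and it assumes plain lowercasing and whitespace tokenization, whereas a real benchmark would likely need stronger text normalization (punctuation, numbers, etc.).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count.

    Assumption: lowercasing + whitespace splitting is the only normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A regression check could then run this over the transcripts produced for each category above (short, long, English, non-English) and compare the per-category average against a stored baseline.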