This addresses a couple of issues:
- Word-level timestamps were slightly off, noticed in #105.
- Detect language was not easily usable in conjunction with prefill or prompt tokens, as noticed by Diirge on Discord.
The word timestamps still do not use a median filter, but they line up quite well without it. With these changes, the main remaining differences are in when words start; most of the word endings line up exactly.
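For context on the median filter mentioned above: it replaces each value with the median of a small window around it, which suppresses isolated spikes before alignment. The following is only a minimal sketch of the general technique, not WhisperKit's or OpenAI's code. The `median_filter` helper, the window width of 3, the edge-repeat padding, and the sample values are all assumptions made for illustration.

```python
from statistics import median


def median_filter(values, width=3):
    """Return `values` smoothed with a sliding median of odd `width`.

    Hypothetical helper for illustration only; edges are handled by
    repeating the first and last values (an assumption, other padding
    schemes such as reflection are equally valid).
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    if not values:
        return []
    half = width // 2
    # Repeat the edge values so every window has exactly `width` samples.
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    return [median(padded[i:i + width]) for i in range(len(values))]


# Made-up sample weights: the isolated 0.9 spike is removed by the filter.
weights = [0.1, 0.1, 0.9, 0.1, 0.2, 0.2]
smoothed = median_filter(weights, width=3)
```

A wider window smooths more aggressively but can also blur genuine transitions, which may be one reason word start times are more sensitive to it than word endings.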
Here are some comparisons using the audio provided in #105 (top is ours, bottom is from HEAD of the openai/whisper Python repo):
WhisperKit better starting point:
OpenAI better starting point:
Will continue to refine these over time. Thanks @finnvoor for finding this and providing a great example to replicate.